Foreign language and non-ASCII submissions


#1

So, after participating in the conversation about underscores, I had to check something - it turns out the design guidelines don’t mention what (human) languages are accepted for submission, or what encoding is valid for identifiers.
Which means this:

```csharp
class А
{
    public А()
    {
        Console.WriteLine("You successfully instantiated an А!");
    }
}
```

is perfectly valid!
Unfortunately, this is only going to confuse people attempting to use this code, since Visual Studio will, given most people’s typing habits, output the following rather unhelpful error message:


(For those of you who are still scratching your heads: the class name is the Cyrillic (Russian alphabet) capital “А”, U+0410, not the Latin “A”. Most of us can’t type it on our keyboards, and don’t think about it in the first place.)

Yes, there is:

✗ DO NOT use underscores, hyphens, or any other non-alphanumeric characters.

but that’s not going to help if somebody inadvertently uses a look-alike digit instead of 1 (a Japanese character set “1” (half width?) versus the standard ASCII “1”; yes, that’s valid in a file). I’m honestly not expecting reviewers to catch this kind of thing. It should be easy to check with an automated system, although care would be needed around test data, which may legitimately contain such characters.
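
To make the "automated system" idea concrete, here is a minimal sketch of such a check. The names `NonAsciiScanner` and `Find` are made up for illustration and are not part of any existing tool. It also flags every non-ASCII code point in the text, including those in comments and string literals; restricting it to identifiers would need a real C# parser, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;

static class NonAsciiScanner
{
    // Returns (line, column, code point) for every non-ASCII code point in the text.
    // Lines and columns are 1-based; columns count code points, not UTF-16 units.
    public static List<(int Line, int Column, int CodePoint)> Find(string source)
    {
        var hits = new List<(int Line, int Column, int CodePoint)>();
        int line = 1, column = 1;

        for (int i = 0; i < source.Length; i++)
        {
            int codePoint = source[i];

            // Treat a valid surrogate pair as a single code point.
            if (char.IsHighSurrogate(source[i]) && i + 1 < source.Length
                && char.IsLowSurrogate(source[i + 1]))
            {
                codePoint = char.ConvertToUtf32(source[i], source[i + 1]);
                i++;
            }

            if (codePoint == '\n')
            {
                line++;
                column = 1;
                continue;
            }

            if (codePoint > 0x7F)
            {
                hits.Add((line, column, codePoint));
            }
            column++;
        }
        return hits;
    }
}

static class Program
{
    static void Main()
    {
        // "\u0410" is the Cyrillic capital letter that looks like a Latin "A".
        string source = "class \u0410\n{\n}\n";
        foreach (var hit in NonAsciiScanner.Find(source))
        {
            // Prints: line 1, column 7: U+0410
            Console.WriteLine($"line {hit.Line}, column {hit.Column}: U+{hit.CodePoint:X4}");
        }
    }
}
```

Reporting the code point rather than the character itself is deliberate: the whole problem is that the offending character looks identical to an ASCII one on screen. Files that are expected to contain non-ASCII text, such as test data, would have to be excluded by whoever calls the check.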


There is a related, but somewhat distinct, problem with somebody using something other than English for (at least public) identifiers and **documentation**. I have a feeling non-English submissions wouldn't be accepted, simply because most people wouldn't be able to tell what the library was supposed to be doing (or not without reverse engineering the thing). Granted, most of the people involved in this speak English, which is also the de facto language of the internet and of programming in general, but I think it's something we ought to consider, and specify.

(There’s also the unspoken, probably unrealized, assumption that Microsoft will be footing the bill for the official translation of the documentation. You know, when it releases .NET to the rest of the world…)


#2

All identifiers in .NET are in English, and I’m pretty sure anyone who would consider submitting code for .NET Core knows this. That means there is no reason to state it explicitly as a rule, and writing an automated system to check it would be a waste of time.


#3

You’d have a very difficult time automatically checking for a foreign language (and code review should be able to catch that easily).

The real problem comes from non-ASCII characters, via either

  1. Somebody deliberately trying to mess with people (“just because Micro$oft”, say).
  2. Somebody with an alternative (non-English/US) keyboard accidentally including something unusual.

Normal code review is unlikely to notice either of these cases (if the characters are even visually distinguishable). An automated check would be the best solution, probably added as an FxCop rule or something.

…speaking of which, why doesn’t that page post a canonical rule file?


#4

As svick said, we generally assume that contributions are in English. That’s also somewhat non-negotiable, because contributing requires that all contributors and project maintainers can communicate, and English is clearly the lingua franca here.

As far as the usage of non-ASCII characters goes: I completely agree that it is somewhat dangerous. Fortunately, it is easily detectable by static analysis (e.g. FxCop). I’ve asked the experts to see if we’re already doing it.

Our code formatter has rules that escape non-ASCII characters in literals with their Unicode escapes, though. This avoids the issues with roundtripping in various different text editors.
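
As a rough illustration of what escaping non-ASCII characters in a literal involves, here is a minimal sketch. It is not the actual formatter rule; `LiteralEscaper` and `EscapeNonAscii` are hypothetical names, and the method assumes it is handed the contents of a regular (non-verbatim) string literal, since verbatim `@"..."` strings do not process `\u` escapes.

```csharp
using System;
using System.Text;

static class LiteralEscaper
{
    // Replaces every non-ASCII UTF-16 code unit with a \uXXXX escape.
    // Characters outside the BMP come out as two escapes (a surrogate pair),
    // which a regular C# string literal accepts.
    public static string EscapeNonAscii(string text)
    {
        var builder = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c > 0x7F)
            {
                builder.Append("\\u").Append(((int)c).ToString("X4"));
            }
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}

static class Program
{
    static void Main()
    {
        // The input contains the Cyrillic capital letter U+0410.
        // Prints: You successfully instantiated an \u0410!
        Console.WriteLine(LiteralEscaper.EscapeNonAscii("You successfully instantiated an \u0410!"));
    }
}
```

After escaping, the file holding the literal is pure ASCII, so it survives editors that guess the encoding wrongly, and the odd character is visible to a reviewer as an escape instead of passing for a Latin letter.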

