>>>>> "Stuart" == Stuart Celarier <SCelarier@corillian.com> writes:
Stuart> Paul wrote:
>> One might argue this is a Windows bug -- it shouldn't allow two
>> names that produce the same pixels on the screen but have
>> different encoding -- and that no doubt is why OS X doesn't.
Stuart> [Stuart] I've got to question this specific line of
Stuart> reasoning. If your criterion for distinctiveness is "same
Stuart> pixels on the screen", then, at best, that's an issue to take
Stuart> up with font designers. But that
Stuart> Consider this. U+0391 (Α) is a capital Greek alpha,
Stuart> which in virtually every font is visually indistinguishable
Stuart> from the English letter A. Here's a simple HTML
Stuart> document, save it, load it in a browser, and see for
Stuart> yourself:
Stuart> <HTML><BODY>AΑ</BODY></HTML>
Stuart> These are distinct code points, hence different characters,
Stuart> even if similar or identical glyphs are used. I don't get how
Stuart> this becomes a Windows (or any other operating
Stuart> system-specific) problem. If the letter 'O' and numeral '0'
Stuart> are visually indistinct on my computer, in whatever font I
Stuart> happen to use, should the file system prevent me from using
Stuart> one of these characters? I don't think so.
Good point. I said it wrong in the previous note.
The Unicode standard, I believe, discusses the point you made about
meaning vs. appearance. It assigns code points to characters based on
what they mean, not based on what they look like. (Well, the
"unified" Han codes may stretch that rule...)
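
To make that concrete, here is a small sketch in Python using the
standard unicodedata module (the variable names are mine, purely for
illustration). It shows that the Latin A and the Greek Alpha are
separate code points with separate names, and that normalization does
not merge them:

```python
import unicodedata

latin_a = "A"            # U+0041
greek_alpha = "\u0391"   # U+0391, usually drawn with the same glyph

# Print each code point with its official Unicode character name.
for ch in (latin_a, greek_alpha):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA

# Same pixels, different characters: the strings do not compare equal,
# and even compatibility normalization leaves the Alpha untouched.
print(latin_a == greek_alpha)                                    # False
print(unicodedata.normalize("NFKC", greek_alpha) == latin_a)     # False
```

So look-alikes of this kind stay distinct no matter what you do to
them, which is the case Stuart described.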
The right way to describe the issue is like this:
There is a character "Latin small letter a with acute". It looks like
this: "á". There are two ways to encode that character: as a single
"precomposed" character (U+00E1), and as "a" (U+0061) followed by
"combining acute accent" (U+0301).
Those two have the same meaning (not just the same appearance) --
which is really the important point. There are transformation
algorithms that recognize their equivalence. If you convert them to
any of the various Normalization Forms, you'll end up with the same
string for both (that's what "normalization" means).
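
A rough illustration, again in Python with the standard unicodedata
module. The helper same_after_normalizing is a name I made up for this
note, not anything from Subversion or an operating system API:

```python
import unicodedata

precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT

def same_after_normalizing(s1, s2, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, s1) == unicodedata.normalize(form, s2)

# As raw code point sequences the two encodings differ...
print(precomposed == decomposed)   # False
print(len(precomposed), len(decomposed))   # 1 2

# ...but every Normalization Form maps them to one and the same string.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, same_after_normalizing(precomposed, decomposed, form))
# NFC True
# NFD True
# NFKC True
# NFKD True
```

Which form you pick matters less than picking one and applying it to
both names before comparing them.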
See www.unicode.org/reports/tr15 for the full story.
paul
Received on Wed Dec 7 15:28:20 2005