
RE: Re: Encoding problems in subversion under Mac OS X (HFS+)

From: Paul Koning <pkoning_at_equallogic.com>
Date: 2005-12-07 15:24:59 CET

>>>>> "Stuart" == Stuart Celarier <SCelarier@corillian.com> writes:

 Stuart> Paul wrote:
>> One might argue this is a Windows bug -- it shouldn't allow two
>> names that produce the same pixels on the screen but have different
>> encoding -- and that no doubt is why OS X doesn't.

 Stuart> [Stuart] I've got to question this specific line of
 Stuart> reasoning. If your criterion for distinctiveness is "same
 Stuart> pixels on the screen", then, at best, that's an issue to take
 Stuart> up with font designers. But that

 Stuart> Consider this. U+0391 (&#913;) is a capital Greek alpha,
 Stuart> which in virtually every font is visually indistinguishable
 Stuart> from &#65; the English letter A. Here's a simple HTML
 Stuart> document, save it, load it in a browser, and see for
 Stuart> yourself:

 Stuart> <HTML><BODY>&#65;&#913;</BODY></HTML>

 Stuart> These are distinct code points, hence different characters,
 Stuart> even if similar or identical glyphs are used. I don't get how
 Stuart> this becomes a Windows (or any other operating
 Stuart> system-specific) problem. If the letter 'O' and numeral '0'
 Stuart> are visually indistinct on my computer, in whatever font I
 Stuart> happen to use, should the file system prevent me from using
 Stuart> one of these characters? I don't think so.

Good point. I said it wrong in the previous note.

The Unicode standard, I believe, discusses the point you made about
meaning vs. appearance. It assigns code points to characters based on
what they mean, not based on what they look like. (Well, the
"unified" codes may stretch that rule...)

The right way to describe the issue is like this:

There is a character "Latin small letter a with acute". It looks like
this: "á". There are two ways to encode that character: as a single
"combined" (precomposed) character, U+00E1, and as "a" (U+0061)
followed by "combining acute accent" (U+0301).
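To make the two encodings concrete, here is a small sketch in Python (my choice of language for illustration, not anything Subversion itself uses). It spells out both forms of the character by code point, U+00E1 versus U+0061 followed by U+0301, and shows that they are different strings and different UTF-8 byte sequences. The variable and function names are made up for this example.

```python
import unicodedata

# Two spellings of "Latin small letter a with acute".
precomposed = "\u00e1"   # one code point: the "combined" character
decomposed = "a\u0301"   # two code points: "a" + combining acute accent


def describe(text):
    """Return the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(describe(precomposed))
print(describe(decomposed))

# As raw strings (and as UTF-8 bytes) the two forms are not equal,
# which is why a byte-wise file name comparison treats them as different.
print(precomposed == decomposed)
print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))
```

A file system or tool that compares names byte by byte will therefore see two distinct names here, even though both are the same character.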

Those two have the same meaning (not just the same appearance) --
which is really the important point. There are transformation
algorithms that recognize their equivalence. If you convert them to
any of the various Normalization Forms, you'll end up with the same
string for both (that's what "normalization" means).
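As a rough illustration of that claim, the sketch below uses Python's standard unicodedata module (again just my choice for the example). It normalizes the precomposed form U+00E1 and the decomposed form U+0061 U+0301 into each of the four Normalization Forms and checks that they come out equal. It also checks Stuart's case: Latin "A" and Greek capital alpha (U+0391) stay different under every form, because they are different characters that merely look alike. The helper name is hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def same_after_normalization(left, right):
    """True if left and right are equal in every Normalization Form."""
    return all(
        unicodedata.normalize(form, left) == unicodedata.normalize(form, right)
        for form in FORMS
    )


precomposed = "\u00e1"   # a with acute as a single code point
decomposed = "a\u0301"   # "a" followed by combining acute accent

# Same meaning, different encoding: normalization makes them identical.
print(same_after_normalization(precomposed, decomposed))

# Same appearance, different meaning: normalization keeps them apart.
latin_a = "A"
greek_alpha = "\u0391"
print(same_after_normalization(latin_a, greek_alpha))
```

So a comparison done on normalized names would treat the two accented forms as one name, while still keeping "A" and Greek alpha distinct.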

See www.unicode.org/reports/tr15 for the full story.


Received on Wed Dec 7 15:28:20 2005

This is an archived mail posted to the Subversion Users mailing list.