
Re: Encoding problems in subversion under Mac OS X (HFS+)

From: Paul Koning <pkoning_at_equallogic.com>
Date: 2005-12-06 22:45:19 CET

>>>>> "Balázs" == Balázs Szab <Bal> writes:

 Balázs> I did some research:

 Balázs> http://developer.apple.com/technotes/tn/tn1150.html

 Balázs> "HFS Plus stores strings fully decomposed and in canonical
 Balázs> order. HFS Plus compares strings in a case-insensitive
 Balázs> fashion. Strings may contain Unicode characters that must be
 Balázs> ignored by this comparison. For more details on these
 Balázs> subtleties, see Unicode Subtleties." ...

 Balázs> I am now sure that this is basically a compatibility problem
 Balázs> between SVN and OSX.

It's a bit more than that. In some ways, it's analogous to the
case-insensitive file system issue.

Windows allows file names to be encoded any old way. You can create
á.txt twice -- once with a composed á, once with a decomposed á. You
can then commit both. Linux will happily deal with those two files as
distinct files, too.
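
To make the two encodings concrete, here is a minimal Python sketch. The
file name is the one from the example above; the variable names are only
illustrative. Python's NFD normalization stands in for the file system's
decomposition here, which is an assumption: the table in TN1150 is the
authority for what HFS+ actually does.

```python
import unicodedata

# Two ways to spell the same visible file name.
composed = "\u00e1.txt"      # U+00E1 LATIN SMALL LETTER A WITH ACUTE
decomposed = "a\u0301.txt"   # "a" followed by U+0301 COMBINING ACUTE ACCENT

# As code point sequences (and as UTF-8 bytes) the names are different,
# which is why a byte-oriented file system treats them as two files.
names_differ = composed != decomposed
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# After canonical decomposition both spellings become the same string,
# which is why a decomposing file system sees a name conflict.
same_after_nfd = (
    unicodedata.normalize("NFD", composed)
    == unicodedata.normalize("NFD", decomposed)
)
```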

OS X objects to this. It decomposes all file names, so the composed
and decomposed names conflict. That's very similar to the conflict
between a.txt and A.txt in a case-insensitive file system (and in fact
the error messages look similar).

So, contrary to what I suggested before, making all filenames
canonical (decomposed, for example) in Subversion is not necessarily
the right answer, because then you can't handle those two
identical-looking but differently encoded file names in Windows. (One
might argue this is a Windows bug -- it shouldn't allow two names that
produce the same pixels on the screen but have different encodings --
and that no doubt is why OS X doesn't. But they are two permitted
file names, so the counter-argument is that the version control system
should allow them both.)

If the file names on the server are maintained as they were on the
client that originally created them, then the fix has to be in the OS
X client. It would have to keep the original file name encodings in
its .svn/entries file. But when comparing those names against the
filenames returned by the file system -- i.e., for commands like "svn
status" -- it has to run those names through the decomposition
algorithm so they will match the names the file system has.
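
A rough sketch of that comparison, in Python. This is not Subversion
code: fs_form, match_entries and find_collisions are hypothetical
helpers, and Python's NFD is assumed to be close enough to the HFS+
decomposition for illustration. The recorded names keep their original
encoding; only the lookup key is decomposed.

```python
import unicodedata

def fs_form(name):
    """Approximate the form a decomposing file system would store (assumed: NFD)."""
    return unicodedata.normalize("NFD", name)

def match_entries(recorded, on_disk):
    """Map each recorded name to the on-disk name it corresponds to, or None."""
    disk_by_key = {fs_form(n): n for n in on_disk}
    return {name: disk_by_key.get(fs_form(name)) for name in recorded}

def find_collisions(recorded):
    """Group recorded names that a decomposing file system cannot tell apart."""
    groups = {}
    for name in recorded:
        groups.setdefault(fs_form(name), []).append(name)
    return [group for group in groups.values() if len(group) > 1]

# The entries list holds the composed spelling; the disk reports the decomposed one.
matches = match_entries(["\u00e1.txt", "b.txt"], ["a\u0301.txt"])

# Two recorded names that differ only in encoding cannot both be checked out.
collisions = find_collisions(["\u00e1.txt", "a\u0301.txt", "b.txt"])
```

The collision check is the analogue of detecting a.txt and A.txt in the
same directory on a case-insensitive file system.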

By the way, there's a long discussion about decomposition on the
Unicode website. Alternatively, a good start would be to implement
the mapping table in
http://developer.apple.com/technotes/tn/tn1150table.html . (That does
the decomposing; I don't think it describes the reordering of multiple
accent marks into their canonical order, but that seems like a less
critical issue, except perhaps for Vietnamese.)
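
To show what the reordering step does, here is a small sketch using a
Vietnamese letter with two accent marks. It uses Python's standard NFD
normalization, not Apple's table, so treat it as an illustration of the
Unicode behaviour rather than of HFS+ itself. The variable names are
illustrative.

```python
import unicodedata

precomposed = "\u1ec7"          # LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
out_of_order = "e\u0302\u0323"  # e + circumflex + dot below (marks not in canonical order)

# Canonical combining classes decide the order of the marks:
# the dot below (lower class) sorts before the circumflex (higher class).
dot_below_class = unicodedata.combining("\u0323")
circumflex_class = unicodedata.combining("\u0302")

# Full canonical decomposition yields the same sequence for both inputs.
nfd_precomposed = unicodedata.normalize("NFD", precomposed)
nfd_out_of_order = unicodedata.normalize("NFD", out_of_order)

# A pure per-character mapping table would leave out_of_order untouched,
# because every character in it is already decomposed; only the
# reordering step makes it match.
needs_reordering = out_of_order != nfd_out_of_order
```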

         paul

Received on Tue Dec 6 22:48:39 2005

This is an archived mail posted to the Subversion Users mailing list.
