
Re: Encoding problems in subversion under Mac OS X (HFS+)

From: Balázs Szabó <dlux_at_dlux.hu>
Date: 2005-12-06 23:31:54 CET

Hi,

On 2005.12.06., at 22:45, Paul Koning wrote:

> It's a bit more than that. In some ways, it's analogous to the case
> insensitive file systems issue.
>
> Windows allows file names to be encoded any old way. You can create
> á.txt twice -- once with a composed á, once with a decomposed á. You
> can then commit both. Linux will happily deal with those two files as
> distinct files, too.
>
> OS X objects to this. It decomposes all file names, so the composed
> and decomposed names conflict. That's very similar to the conflict
> between a.txt and A.txt in a case insensitive file system (and in fact
> the error messages look similar).
>
> So, contrary to what I suggested before, making all filenames
> canonical (decomposed, for example) in Subversion is not necessarily
> the right answer, because then you can't handle those two identical
> looking but differently encoded file names in Windows. (One might
> argue this is a Windows bug -- it shouldn't allow two names that
> produce the same pixels on the screen but have different encoding --
> and that no doubt is why OS X doesn't. But they are two permitted
> file names, so the counter-argument is that the version control system
> should allow them both.)
>
> If the file names on the server are maintained as they were on the
> client that originally created them, then the fix has to be in the OS
> X client. It would have to keep the original file name encodings in
> its .svn/entries file. But when comparing those names against
> filenames returned by the file system -- i.e., for commands like "svn
> status", it has to run those names through the decomposition algorithm
> so they will match the names the file system has.

Yes, that is a very good summary of the problem. I would add that it
may not occur on other file systems under OS X, only on HFS+. So the
client cannot simply check "OK, I am running on OS X, so I will do this
conversion"; it also needs some heuristic to find out whether the file
system behaves in the way described above.
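A minimal sketch of the comparison Paul describes, using Python's standard `unicodedata` module. Standard Unicode NFD is only an approximation here, since Apple's HFS+ variant of NFD differs in some details, and `match_entries` is a hypothetical helper, not part of Subversion:

```python
import unicodedata

def nfd(name):
    """Standard Unicode NFD; only an approximation of the HFS+ variant."""
    return unicodedata.normalize("NFD", name)

def match_entries(recorded, on_disk):
    """Hypothetical helper: map each name recorded in .svn/entries (kept in
    its original encoding) to the on-disk name it corresponds to, or None.

    Both sides are compared in NFD, so a composed name in the entries still
    matches the decomposed name that HFS+ returns from a directory listing."""
    disk_by_key = {nfd(name): name for name in on_disk}
    return {name: disk_by_key.get(nfd(name)) for name in recorded}

composed = "\u00e1.txt"      # 'á' as the single code point U+00E1
decomposed = "a\u0301.txt"   # 'a' + U+0301 COMBINING ACUTE ACCENT

# Both spellings are legal, distinct names on Windows and Linux ...
assert composed != decomposed
# ... but both entries resolve to the one decomposed file HFS+ can hold,
# which is exactly the conflict Paul describes.
result = match_entries([composed, decomposed], [decomposed])
assert result[composed] == result[decomposed] == decomposed
```

The recorded names are never rewritten; normalization is applied only to the lookup key, which is what lets the original encodings survive in the repository.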

> By the way, there's a long discussion about decomposition on the
> Unicode website. Alternatively, a good start would be to implement
> the mapping table in
> http://developer.apple.com/technotes/tn/tn1150table.html . (That does
> the decomposing; I don't think it describes the reordering of multiple
> accent marks into their canonical order, but that seems like a less
> critical issue, except perhaps for Vietnamese.)

That would be code specific to Mac OS X, and in particular to HFS+.
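Paul's remark about reordering accent marks can be made concrete with Python's `unicodedata` module. The sketch contrasts a decompose-only pass (roughly what a per-character mapping table gives you; `decompose_only` is a hypothetical helper built on `unicodedata.decomposition`) with full NFD, which also sorts combining marks by canonical combining class. Vietnamese ệ serves as the example:

```python
import unicodedata

def decompose_only(s):
    """Hypothetical helper: recursively apply canonical decomposition
    mappings one character at a time, WITHOUT reordering combining marks."""
    out = []
    for ch in s:
        mapping = unicodedata.decomposition(ch)
        # Compatibility mappings start with a tag such as "<compat>"; skip them.
        if mapping and not mapping.startswith("<"):
            out.append(decompose_only("".join(chr(int(cp, 16)) for cp in mapping.split())))
        else:
            out.append(ch)
    return "".join(out)

precomposed = "\u1ec7"      # 'ệ' as a single code point
typed = "e\u0302\u0323"     # e + COMBINING CIRCUMFLEX + COMBINING DOT BELOW

# Decomposition alone leaves the two spellings different ...
assert decompose_only(precomposed) == "e\u0323\u0302"
assert decompose_only(typed) == typed
# ... because U+0323 (class 220) must sort before U+0302 (class 230).
assert unicodedata.combining("\u0323") == 220
assert unicodedata.combining("\u0302") == 230
# Full NFD performs that reordering, so both spellings agree.
assert unicodedata.normalize("NFD", precomposed) == unicodedata.normalize("NFD", typed)
```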

The hard part is detecting when this conversion is needed.
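One possible heuristic for the detection problem, sketched in Python: create a probe file with a precomposed name in the working copy's directory and check whether the directory listing hands it back decomposed. `fs_decomposes_names` and the probe file name are hypothetical, and the check only catches file systems that rewrite names the way HFS+ does:

```python
import os
import tempfile
import unicodedata

def fs_decomposes_names(directory):
    """Hypothetical probe: return True if the file system under `directory`
    stores a precomposed file name in decomposed (NFD) form, as HFS+ does."""
    composed = "svn-probe-\u00e1"                      # U+00E1, precomposed 'á'
    decomposed = unicodedata.normalize("NFD", composed)
    path = os.path.join(directory, composed)
    with open(path, "x"):                              # fail rather than clobber
        pass
    try:
        names = os.listdir(directory)
        return composed not in names and decomposed in names
    finally:
        os.remove(path)                                # always clean up the probe

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(fs_decomposes_names(tmp))
```

On file systems that store names byte for byte, such as typical Linux ones, the probe should report False. A client would presumably run it once per working copy and cache the result rather than probing on every command.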

What I suggest is a configuration parameter that can be set globally:
should the SVN client convert every path name to a common canonical
form (e.g. with stringprep) before comparing file names, or not?
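A sketch of how such a global switch might behave. The `normalize-paths` option is hypothetical (Subversion has no such setting), and the `[miscellany]` section name is borrowed from the client config file purely for illustration. Instead of a full stringprep profile (nameprep, for instance, also folds case and applies NFKC), this uses plain canonical normalization (NFC), so names differing only in case or in compatibility characters stay distinct:

```python
import configparser
import unicodedata

# Hypothetical client configuration; `normalize-paths` is NOT a real
# Subversion option, it stands in for the switch proposed above.
CONFIG_TEXT = """
[miscellany]
normalize-paths = yes
"""

def load_normalize_flag(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Default to off, i.e. keep comparing names exactly as they are encoded.
    return parser.getboolean("miscellany", "normalize-paths", fallback=False)

def canonical(name, normalize_paths):
    """Form used for comparisons only; the stored name is left untouched."""
    return unicodedata.normalize("NFC", name) if normalize_paths else name

def same_path(a, b, normalize_paths):
    return canonical(a, normalize_paths) == canonical(b, normalize_paths)

flag = load_normalize_flag(CONFIG_TEXT)
print(same_path("\u00e1.txt", "a\u0301.txt", flag))   # flag is on, so True
```

NFD would work equally well as the comparison form; what matters is that both sides of every comparison go through the same canonical form.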

I don't see much sense in allowing file names that contain the same
characters to be encoded differently, but there may be cases where it
is needed. If this configuration parameter is set to "off", that is
allowed.
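With the parameter set to "on", the client would also have to notice names that only "off" mode can keep apart, for example when checking out a tree committed from Windows or Linux. A minimal sketch, where `find_collisions` is a hypothetical helper that groups names by their NFC form and reports every group with more than one spelling:

```python
import unicodedata
from collections import defaultdict

def find_collisions(names):
    """Hypothetical helper: return lists of names that are distinct strings
    but identical after canonical (NFC) normalization."""
    groups = defaultdict(list)
    for name in dict.fromkeys(names):          # drop exact duplicates, keep order
        groups[unicodedata.normalize("NFC", name)].append(name)
    return [spellings for spellings in groups.values() if len(spellings) > 1]

listing = ["\u00e1.txt", "a\u0301.txt", "b.txt"]
for clash in find_collisions(listing):
    print("conflicting encodings:", ", ".join(ascii(n) for n in clash))
```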

In a multi-platform environment where the system is used to store
documents with all kinds of titles, this configuration parameter
should be set to "on", and then both the OS X and the
Windows/Linux/UNIX users will be happy with it.

So what do you suggest? Should I open a bug for it in the SVN bug-
tracking system?

Regards,

Balázs Szabó (dLux)
-- -- - - - -- -

Received on Tue Dec 6 23:38:43 2005

This is an archived mail posted to the Subversion Users mailing list.
