
Re: Umlaut problem on Mac (composed vs. decomposed UTF-8)

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2007-07-23 12:20:33 CEST

On 7/23/07, Matthias Wächter <matthias.waechter@tttech.com> wrote:
> On 21.07.2007 01:42, Daniel A. Steffen wrote:

[ About normalizing paths and the ]

> > A possible alternative approach to fix this problem could thus be to
> > change the 'name' field of entries records to the working copy name, and
> > to store the repository name in a new field (or possibly the existing
> > 'url' field).
> >
> > This has the advantage that there is no need for subversion to
> > know/implement how a given filesystem might transform filenames (i.e. no
> > platform or ICU dependency etc), as that info can be obtained by
> > creating a file with the name in question in an empty temp dir followed
> > by listing of dir contents.

At face value this looks like a solution, but there's one huge problem
with it: performance. This solution will cause a huge number of
additional FS calls. There's also another problem: network filesystems
usually cache (for a short period of time) data written to the network
drive. If the filesystem behind that drive changes the normal form (as
HFS+ does), reading the dir content won't help: we'll get the value
from the cache.
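
For reference, the probe described in the quoted proposal could look roughly like the Python sketch below. This is only an illustration, not Subversion code; the function name probe_stored_name is made up for the example.

```python
import os
import tempfile


def probe_stored_name(name: str) -> str:
    """Create `name` in an empty temp dir and return the name the
    directory listing reports back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create an empty file under the requested name.
        with open(os.path.join(tmp, name), "w"):
            pass
        # The directory was empty, so the only entry is our file,
        # spelled the way the filesystem (or its cache) reports it.
        entries = os.listdir(tmp)
    return entries[0]
```

Every probe costs a directory create, a file create, a directory listing and the cleanup, which is where the extra FS calls come from. And on a caching network mount the listing may simply echo back the name that was written.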

> Right. Keeping a local 'matching table' between repository vs.
> local file names could also be a solution for Windows users that are
> busted with repositories containing files with the same name, once
> lower case, once upper case.

This won't help: in the light of network mounts/drives, you can't be
sure a drive on Windows is a Windows filesystem... You could be
writing to an HFS+ drive.

Treating this problem as a case-sensitivity issue is not really fair
to the problem: there are 2 file names which mean exactly the same
thing. While with case sensitivity users can actually *see* the
difference between path names, here, it's not the case. It is not even
*meant* to be the case: Unicode assigns the same meaning to "u" +
"combining diaeresis" (U+0075 U+0308) and "u with umlaut" (U+00FC); it's
only the binary values that differ. Subversion should compensate for
that and treat the different values as meaning the same thing.
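
To make the point concrete, here is a minimal Python sketch using the standard unicodedata module. The helper same_name is a name invented for this example, and NFC is picked arbitrarily; NFD would serve equally well for the comparison.

```python
import unicodedata

composed = "\u00fc"      # "u with umlaut" as a single code point
decomposed = "u\u0308"   # "u" followed by a combining diaeresis


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw values differ, both as strings and as UTF-8 bytes ...
print(composed == decomposed)        # False
print(composed.encode("utf-8"))      # b'\xc3\xbc'
print(decomposed.encode("utf-8"))    # b'u\xcc\x88'

# ... but after normalization they compare equal.
print(same_name(composed, decomposed))  # True
```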

> Then, one of these files could have a
> slightly different local file name, and both could be checked out,
> worked with etc.

Why would you want 2 files, one of which is called "Älter", the
other "A¨lter" and have them both be versioned?

> Certainly, any reference to such files (e.g., in Makefiles) would
> not work if the local applications don't know when and how to
> convert between the stored and local file name.

> > I may well be overlooking something basic, but it seems to me that this
> > approach would be simpler, more robust & less restrictive than the
> > proposals involving normalization on some/all platforms and potential
> > repository changes.

So, what happens if a Linux and a Mac client both add the same file,
with an A-umlaut in it and then they both commit? Currently, they'll
both be added to the repository. The Linux client will end up with 2
files with the same name, the Mac client will not be able to update
anymore.
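
As a rough illustration of that situation, the Python sketch below groups a directory's entry names by their NFC form and reports the groups with more than one spelling. The function name find_equivalent_names and the sample names are assumptions made for the example; this is not how Subversion currently checks anything.

```python
import unicodedata
from collections import defaultdict


def find_equivalent_names(names):
    """Return groups of names that differ only in Unicode normal form
    (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        # Names with the same NFC form are canonically equivalent.
        groups[unicodedata.normalize("NFC", name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# One entry added with a precomposed A-umlaut, one with "A" + combining
# diaeresis, plus an unrelated file.
entries = ["\u00c4lter", "A\u0308lter", "README"]
print(find_equivalent_names(entries))
```

With those sample entries, the two A-umlaut spellings end up in one group and README in none.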

> I agree, forcing any normalization is _not_ Subversion's job,
> neither to NFC nor NFD. These should be client-level additions.

As long as this is hardly visible on the outside, why shouldn't
Subversion standardize on one or the other in its own little world
(internally)?

bye,

Erik.
Received on Mon Jul 23 12:19:30 2007

This is an archived mail posted to the Subversion Dev mailing list.
