Re: Umlaut problem on Mac (composed vs. decomposed UTF-8)

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: 2007-07-17 16:46:06 CEST

On 7/17/07, David Glasser <glasser@mit.edu> wrote:
> On 7/17/07, Marc Haisenko <haisenko@comdasys.com> wrote:
> > Yes, you got it right, but I'll start over (but beware: I'm by no means an
> > expert; I hope I get it right).
> >
> > Suppose you want to encode the character "ü" (umlaut-u). You can do it in two
> > ways in Unicode: either use the "composed" form, which is just one character:
> > umlaut-u. Or you can use the "decomposed" form, which is two characters: a
> > plain Latin "u" followed by a combining character that means "add an umlaut
> > to the preceding character". A normal strcmp will say that the two
> > representations are different because they are two different byte streams.
> >
> > Both forms have their advantages and disadvantages. Windows wants to store
> > Unicode filenames in "composed" form, while Mac OS X wants to store the
> > filenames in "decomposed" form. This leads to problems.
> >
> > There are various solutions to this problem, but they all more or less require
> > some "real" Unicode handling: either a strcmp that treats a composed string
> > and its decomposed equivalent as equal, or normalizing each and every filename
> > to an agreed-on representation. As far as I can see, the latter is probably
> > less error-prone (only a few well-known "entries" for filenames exist) and
> > requires fewer code changes, especially if the composed representation is
> > used: that already works on Windows and Linux, so "only" the Mac OS X client
> > would need to normalize names as well.
>
> Ah, I see. So what are the actual effects? Is it something like:
>
> * Linux user adds a file with a :u in it, which is stored composed
> * Mac user checks it out
> * Mac user edits the file
> * Mac user tries to commit; the commit request sends the name with a
> decomposed :u
> * Repository has no idea what file the Mac user's client is talking about
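To make the composed/decomposed distinction and the proposed "normalize to composed" fix concrete, here is a minimal Python sketch using the standard `unicodedata` module. It is an illustration, not Subversion code: the `repo` dict and the `same_name` helper are hypothetical stand-ins, and Python's `NFD` is used to approximate the decomposed spelling Mac OS X produces (HFS+ uses a close variant of NFD, which is identical to NFD for "ü").

```python
import unicodedata

composed = "\u00fc"      # U+00FC LATIN SMALL LETTER U WITH DIAERESIS (one code point)
decomposed = "u\u0308"   # U+0075 'u' followed by U+0308 COMBINING DIAERESIS

# A byte-wise (strcmp-style) comparison sees two different strings.
print(composed.encode("utf-8"))    # b'\xc3\xbc'
print(decomposed.encode("utf-8"))  # b'u\xcc\x88'
print(composed == decomposed)      # False

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two filenames after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(same_name(composed, decomposed))  # True

# Hypothetical repository index keyed by the composed name a Linux client added.
repo = {"m\u00fcller.txt": "rev 1"}
mac_name = unicodedata.normalize("NFD", "m\u00fcller.txt")  # what the Mac side would send

print(mac_name in repo)                                # False: lookup by raw bytes fails
print(unicodedata.normalize("NFC", mac_name) in repo)  # True: normalized lookup succeeds
```

Normalizing at the boundary (on the client, before names are compared or sent) keeps the stored form unchanged, which matches the suggestion above that only the Mac OS X client would need to change.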

Worse: the Mac client reports the versioned file as missing
immediately after checkout *and* reports a file (which looks *exactly*
the same to the user) as unversioned.

There's no committing to a file like that...
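The "missing plus unversioned" symptom can be reproduced with a simplified model of a status check: compare the set of names the working copy records as versioned against the set of names the filesystem returns. This is a hypothetical sketch, not Subversion's actual status code; `status`, `versioned` and `on_disk` are illustrative names, and NFD again stands in for the Mac filesystem's decomposed form.

```python
import unicodedata

def status(versioned: set[str], on_disk: set[str], normalize: bool) -> tuple[set[str], set[str]]:
    """Return (missing, unversioned) names, optionally comparing them in NFC."""
    if normalize:
        versioned = {unicodedata.normalize("NFC", n) for n in versioned}
        on_disk = {unicodedata.normalize("NFC", n) for n in on_disk}
    missing = versioned - on_disk      # recorded as versioned but not found on disk
    unversioned = on_disk - versioned  # found on disk but not recorded as versioned
    return missing, unversioned

# Hypothetical working copy: the name is recorded as committed (composed),
# but the directory listing returns the decomposed spelling of the same file.
versioned = {"m\u00fcller.txt", "readme.txt"}
on_disk = {unicodedata.normalize("NFD", "m\u00fcller.txt"), "readme.txt"}

# Without normalization: one "missing" and one "unversioned" entry that print identically.
print(status(versioned, on_disk, normalize=False))

# With NFC normalization on both sides, nothing is reported.
print(status(versioned, on_disk, normalize=True))  # (set(), set())
```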

bye,

Erik.
Received on Tue Jul 17 16:45:19 2007

This is an archived mail posted to the Subversion Dev mailing list.