On Apr 21, 2006, at 21:03, Aaron Montgomery wrote:
>> I can now check out the file from the subversion repository, but
>> when I run "svn status" in the directory where the file resides, I
>> get:
>>
>> ? Wien 05_12_19 - ÖVG.doc
>> ! Wien 05_12_19 - ÖVG.doc
>>
>> The file is reported as both "not under version control" and
>> "missing". How can that be?
>
> I work on a text editor for Mac OS X and I know that we've had
> problems because of the way the system handles decomposed vs. non-
> decomposed Unicode characters. It is possible that SVN is expecting
> to find a decomposed Ö and you've got a non-decomposed Ö in the
> name of the file sitting in the directory. I'm not sure of the best
> way to handle this. Mac OS is not very well behaved since its
> decision on how to represent UTF-8 is not the standard one (I think
> the standard says that you should use the shortest encoding and Mac
> OS X prefers to always use decomposed characters, but I'm not
> really sure). Possibly setting everything to ISO-8859 might solve
> this problem.
Yes, we had an extensive thread on this problem in December:
http://svn.haxx.se/users/archive-2005-12/0191.shtml
To summarize:
* Accented and umlauted characters have multiple valid
representations in UTF-8: "composed" (for example LATIN CAPITAL
LETTER O WITH DIAERESIS (U+00D6)) and "decomposed" (LATIN CAPITAL
LETTER O (U+004F) followed by COMBINING DIAERESIS (U+0308)).
* The Mac's usual HFS+ filesystem canonicalizes UTF-8 strings to the
decomposed form.
* The usual Windows and Linux filesystems, and the Subversion
filesystem, do not canonicalize. Infuriatingly, this means you can
have two distinct files in these filesystems that are both displayed
as, for example, "Wien 05_12_19 - ÖVG.doc": one whose name uses the
composed Ö and one whose name uses the decomposed form.
* It seems that if you create such a filename on Windows or Linux,
you end up with the composed form.
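To make the composed/decomposed distinction concrete, here is a small Python sketch using the standard unicodedata module. NFD is used as an approximation of what HFS+ stores (HFS+ actually uses its own variant of decomposition), and NFC as an approximation of what Windows and Linux tools typically produce:

```python
import unicodedata

# LATIN CAPITAL LETTER O WITH DIAERESIS as one precomposed code point (U+00D6)
composed = "Wien 05_12_19 - \u00d6VG.doc"
# LATIN CAPITAL LETTER O followed by COMBINING DIAERESIS (U+004F U+0308)
decomposed = "Wien 05_12_19 - O\u0308VG.doc"

# The two names render identically but are different strings and byte sequences.
print(composed == decomposed)            # False
print("\u00d6".encode("utf-8"))          # b'\xc3\x96'
print("O\u0308".encode("utf-8"))         # b'O\xcc\x88'

# Normalizing both to the same form makes them compare equal.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A filesystem that compares names byte for byte therefore treats these as two unrelated files.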
The upshot of all this is that if you create a filename with such
characters on Linux or Windows and commit it to a Subversion
repository, you cannot use that file if you check out the working
copy on Mac OS X. And that bites.
The proof is in the following pudding:
On the Linux machine (Subversion 1.2.3 client and server):
linux$ mkdir blöd
linux$ svn import blöd https://server/repo/bl%f6d -m ""
Committed revision 1.
On the Mac[1] (Subversion 1.3.1 client connecting to Linux 1.2.3
server):
mac$ svn co https://server/repo
A repo/blöd
Checked out revision 1.
mac$ svn st repo
? blo¨d
! blöd
Note that in my terminal it's even shown that way: the file with
decomposed characters (the way HFS+ canonicalized it) is unversioned,
and the file with composed characters (the one Subversion was
expecting) is missing.
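The "?"/"!" pair can be reproduced with a simplified model. The function naive_status below is hypothetical; it is not Subversion's code, just an assumption about the general shape of a status check that compares recorded names with a directory listing byte for byte:

```python
import unicodedata

def naive_status(expected_names, names_on_disk):
    """Hypothetical status check that compares names without normalization."""
    lines = []
    for name in sorted(names_on_disk - expected_names):
        lines.append("? " + name)   # on disk, but not under version control
    for name in sorted(expected_names - names_on_disk):
        lines.append("! " + name)   # under version control, but missing on disk
    return lines

# The working copy records the composed name that came from the repository...
expected = {"bl\u00f6d"}
# ...but HFS+ returns the decomposed name when the directory is listed.
on_disk = {unicodedata.normalize("NFD", "bl\u00f6d")}

for line in naive_status(expected, on_disk):
    print(line)
```

Because the two spellings never match, the same directory is reported once as unversioned and once as missing.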
My suggestion would be that Subversion should
* permit only a single form of a filename in the repository, possibly
canonicalized using stringprep, and
* for operations like "svn status", use stringprep to canonicalize
filenames provided by the client filesystem before comparing them to
the (already-stringprepped?) filenames in the files within the .svn
directory.
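A minimal sketch of the second suggestion, again as a hypothetical model rather than Subversion code. The proposal names stringprep; this sketch substitutes plain Unicode NFC normalization from Python's unicodedata module (stringprep profiles per RFC 3454 use NFKC plus mapping and prohibition tables, so this is a simplifying assumption):

```python
import unicodedata

def canonical(name):
    # Assumed stand-in for stringprep: NFC normalization only.
    return unicodedata.normalize("NFC", name)

def normalized_status(expected_names, names_on_disk):
    """Hypothetical status check that canonicalizes both sides before comparing."""
    expected = {canonical(n) for n in expected_names}
    on_disk = {canonical(n) for n in names_on_disk}
    lines = ["? " + n for n in sorted(on_disk - expected)]
    lines += ["! " + n for n in sorted(expected - on_disk)]
    return lines

expected = {"bl\u00f6d"}                                # composed, as in the repository
on_disk = {unicodedata.normalize("NFD", "bl\u00f6d")}   # decomposed, as listed by HFS+
print(normalized_status(expected, on_disk))             # []
```

A real implementation would also need to remember the actual on-disk spelling so it can open the file, since only the comparison is done on the canonical form.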
Balázs Szabó asked if this could be opened as a bug:
http://svn.haxx.se/users/archive-2005-12/0386.shtml
...but nobody answered this question and I cannot see such a bug
filed. I'll ask it again: anybody have any objection to this being
finally filed as a bug?
[1] That was with $LANG set to en_US.UTF-8 on the Mac. With $LANG set
to en_US.ISO8859-1, which is what I usually use, I can't check it out
at all:
mac$ svn co https://server/repo
svn: Can't check path 'repo/blöd': Invalid argument
Separate bug?
Received on Sat Apr 22 14:16:09 2006