
Re: UTF-8 conversion error

From: Balázs Szabó <dlux_at_dlux.hu>
Date: 2006-04-23 00:02:31 CEST

Hi,

The defect I opened for this issue is here:
http://subversion.tigris.org/issues/show_bug.cgi?id=2464

Regards,

Balázs Szabó (dLux)
http://www.dlux.hu
-- -- - - - -- -

On 2006.04.22., at 14:14, Ryan Schmidt wrote:

> On Apr 21, 2006, at 21:03, Aaron Montgomery wrote:
>
>>> I can now check out the file from the subversion repository, but
>>> when I run "svn status" in the directory where the file resides
>>> I get:
>>>
>>> ? Wien 05_12_19 - ÖVG.doc
>>> ! Wien 05_12_19 - ÖVG.doc
>>>
>>> The file is reported as both "not under version control" and
>>> "missing". How can that be?
>>
>> I work on a text editor for Mac OS X and I know that we've had
>> problems because of the way the system handles decomposed vs. non-
>> decomposed Unicode characters. It is possible that SVN is
>> expecting to find a decomposed Ö and you've got a non-decomposed
>> Ö in the name of the file sitting in the directory. I'm not sure of
>> the best way to handle this. Mac OS is not very well behaved since
>> its decision on how to represent UTF-8 is not the standard one (I
>> think the standard says that you should use the shortest encoding
>> and Mac OS X prefers to always use decomposed characters, but I'm
>> not really sure). Possibly setting everything to ISO-8859 might
>> solve this problem.
>
> Yes, we had an extensive thread on this problem in December:
>
> http://svn.haxx.se/users/archive-2005-12/0191.shtml
>
> To summarize:
>
> * Accented and umlauted characters have multiple valid
> representations in UTF-8: "composed" (for example LATIN CAPITAL
> LETTER O WITH DIAERESIS (U+00D6)) and "decomposed" (LATIN CAPITAL
> LETTER O (U+004F) followed by COMBINING DIAERESIS (U+0308)).
>
> * The Mac's usual HFS+ filesystem canonicalizes UTF-8 strings to
> the decomposed form.
>
> * The usual Windows and Linux filesystems, and the Subversion
> filesystem, do not canonicalize, meaning, infuriatingly, you can
> have two distinct files in these filesystems that both display as,
> for example, "Wien 05_12_19 - ÖVG.doc".
>
> * It seems that if you create such a filename on Windows or Linux,
> you end up with the composed form.
>
> The upshot of all this is that if you create a filename with such
> characters on Linux or Windows and commit it to a Subversion
> repository, you cannot use that file if you check out the working
> copy on Mac OS X. And that bites.
>
>
> The proof is in the following pudding:
>
> On the Linux machine (Subversion 1.2.3 client and server):
>
> linux$ mkdir blöd
> linux$ svn import blöd https://server/repo/bl%f6d -m ""
> Committed revision 1.
>
> On the Mac[1] (Subversion 1.3.1 client connecting to Linux 1.2.3
> server):
>
> mac$ svn co https://server/repo
> A repo/blöd
> Checked out revision 1.
> mac$ svn st repo
> ? blo¨d
> ! blöd
>
> Note that in my terminal it's even shown that way: the file with
> decomposed characters (the way HFS+ canonicalized it) is
> unversioned, and the file with composed characters (the one
> Subversion was expecting) is missing.
>
>
> My suggestion would be that Subversion should
>
> * permit only a single form of a filename in the repository,
> possibly canonicalized using stringprep, and
>
> * for operations like "svn status", use stringprep to canonicalize
> filenames provided by the client filesystem before comparing them
> to the (already-stringprepped?) filenames in the files within
> the .svn directory.
>
>
> Balázs Szabó asked if this could be opened as a bug:
>
> http://svn.haxx.se/users/archive-2005-12/0386.shtml
>
> ...but nobody answered this question and I cannot see such a bug
> filed. I'll ask it again: anybody have any objection to this being
> finally filed as a bug?
>
>
> [1] That was with $LANG set to en_US.UTF-8 on the Mac. With $LANG
> set to en_US.ISO8859-1, which is what I usually use, I can't check
> it out at all:
>
> mac$ svn co https://server/repo
> svn: Can't check path 'repo/blöd': Invalid argument
>
> Separate bug?
>
>
>
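
For anyone who wants to reproduce the composed/decomposed mismatch described in the quoted message without a Mac at hand, here is a minimal Python sketch. It builds both forms of "blöd", shows that they differ as strings and as UTF-8 bytes, and then compares a list of expected names against a list of on-disk names with and without normalization. The helper names (same_name, status) are made up for illustration, and NFC normalization from the standard unicodedata module is only a stand-in for the stringprep idea suggested above. This is not how Subversion itself works.

```python
import unicodedata

# "blöd" in the two forms discussed in the thread.
composed = "bl\u00f6d"      # LATIN SMALL LETTER O WITH DIAERESIS (U+00F6)
decomposed = "blo\u0308d"   # "o" followed by COMBINING DIAERESIS (U+0308)


def same_name(a, b, form="NFC"):
    """Compare two names after normalizing both to the same form.

    Hypothetical helper; NFC is an assumption, not Subversion's behaviour.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def status(expected, on_disk, normalize=False):
    """Return (missing, unversioned) in the spirit of "svn status".

    expected: names the working copy metadata lists
    on_disk:  names the filesystem reports
    With normalize=True both sides are converted to NFC before comparing.
    """
    if normalize:
        key = lambda s: unicodedata.normalize("NFC", s)
    else:
        key = lambda s: s
    exp = {key(n) for n in expected}
    disk = {key(n) for n in on_disk}
    return sorted(exp - disk), sorted(disk - exp)


if __name__ == "__main__":
    # The two strings look alike but are different code point sequences.
    print(composed == decomposed)
    print(composed.encode("utf-8"), decomposed.encode("utf-8"))

    # Without normalization the name is both missing and unversioned.
    print(status([composed], [decomposed]))

    # With normalization on both sides the mismatch disappears.
    print(status([composed], [decomposed], normalize=True))
```

Without normalization, the single name ends up in both the "missing" and the "unversioned" list, which matches the "!" and "?" lines in the quoted "svn st" output. With both sides normalized, the lists come back empty.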
Received on Mon Apr 24 19:35:34 2006

This is an archived mail posted to the Subversion Users mailing list.
