
Re: Issue with UTF-8 filenames

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: Thu, 20 Mar 2008 09:23:49 +0100

On 3/20/08, Ryan Schmidt <subversion-2008a_at_ryandesign.com> wrote:
> On Mar 19, 2008, at 11:40, Etienne Miret wrote:
>
> > I've made an import on a svn repository with my locale incorrectly
> > set to 'fr_FR', which led it to interpret my filenames as ISO-
> > Latin-1, although they were UTF-8. Hence, the names are currently
> > stored in my repository in double UTF-8.
> >
> > After (correctly) setting the locale to 'fr_FR.UTF-8', I ran 'svn
> > status' on my working directory, and got exactly the result I
> > expected:
> > $ svn status
> > ? Impérialisme
> > ! ImpeÌ rialisme
> > The file with the wrong name is reported missing, and the one with
> > the correct name is reported as unversioned.
> >
> > Now I intended to delete my file, and correct the name with an 'svn
> > update' followed by an 'svn move'. However:
> > $ svn update
> > A ImpeÌ rialisme
> >
> > $ svn status
> > ? ImpeÌ rialisme
> > ? Impérialisme
> > ! ImpeÌ rialisme
> >
> > $ rm Impérialisme
> >
> > $ svn mv ImpeÌ rialisme Impérialisme
> > A Impérialisme
> > svn: Working copy 'ImpeÌ rialisme' locked
> > svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for
> > details)
> >
> > $ svn status
> > ? ImpeÌ rialisme
> > ? Impérialisme
> > ! + Impérialisme
> > ! ImpeÌ rialisme
> > Obviously 'svn' doesn't compare UTF-8 strings correctly. The
> > issue seems to be that there are several codes for the same
> > character. For example 'é' can be 0xC3A9 (LATIN SMALL LETTER E WITH
> > ACUTE) or 0x65CC81 (LATIN SMALL LETTER E + COMBINING ACUTE ACCENT).
> > Unfortunately, I wasn't lucky enough for subversion and my OS to
> > always use the same form.
> >
> > I'm running subversion 1.4.4 on Mac OS X 10.5.2.
> >
> > Is this a known bug, and is there any workaround?
>
> Sounds like this bug, which is indeed a bigger problem for Mac users
> (specifically users of the Mac OS Extended filesystem):
>
> http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>
> There even appears to be a patch.

The problem Etienne describes is, although related, a bit different
from the one described in that issue.

What's happening is this:
* Subversion asks APR which encoding to expect for filesystem input
* APR answers (on all *nix-like systems): look at the locale
* Subversion sees fr_FR (which uses ISO-8859-1 as its default)
* Subversion uses ISO-8859-1
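
To see why the steps above end in "double UTF-8", here is a minimal Python sketch of the effect. It is not Subversion or APR code; the function names `double_encode` and `undo_double_encoding` are made up for illustration, and the sample name assumes the decomposed form that Mac OS X hands back.

```python
def double_encode(name: str) -> bytes:
    """Simulate storing a UTF-8 filename that was misread as ISO-8859-1."""
    on_disk = name.encode("utf-8")          # bytes the filesystem really holds
    misread = on_disk.decode("iso-8859-1")  # each byte wrongly becomes one character
    return misread.encode("utf-8")          # those characters are stored as UTF-8


def undo_double_encoding(stored: bytes) -> str:
    """Reverse the mistake: strip one UTF-8 layer, then decode the real one."""
    return stored.decode("utf-8").encode("iso-8859-1").decode("utf-8")


name = "Impe\u0301rialisme"  # 'e' followed by COMBINING ACUTE ACCENT
stored = double_encode(name)
print(stored)
print(undo_double_encoding(stored) == name)
```

The misread step never fails, because every byte value is a valid ISO-8859-1 character, which is why the import succeeded silently.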

What should happen:
* Subversion asks APR which encoding to expect for filesystem input
* APR answers: UTF-8 (because that's what the Mac OS X filesystem API defines)
* Subversion uses UTF-8
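
The difference between the two lists can be sketched as a single decision. This is only an illustration of the logic, not APR's actual implementation; `path_encoding` is a hypothetical helper, and using `"darwin"` as the platform name is an assumption borrowed from Python's `sys.platform`.

```python
import locale
import sys


def path_encoding(platform: str, locale_charset: str) -> str:
    """Pick the encoding to assume for filenames (illustrative only)."""
    if platform == "darwin":
        # The Mac OS X filesystem API defines filenames as UTF-8,
        # so the locale should not influence the answer here.
        return "UTF-8"
    # Elsewhere, fall back to whatever the locale implies.
    return locale_charset


if __name__ == "__main__":
    print(path_encoding(sys.platform, locale.getpreferredencoding(False)))
```

With the old behaviour the first branch is effectively missing, so a locale of fr_FR yields ISO-8859-1 even on a Mac.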

This issue is actually fixed in recent APR versions (0.9.x as well as
1.2.x), so if you got a pre-built binary, please ask whoever packages
it to start building against the newest APR patch release of their
preferred minor version.
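
As for the related composed/decomposed problem from the quoted report (the two byte sequences for 'é'), a small Python sketch shows why a plain string comparison treats the two names as different. This is an illustration using Python's standard `unicodedata` module, not a description of how Subversion compares names; `same_name` is a hypothetical helper.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Compare two filenames after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "Imp\u00e9rialisme"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "Impe\u0301rialisme"  # 'e' + COMBINING ACUTE ACCENT

print(composed.encode("utf-8"))    # contains the bytes C3 A9
print(decomposed.encode("utf-8"))  # contains the bytes 65 CC 81
print(composed == decomposed)      # a naive comparison sees two names
print(same_name(composed, decomposed))
```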

HTH,

Erik.
Received on 2008-03-20 09:24:14 CET

This is an archived mail posted to the Subversion Users mailing list.
