Re: Evil UTF-8 Character in filename in repo causing issues on my wc

From: Ryan Schmidt <subversion-2011a_at_ryandesign.com>
Date: Wed, 15 Jun 2011 01:39:30 -0500

On Jun 14, 2011, at 18:59, Stefan Sperling wrote:

> On Tue, Jun 14, 2011 at 04:24:46PM -0700, Geoff Hoffman wrote:
>> I have a file with some (I believe) Portuguese characters in the filename
>> that someone managed to store in the repo without any problem, and I checked
>> it out without issues, too. However, now on my working copy, it thinks that
>> file is locally new.
>
>> MacbookPro:ClearSale geoffh$ ls -la
> ^^^
>
> It's a Mac, so please see this issue:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2464
> and make sure to read the notes in this file:
> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>
> Short summary:
> Do not use anything but ASCII in your filenames if you need things
> to work between Macs and other systems. The problem is that the Mac
> changes the filename in a subtle way.

I would clarify this by saying that the problem is that Subversion assumes a filename submitted in one Unicode normalization form will always stay in that form, and on the HFS+ filesystem used by Mac OS X, that assumption does not hold: HFS+ normalizes all filenames to decomposed form. Subversion would happily allow you to create two filenames that humans would consider identical (one with its characters composed, the other with them decomposed). So clearly that's a bug in Subversion (or possibly apr or apr-util); it should normalize UTF-8 strings before running comparisons. It also seems like a bug in Windows and Linux filesystems; I assume they also let you create multiple files whose names look identical and differ only in the composition of their characters. HFS+ on Mac OS X is the only filesystem I know of that has fixed this bug, which therefore exposes the problem when collaborating between Mac OS X systems (which have the fix) and other systems (which do not).
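To make the composed/decomposed distinction concrete, here is a minimal Python sketch. The filename and the `same_name` helper are made up for illustration, and standard NFD is only an approximation of what HFS+ does (it uses its own variant of decomposition). It shows two strings that render identically, compare unequal as raw strings, and compare equal once both are normalized to the same form, which is roughly the comparison suggested above:

```python
import unicodedata

# Hypothetical filename, written with precomposed characters (NFC):
# U+00E7 (c with cedilla) and U+00E3 (a with tilde).
composed = "cora\u00e7\u00e3o.txt"

# The same name with each accented letter split into a base letter plus a
# combining mark (NFD), similar to what HFS+ stores on disk.
decomposed = unicodedata.normalize("NFD", composed)


def same_name(a: str, b: str) -> bool:
    """Compare two filenames after normalizing both to the same form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)            # raw comparison: the strings differ
print(len(composed), len(decomposed))    # the decomposed form has more code points
print(same_name(composed, decomposed))   # normalized comparison: they match
```

A client that compares the raw strings sees two different files here, which is how a checked-out file can show up as locally new.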

Using only ASCII characters in your filenames is one way to combat the problem. This strategy works fine for me, but users whose primary language is not English may find it harder. If you want to continue using non-ASCII characters in filenames, you can get a version of Subversion for Mac OS X that attempts to work around this problem, by installing MacPorts and then running:

sudo port install subversion +unicode_path

The patch the +unicode_path variant applies is of course not officially supported.

Received on 2011-06-15 08:40:19 CEST
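
If you go the ASCII-only route instead, a small script can help find the names that would need renaming. This is only a sketch: the `non_ascii_names` function is a made-up helper, not part of Subversion, and it assumes Python 3.7 or later (for `str.isascii`) and that skipping `.svn` administrative directories is what you want:

```python
import os


def non_ascii_names(root):
    """Return paths under root whose file or directory name is not pure ASCII."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Subversion's administrative directories.
        dirnames[:] = [d for d in dirnames if d != ".svn"]
        for name in dirnames + filenames:
            if not name.isascii():
                hits.append(os.path.join(dirpath, name))
    return hits


if __name__ == "__main__":
    # Scan the current directory, e.g. the top of a working copy.
    for path in non_ascii_names("."):
        print(path)
```

Run from the top of a working copy, it prints one path per offending name; it only reports and renames nothing.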

This is an archived mail posted to the Subversion Users mailing list.