
Re: Problems with accents in filenames

From: Branko Čibej <brane_at_xbc.nu>
Date: 2003-11-24 02:10:08 CET

Vincent Lefevre wrote:

>On 2003-11-23 23:38:53 +0100, Branko Čibej wrote:
>>There is no convention on Unix about how the filesystem encodes file
>>names. Typically, the bytes that an application sends to the VFS layer
>>are what gets written to disk. And the exact encoding of the characters
>>in a file name depends on the current locale.
>No, it depends on what the software gives to open(2).
That's the entry point to the VFS layer.

> If the software
>chooses to encode the filename in UTF-8 (even if the current locales
>are not UTF-8 ones), you'll get UTF-8 encoding on the file system.
O.K., I'll concede that. I should have said,
"... the exact encoding of the characters in a file name usually depends
on the current locale."
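
As a minimal illustration of that point (a sketch in Python, not anything
taken from Subversion; the helper name_bytes and the sample file name are
made up for the example), here are the bytes an application would hand to
open(2) for the same name under two different locale charsets:

```python
# Hypothetical helper: the byte string an application would pass to
# open(2) if it encoded the file name using the given charset.
def name_bytes(name: str, charset: str) -> bytes:
    return name.encode(charset)

name = "caf\u00e9.txt"  # "café.txt", written with an escape to stay ASCII-safe

utf8_bytes = name_bytes(name, "utf-8")
latin1_bytes = name_bytes(name, "iso-8859-1")

print(utf8_bytes)    # b'caf\xc3\xa9.txt'
print(latin1_bytes)  # b'caf\xe9.txt'

# Names that stay at or below \x7f come out identical in both charsets.
ascii_name = "cafe.txt"
print(name_bytes(ascii_name, "utf-8") == name_bytes(ascii_name, "iso-8859-1"))  # True
```

The filesystem stores whichever byte string it is given; nothing on disk
records which charset produced it.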

>>Yes, different users can use different locales, and the same user can
>>use different locales at different times. As you noticed, this will
>>typically cause problems if the locales used are incompatible (such as
>>in your example, where UTF-8 and ISO-8859-1 are incompatible for code
>>positions above \x7f).
>No, this won't cause any problem if the UTF-8 encoding is always chosen.
My point is that *it is not*. Most Unix applications do *not* send UTF-8
encoded file names to the filesystem; they use whatever the current
locale prescribes.

>>The bytes stored in the directory structure on disk are always
>>interpreted in terms of the current locale settings.
>This is a contradiction with your first sentence saying that there is
>no convention about how the filesystem encodes file names.
No, it isn't. I said that the filesystem does not interpret the file
name encoding, and that applications interpret the bytes they get from
the filesystem in terms of the current locale. At least, most
applications do.
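
The reading side can be sketched the same way (again Python, and again
only an illustration: display_name is a hypothetical helper, and
errors="replace" merely stands in for whatever a real application does
with bytes it cannot decode). The same stored bytes yield different names
depending on the charset used to interpret them:

```python
# Hypothetical helper: how an application might turn the raw bytes of a
# directory entry into text, given the charset of its current locale.
def display_name(raw: bytes, charset: str) -> str:
    # Assumption: undecodable bytes are replaced with U+FFFD.
    return raw.decode(charset, errors="replace")

# Bytes written by a process running in an ISO-8859-1 locale.
stored_latin1 = b"caf\xe9.txt"
print(ascii(display_name(stored_latin1, "iso-8859-1")))  # 'caf\xe9.txt'
print(ascii(display_name(stored_latin1, "utf-8")))       # 'caf\ufffd.txt'

# Bytes written by a process running in a UTF-8 locale.
stored_utf8 = b"caf\xc3\xa9.txt"
print(ascii(display_name(stored_utf8, "utf-8")))         # 'caf\xe9.txt'
print(ascii(display_name(stored_utf8, "iso-8859-1")))    # 'caf\xc3\xa9.txt'
```

In each pair only the first interpretation recovers the intended name;
the second is what a user sees when the reading locale differs from the
writing one.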

>>This is what you're seeing, and it's not a Subversion bug, it's a fact of
>>life on Unix.
>It is an inconsistency, therefore a Subversion bug (even if Subversion
>chooses a local encoding -- for instance, in this case, the encoding
>could be written somewhere in the .svn directory, and there would be
>no bug).
No, that would be an even worse bug. Imagine what happens if your shell
runs in UTF-8 and Subversion decides its working copy is in Shift-JIS.

>>I agree it would be nice if Unix file systems stored character data in a
>>consistent encoding (such as, for example, NTFS on Windows, which uses
>>UTF-16), but things simply don't work that way. Please stop trying to
>>convince us that they do.
>ROX-Filer (and GNOME applications) chose to encode the filenames in
>UTF-8. If you are not convinced, try...
GNOME is not the filesystem. ROX-Filer is not the filesystem. Subversion
isn't a GNOME application.

A GNOME-aware client for Subversion would probably adopt the GNOME
convention -- if, indeed, it is a convention and not just something
you're seeing because your GNOME desktop happens to be running in a
UTF-8 locale.

Let me put it simply. You're having problems because you insist on using
several different, incompatible locales to manipulate the same files.
Now, you may believe that the effects of your inconsistent environment
are a Subversion bug. They are *not*.

Frankly, I see no further purpose in this conversation, so it's over as
far as I'm concerned. My patience has run out.

Brane Čibej   <brane_at_xbc.nu>   http://www.xbc.nu/brane/
Received on Mon Nov 24 02:08:36 2003

This is an archived mail posted to the Subversion Dev mailing list.