
Re: svn checkout - special characters in file name are not encoding properly

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 10 Aug 2010 17:42:57 +0200

On Tue, Aug 10, 2010 at 04:59:58PM +0200, Vincent Lefevre wrote:
> On 2010-08-09 19:30:00 +0300, Daniel Shahaf wrote:
> > In the repository filesystem, we use UTF-8 exclusively. APR handles
> > translating that UTF-8 to whatever the local OS supports.
>
> Which is meaningless, since under Unix, the locale is not related
> to the OS, but to the process: one can have a shell session with
> UTF-8 locales and another shell session with ISO-8859-* locales.

I don't understand your point.

The repository uses UTF-8 internally regardless of the locale of the
server process. mod_dav_svn actually runs in the "C" locale because the
httpd server does not propagate locale information to its modules for
"security reasons". mod_dav_svn still receives all filenames from the
client encoded in UTF-8.

The locale only matters when data is presented to the user (by the svn
client, or svnlook, or svnadmin, ...) in which case Subversion uses iconv
to translate the UTF-8 data into the character set of the current locale.
If that does not work, an error message is printed.
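
To illustrate that conversion step, here is a minimal Python sketch. It is
not Subversion's code (Subversion is written in C and does this via iconv);
the helper name to_locale_charset and the example filename are made up for
illustration only.

```python
def to_locale_charset(utf8_name: bytes, charset: str) -> bytes:
    """Transcode a UTF-8 encoded name into the given character set.

    Raises UnicodeEncodeError if the name cannot be represented there,
    which corresponds to the "error message is printed" case.
    """
    return utf8_name.decode("utf-8").encode(charset)


# Hypothetical filename, stored as UTF-8 the way the repository stores it.
name = "caf\u00e9.txt".encode("utf-8")

# A latin1 locale can represent the name, so conversion succeeds.
print(to_locale_charset(name, "latin-1"))

# An ASCII-only character set (roughly what the "C" locale offers) cannot.
try:
    to_locale_charset(name, "ascii")
except UnicodeEncodeError as err:
    print("cannot convert:", err)
```

The point of the sketch is only that the stored UTF-8 bytes never change;
what varies is the target character set chosen at presentation time.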

> Unfortunately the svn client doesn't remember which one was used
> in the first place. The consequence is that if the user works
> with different locales, things go wrong (even if the user doesn't
> execute any command with non-ASCII characters in its arguments).

AFAIK there is no standard mechanism on UNIX for telling a process about
filename encodings. Filenames are just byte sequences. It's up to the
application to present the byte sequence to the user in a meaningful way.

One way of doing it is to assume the character set of the current locale
and hope that this will work. That of course breaks down when people
try to work with the same set of files in locales using different character
sets (like latin1 vs. UTF-8). E.g. you can't check out a working copy using
a UTF-8 locale, and then use it with an svn client in a latin1 locale, and
expect things to just work, if you have filenames in the repository which
contain non-ASCII characters.
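
A small Python sketch of why that breaks, under the assumption that the
client simply encodes and decodes names with the current locale's character
set. The function names on_disk_name and displayed_name and the filename are
hypothetical; this is not how the svn client is implemented.

```python
def on_disk_name(name: str, charset: str) -> bytes:
    """Bytes a client would write as the filename under this charset."""
    return name.encode(charset)


def displayed_name(raw: bytes, charset: str) -> str:
    """How a process assuming this charset would interpret those bytes."""
    return raw.decode(charset, errors="replace")


name = "caf\u00e9.txt"  # hypothetical non-ASCII filename in the repository

# Checkout performed in a UTF-8 locale: the file is created with UTF-8 bytes.
raw = on_disk_name(name, "utf-8")
print(raw)  # b'caf\xc3\xa9.txt'

# Later, a client in a latin1 locale looks at the same working copy.
print(displayed_name(raw, "latin-1"))  # two characters where one was meant

# It would also expect different bytes on disk, so the file seems missing.
print(on_disk_name(name, "latin-1") == raw)  # False
```

The filesystem stores the same byte sequence in both cases; only the
interpretation differs between the two locales.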

There are extensions in some systems like Linux, where filename encoding
can be specified at mount time and a process can query this information.
Even then, the actual encoding of filenames might still differ (e.g. due to
user error). More importantly, since there is no common standard, I don't
see how you'd solve this problem in a portable way.

Stefan
Received on 2010-08-10 17:43:43 CEST

This is an archived mail posted to the Subversion Users mailing list.
