
Re: svn checkout - special characters in file name are not encoding properly

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Tue, 10 Aug 2010 19:44:35 +0200

On 2010-08-10 17:42:57 +0200, Stefan Sperling wrote:
> The locale only matters when data is presented to the user (by the svn
> client, or svnlook, or svnadmin, ...) in which case Subversion uses iconv
> to translate the UTF-8 data into the character set of the current locale.

The svn client also uses the locale for filename encoding.

> AFAIK there is no standard mechanism on UNIX for telling a process about
> filename encodings. Filenames are just byte sequences. It's up to the
> application to present the byte sequence to the user in a meaningful way.

Yes, however "meaningful" depends on what the user expects.

> One way of doing it is to assume the character set of the current
> locale and hope that this will work. That of course breaks down when people
> try to work with the same set of files in locales using different character
> sets (like latin1 vs. UTF-8). E.g. you can't check out a working copy using
> a UTF-8 locale, and then use it with an svn client in a latin1 locale, and
> expect things to just work, if you have filenames in the repository which
> contain non-ASCII characters.

This is precisely the problem I've mentioned.
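
To make the mismatch concrete, here is a minimal Python sketch. The
filename is made up for illustration, and this is not how the svn client
is implemented; it only shows what happens to the bytes when the same
name is written under one locale and read under another.

```python
# Hypothetical filename; the repository itself stores names as UTF-8.
name = "caf\u00e9.txt"  # "café.txt"

utf8_bytes = name.encode("utf-8")      # on-disk name written under a UTF-8 locale
latin1_bytes = name.encode("latin-1")  # on-disk name written under a latin1 locale

# A client running under latin1 interprets the UTF-8 bytes as latin1
# and sees two characters where there should be one.
misread = utf8_bytes.decode("latin-1")

# A client running under UTF-8 cannot decode the latin1 bytes at all.
try:
    latin1_bytes.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print(utf8_bytes, latin1_bytes, repr(misread), decodable)
```

The two byte sequences differ, so a working copy checked out under one
locale does not contain the filename that a client under the other
locale expects to find.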

> There are extensions in some systems like Linux, where filename encoding
> can be specified at mount time and a process can query this information.
> But the actual encoding of filenames might still differ (e.g. due to user
> error). But more importantly since there is no common standard I don't
> see how you'd solve this problem in a portable way.

This is easy (at least from the specification point of view): once the
encoding has been determined[*], typically at checkout time, store the
encoding in the WC metadata (with the current WC layout, that would be
some file under the .svn directory), so that the next time the svn
client is used for this WC, the same encoding will be used, avoiding
inconsistencies (such as those currently produced by running "svn up"
twice under two different locales).

[*] There are several ways to do that, such as:
1. Use a charset specified by the user in the svn config file.
2. Use the current locale.
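
As a rough sketch of the idea in Python (not actual Subversion code): the
file name "filename-encoding", the helper names and the config_charset
parameter are all hypothetical, chosen only for illustration. The
encoding is determined once, written under .svn, and reused on every
later invocation regardless of the locale in effect at that time.

```python
import locale
from pathlib import Path

# Hypothetical metadata file name; Subversion has no such file.
ENCODING_FILE = "filename-encoding"


def determine_encoding(config_charset=None):
    """Option 1: charset from the user's config; option 2: current locale."""
    return config_charset or locale.getpreferredencoding(False)


def wc_filename_encoding(wc_root, config_charset=None):
    """Return the filename encoding recorded for this WC, recording it if absent."""
    meta = Path(wc_root) / ".svn" / ENCODING_FILE
    if meta.exists():
        # Already decided (typically at checkout time): always reuse it.
        return meta.read_text(encoding="ascii").strip()
    encoding = determine_encoding(config_charset)
    meta.parent.mkdir(parents=True, exist_ok=True)
    meta.write_text(encoding + "\n", encoding="ascii")
    return encoding
```

With something like this, a second call made under a different locale
(or with a different configured charset) still returns the encoding
recorded by the first call, so the on-disk names stay consistent.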

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Received on 2010-08-10 19:45:15 CEST

This is an archived mail posted to the Subversion Users mailing list.