
Re: svn checkout - special characters in file name are not encoding properly

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Wed, 11 Aug 2010 16:26:32 +0200

On 2010-08-11 13:42:35 +0200, Stefan Sperling wrote:
> On Wed, Aug 11, 2010 at 12:31:48AM +0200, Vincent Lefevre wrote:
> > On 2010-08-10 20:59:00 +0200, Stefan Sperling wrote:
> > > Right now, if the filename cannot be represented in the current locale,
> > > you get this error: "svn: Can't convert string from 'UTF-8' to native encoding"
> >
> > which is bad and prevents users from writing POSIX-conforming scripts
> > using svn, i.e. under the POSIX locale (except on systems where the
> > POSIX locale uses UTF-8, but I don't know any).
>
> There's no reason your script could not configure a UTF-8 locale if that
> is needed to represent filenames which exist in the repository.

Configuring a UTF-8 locale can yield non-portable behavior.
There's a good reason why various scripts set "LC_ALL=C".

Moreover, there's no portable way to select a UTF-8 locale.

And the POSIX API doesn't need a UTF-8 locale to handle filenames
with top-bit-set bytes.
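
A minimal sketch of that point, in Python for illustration only. The helper name and the file name are made up for this example. It assumes a file system that stores names as opaque bytes, as is typical on Linux; some file systems normalize or reject certain byte sequences. The byte-oriented file API creates and lists a name containing top-bit-set bytes without any charset conversion:

```python
import os
import tempfile
from typing import List


def roundtrip_byte_filename(name: bytes) -> List[bytes]:
    """Create a file whose name is given as raw bytes, then list it back."""
    with tempfile.TemporaryDirectory() as tmp:
        # Using bytes paths means the name is handed to the OS unchanged,
        # so no locale-dependent encoding or decoding takes place.
        tmp_bytes = os.fsencode(tmp)
        path = os.path.join(tmp_bytes, name)
        with open(path, "wb"):
            pass
        return os.listdir(tmp_bytes)


# Hypothetical file name containing the bytes 0xC3 0xA9 (top bit set).
names = roundtrip_byte_filename(b"caf\xc3\xa9.txt")
```

Whether a terminal then displays such a name legibly is a separate question from whether the file can be created and opened.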

> We agree on the point that Subversion should use a single character
> set for all filenames in the same working copy.
> Because how should Subversion behave if some filenames convert fine to
> the current character set, and some do not? E.g. what if my encoding
> configuration setting is en_US.ISO8859-1? Should Subversion use ISO8859-1
> for some filenames, and UTF-8 for those which cannot be represented in
> ISO8859-1? That gets really confusing.
>
> It seems that this conversation leads to the question of why Subversion
> even bothers with checking the locale at all. It might as well always
> create filenames in UTF-8, and leave the user with apparently mangled
> filenames if they don't use a UTF-8 locale.
>
> But that isn't a solution either, because now you have lots of
> non-UTF-8 users complaining that Subversion cannot represent their
> filenames properly, where previously it worked fine.

That's why I suggested making the encoding configurable.
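
To make the idea concrete, here is a rough sketch in Python of what a configurable filename encoding could look like. It is not Subversion code: `to_disk_name` and its `filename_encoding` parameter are hypothetical names invented for this illustration. The point is only that the charset used for on-disk names would come from an explicit setting rather than from the current locale, and that a name either converts under that one setting or is reported as unrepresentable:

```python
def to_disk_name(repo_name: str, filename_encoding: str) -> bytes:
    """Encode a repository file name with an explicitly configured charset.

    Hypothetical helper: 'filename_encoding' stands in for a user setting
    and is deliberately independent of the process locale.
    """
    try:
        return repo_name.encode(filename_encoding)
    except UnicodeEncodeError as exc:
        # The name has no representation in the configured charset.
        raise ValueError(
            f"cannot represent {repo_name!r} in {filename_encoding}"
        ) from exc


# The same repository name under two different settings.
latin1_name = to_disk_name("caf\u00e9.txt", "iso8859-1")
utf8_name = to_disk_name("caf\u00e9.txt", "utf-8")
```

With such a setting, the result no longer depends on which locale the invoking user or script happens to run under; a name such as "\u65e5\u672c\u8a9e.txt" would fail under "iso8859-1" but succeed under "utf-8".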

> > It's not pointless, or at least, something else needs to be done.
> > Currently "svn up" fails to work, and that's a problem.
>
> It doesn't fail if locales are used consistently.

It fails even if locales are used consistently.

> I don't think this problem is specific to Subversion.

I haven't seen such problems with other tools.

> Other tools also suffer from the fact that POSIX doesn't specify a
> standard for defining filename encodings. Maybe we can find a good
> solution by looking around at how other tools handle this.

Most tools just ignore the encoding of filenames.

> However, I'd expect many will just assume that the user wants filenames
> to be encoded according to the current locale.
> If everybody follows this convention, there is no problem, apart from
> user errors during locale configuration.

You're asking the user, and even all users on a system where
the files are shared, to stick with a single locale. This is not
acceptable: it is contrary to POSIX requirements, and it is also
a problem for SSH (where the user would need to use the same charset
on both sides). Under these conditions, the only possibility is
to encode the filenames in UTF-8 anyway. So why not enforce
that?

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Received on 2010-08-11 16:27:13 CEST

This is an archived mail posted to the Subversion Users mailing list.
