
Re: svn checkout - special characters in file name are not encoding properly

From: Stefan Sperling <stsp_at_elego.de>
Date: Wed, 11 Aug 2010 13:42:35 +0200

On Wed, Aug 11, 2010 at 12:31:48AM +0200, Vincent Lefevre wrote:
> On 2010-08-10 20:59:00 +0200, Stefan Sperling wrote:
> > Right now, if the filename cannot be represented in the current locale,
> > you get this error: "svn: Can't convert string from 'UTF-8' to native encoding"
>
> which is bad and prevents users from writing POSIX-conforming scripts
> using svn, i.e. under the POSIX locale (except on systems where the
> POSIX locale uses UTF-8, but I don't know any).

There's no reason your script could not configure a UTF-8 locale if that
is needed to represent filenames which exist in the repository.
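As a minimal sketch of what "configure a UTF-8 locale" could look like from a wrapper script: the helper below (a hypothetical name, not part of Subversion) runs a command with LC_ALL set for the child process only. The locale name en_US.UTF-8 is an assumption and must actually be installed on the system.

```python
import os
import subprocess
import sys


def run_in_utf8_locale(cmd, locale_name="en_US.UTF-8"):
    """Run cmd with LC_ALL forced to locale_name (assumed to be installed)."""
    env = dict(os.environ)
    # LC_ALL overrides LANG and every LC_* category, for the child process only.
    env["LC_ALL"] = locale_name
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for a command such as ["svn", "update"]: show what the child sees.
    probe = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_in_utf8_locale(probe).stdout.strip())
```

Note that LC_ALL also changes message language and sorting; setting only LC_CTYPE would be narrower, but it has no effect if LC_ALL is already set in the caller's environment.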

> For filenames stored on disk, they (all of them) can be encoded using
> UTF-8. Remember, filenames on a POSIX system are just sequences of
> bytes. For what is output to the terminal, non-representable
> characters can be displayed by a replacement character such as "?".
> This can still be better than an error.

Throwing an error is a straightforward way of solving the problem.
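To make the two options concrete, here is a small illustration using Python's own codecs rather than Subversion's conversion code. The helper name to_locale_bytes is hypothetical; "strict" corresponds to failing with an error, "replace" to substituting "?".

```python
def to_locale_bytes(name, encoding, on_error="strict"):
    """Encode a filename for a locale encoding.

    on_error="strict" raises on unrepresentable characters,
    on_error="replace" substitutes "?" for each of them.
    """
    return name.encode(encoding, errors=on_error)


name = "na\u00efve.txt"  # "naive.txt" with a diaeresis on the i

try:
    to_locale_bytes(name, "ascii")
except UnicodeEncodeError as exc:
    print("error:", exc.reason)

print(to_locale_bytes(name, "ascii", on_error="replace"))  # b'na?ve.txt'
```

Substitution is lossy: two different names can collapse to the same "na?ve.txt", which is presumably why it was only suggested for terminal output and not for names written to disk.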

We agree that Subversion should use a single character set for all
filenames in the same working copy. Otherwise, how should Subversion
behave if some filenames convert fine to the current character set and
some do not? E.g. what if my locale is en_US.ISO8859-1? Should Subversion
use ISO8859-1 for some filenames, and UTF-8 for those which cannot be
represented in ISO8859-1? That gets really confusing.
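The mixed case is easy to hit. As a rough sketch (the helper name and the sample filenames are made up for illustration), this splits a list of names into those an ISO8859-1 locale can represent and those it cannot:

```python
def split_by_representable(names, encoding):
    """Return (ok, failed): names that can and cannot be encoded in encoding."""
    ok, failed = [], []
    for name in names:
        try:
            name.encode(encoding)
            ok.append(name)
        except UnicodeEncodeError:
            failed.append(name)
    return ok, failed


# Hypothetical repository contents: ASCII, Latin-1, and Japanese names.
names = ["README", "caf\u00e9.txt", "\u65e5\u672c\u8a9e.txt"]
ok, failed = split_by_representable(names, "iso-8859-1")
print("representable:", ok)
print("not representable:", failed)
```

With a per-file fallback, the first two names would end up on disk in ISO8859-1 and the third in UTF-8, all in the same working copy.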

It seems that this conversation leads to the question of why Subversion
even bothers with checking the locale at all. It might as well always
create filenames in UTF-8, and leave the user with apparently mangled
filenames if they don't use a UTF-8 locale.

But that isn't a solution either, because now you have lots of
non-UTF-8 users complaining that Subversion cannot represent their
filenames properly, where previously it worked fine.

> > see http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>
> This problem is due to the fact that Subversion doesn't enforce a
> canonical representation (either NFC or NFD).

Yes. I just brought it up because it is related indirectly to this
discussion.
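For readers unfamiliar with NFC and NFD, a short illustration using Python's unicodedata module (the filename is an arbitrary example): the same visible name has two different byte sequences in UTF-8 depending on the normalization form.

```python
import unicodedata

name = "caf\u00e9.txt"  # e with acute accent as a single code point

nfc = unicodedata.normalize("NFC", name)  # composed: U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" + U+0301

print(nfc == nfd)            # False: different code point sequences
print(nfc.encode("utf-8"))   # b'caf\xc3\xa9.txt'
print(nfd.encode("utf-8"))   # b'cafe\xcc\x81.txt'
```

Since filenames are compared as byte sequences, these two count as different names even though they look identical on screen.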

> > > 2. Use the current locale.
> >
> > That's what's being done. But we're not writing the information down in the
> > working copy meta data, and doing so is quite pointless as described above.
>
> It's not pointless, or at least, something else needs to be done.
> Currently "svn up" fails to work, and that's a problem.

It doesn't fail if locales are used consistently.
If locales aren't configured consistently, that's a user error.
That's the best we can do.

I don't think this problem is specific to Subversion.
Other tools also suffer from the fact that POSIX doesn't specify a
standard for defining filename encodings. Maybe we can find a good
solution by looking around at how other tools handle this.
However, I'd expect many will just assume that the user wants filenames
to be encoded according to the current locale.
If everybody follows this convention, there is no problem, apart from
user errors during locale configuration.

Stefan
Received on 2010-08-11 13:43:17 CEST

This is an archived mail posted to the Subversion Users mailing list.
