Re: UTF-8 support for Unix with APR?

From: Vincent Lefevre <vincent+svn_at_vinc17.org>
Date: Wed, 13 Feb 2008 15:21:30 +0100

On 2008-02-13 13:41:04 +0100, Erik Huelsmann wrote:
> No. The way (non-Mac) unices deal with this is seriously broken.

Yes, that's why there are workarounds.

> There is *no* guarantee the actual input paths are the encoding
> claimed by the locale settings.

Agreed. But one of the problems is that svn doesn't remember the
convention that has been chosen. Let's take an example. The user
has done a checkout in a UTF-8 locale, and there was a file
called aé in the repository. The user can type "svn st", which
outputs nothing, as expected.

Now, for some reason, the user needs to use an ISO-8859-1 terminal
session. Then he cd's to the working copy and types "svn st", but
he gets:

vin:~tmp/wc> svn st
? aé
! aé

So, instead of just having a display problem in some cases, the
user has to face a much more serious problem. One even gets a
cryptic error message with "svn up":

vin:~tmp/wc> svn up
svn: Can't copy '.svn/text-base/aé.svn-base' to '.svn/tmp/aé.tmp.tmp': Success
zsh: exit 1 svn up

It is annoying to have such problems even though the user doesn't
manipulate non-ASCII characters himself ("svn st" and "svn up" are
commands using plain ASCII).
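
A small sketch of why both a "?" and a "!" entry can show up for what is
logically one file. It assumes that the file name was written to disk as
UTF-8 bytes at checkout time, and that under the ISO-8859-1 locale the
same name is looked up using its ISO-8859-1 bytes. This is only an
illustration of the byte-level mismatch, not a quote of svn's code:

```python
# Illustration only: the same name "aé" as bytes in two encodings.
name = "a\u00e9"  # "aé"

# Bytes assumed to be on disk after a checkout in a UTF-8 locale.
on_disk = name.encode("utf-8")
# Bytes assumed to be used for the same name in an ISO-8859-1 locale.
looked_up = name.encode("iso-8859-1")

print(on_disk)    # b'a\xc3\xa9'
print(looked_up)  # b'a\xe9'

# The byte strings differ, so a lookup by the ISO-8859-1 form misses the file.
names_match = on_disk == looked_up

# Read back as ISO-8859-1, the on-disk bytes look like another, unknown name.
shown_in_latin1 = on_disk.decode("iso-8859-1")
```

Under these assumptions, the expected name is not found on disk (the "!"
entry), and the bytes that are on disk decode to a name that is not
versioned (the "?" entry).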

Note: Using a wrapper to start svn in a UTF-8 locale would avoid these
problems, but this would also add other problems, e.g. error messages
(in a non-English language) would be output with an incorrect encoding
to the terminal. And unfortunately, the user currently has no way to
tell svn to use some encoding for the filenames (in the file system)
and some other encoding for the output.

> There is no way for APR to solve that issue. The only thing it can
> do is tell the application which input it should expect.

Note that in the above example, all input is in plain ASCII. So, the
problem is more than the encoding of the input.

FYI, because of the use of various locales and various OSes (Linux,
which doesn't do any normalization, and Mac OS X, which has chosen
NFD), I have personally chosen not to use non-ASCII characters in my
filenames. But I expect svn to behave in a sensible way when dealing
with non-ASCII characters in filenames created by other people, at
least when I don't use these filenames directly (e.g. with "svn st"
and "svn up" like in my example above).
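
To illustrate the normalization point with the same file name: even when
everything is UTF-8, the composed form (NFC) and the decomposed form
(NFD) of "aé" are different byte sequences. A minimal sketch using
Python's standard unicodedata module, with the name chosen here purely
as an example:

```python
import unicodedata

name = "a\u00e9"  # "aé" with a precomposed é

nfc = unicodedata.normalize("NFC", name)  # é stays one code point
nfd = unicodedata.normalize("NFD", name)  # é becomes e + combining acute accent

print(len(nfc), len(nfd))   # 2 3
print(nfc == nfd)           # False
print(nfc.encode("utf-8"))  # b'a\xc3\xa9'
print(nfd.encode("utf-8"))  # b'ae\xcc\x81'
```

A byte-for-byte comparison therefore treats the two forms as different
names unless both sides are normalized first.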

-- 
Vincent Lefèvre <vincent@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)
Received on 2008-02-13 15:21:52 CET

This is an archived mail posted to the Subversion Users mailing list.