
Re: Check-out fails with LANG=C

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Wed, 31 Jul 2013 15:49:17 +0200

On 2013-07-24 05:57:41 +0200, Branko Čibej wrote:
> On 19.07.2013 15:22, Vincent Lefevre wrote:
> > LANG=C.UTF-8 is completely non-portable for scripts. For instance:
> >
> > xvii:~> LANG=C.UTF-8 cp
> > cp: opérande de fichier manquant
> > Saisissez « cp --help » pour plus d'informations.
> >
> > xvii:~> LANG=C cp
> > cp: missing file operand
> > Try 'cp --help' for more information.
> >
> > A script that needs to work in some well-defined way, in particular
> > with English messages (if they need to be parsed), must use the C
> > (or POSIX) locale. With most tools, this is fine as they don't need
> > to know how filenames are encoded.
> Frankly I'm not interested in portable scripts.

You may not be interested, but there are users who are.

> All you're showing above is that on your particular system, setting
> LANG=C.UTF-8 doesn't do anything. So perhaps you'll have to use

No, even "LC_ALL=en_US.UTF-8 cp" doesn't have any effect.

> or whatever happens to work on your particular flavour of Unix-like
> OS.

What is needed: LANG=C or LC_ALL=C, but that's not UTF-8, hence
problems with Subversion.

A workaround that can work: set LANG=C and LC_CTYPE=C.UTF-8, and unset
all the other LC_* environment variables. If the program writes to a
terminal, also make sure that either the charmap was already UTF-8
beforehand or all output is redirected, so that the terminal never
receives UTF-8 (otherwise some output bytes can trash the terminal).
This is still a problem for interactive use of the "svn" command from
a shell (unless wrappers are used to convert stdout and stderr, with
known synchronization problems due to different buffering behavior,
though svn may not be affected).
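
The environment described in this workaround can be sketched as a small
shell helper. This is only an illustration: the name run_c_utf8 is made
up, and it assumes that a C.UTF-8 locale exists on the system, which is
not the case everywhere.

```shell
# Hypothetical helper: run a command with C-locale messages but a
# UTF-8 charmap. Assumes the C.UTF-8 locale is installed.
run_c_utf8() {
    (
        # Unset every LC_* variable currently in the environment.
        # LC_ALL in particular would override LANG and LC_CTYPE.
        for var in $(env | sed -n 's/^\(LC_[A-Z_]*\)=.*/\1/p'); do
            unset "$var"
        done
        LANG=C
        LC_CTYPE=C.UTF-8
        export LANG LC_CTYPE
        # Run the requested command inside this subshell only, so the
        # caller's own environment is left untouched.
        "$@"
    )
}
```

It would be used as "run_c_utf8 svn up". The helper does nothing about
the terminal issue mentioned above: if the terminal is not UTF-8, the
output still has to be redirected or converted.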

> All this is beside the point. The point is that it is not up to
> Subversion to invent a new way of dealing with file-name encodings.

Well, Subversion did invent one. Most programs don't have to deal with
filename encodings; in particular, using LC_ALL=C isn't much of a problem
when handling filenames with top-bit-set bytes. For instance, doing a
"mv dir1/* dir2" will work in any locale, even if dir1 has filenames
with top-bit-set bytes. Then you have GNOME applications that assume
that every filename is encoded in UTF-8, whatever the default locale
for the user. Then you have Subversion that assumes that filenames are
encoded in the current locale, breaking simple things like "svn up" if
the user has changed the locale.
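
The "mv" case can be checked with a short sketch like the following.
The function name and the file name are made up for the illustration,
and it assumes a filesystem that accepts arbitrary bytes in filenames
(typical on Linux; some filesystems reject names that are not valid
UTF-8).

```shell
# Hypothetical demo: move a file whose name contains a top-bit-set byte
# that is not valid UTF-8 on its own, while running mv in the C locale.
demo_byte_filenames() {
    tmp=$(mktemp -d) || return 1
    mkdir "$tmp/dir1" "$tmp/dir2"
    # "caf" followed by byte 0xE9 (e-acute in ISO-8859-1).
    name=$(printf 'caf\351.txt')
    : > "$tmp/dir1/$name" || { rm -rf "$tmp"; return 1; }
    # mv does not decode the name; the bytes are passed on unchanged.
    LC_ALL=C mv "$tmp"/dir1/* "$tmp/dir2" || { rm -rf "$tmp"; return 1; }
    if [ -e "$tmp/dir2/$name" ]; then
        echo moved
    else
        echo missing
    fi
    rm -rf "$tmp"
}
```

Calling demo_byte_filenames should report that the file was moved,
whatever locale the calling shell uses.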

> We use setlocale(LC_ALL, ""), this is the API that POSIX gives us
> and there is no other that I'm aware of.

But POSIX does not say that filenames should be considered to be
encoded with the corresponding charmap on the disk. For POSIX, a
filename remains a sequence of bytes. Any interpretation of these
bytes is unspecified.

> And we're certainly not going to break every working copy in
> existence by changing the way we transcode file names on Unix
> (except Mac OS, which is always UTF-8 anyway).

I'm certainly not asking to break every working copy in existence.
I'm just asking for an *option* to allow Subversion to remember the
encoding that was used at the creation of the working copy, and use
this information in subsequent operations. Or something similar.

Note that:

* Users who always use the same locale (at least the same charmap)
  wouldn't be affected at all, whether they use such an option or not.

* For users affected by locale change (either explicitly by the user
  or in more hidden ways, e.g. because the user has logged in from a
  different machine, whose terminal uses a different charmap):

  - With the current behavior of Subversion (or if the user doesn't
    use the proposed option), the working copy is regarded as broken
    by Subversion if some filenames have non-ASCII characters.
    A write operation such as "svn up" will break it even more.

  - With the proposed option, the working copy wouldn't break, e.g.
    "svn st" and "svn up" would be fine.

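Until such an option exists, the idea of remembering the charmap can be
approximated outside Subversion. The sketch below is only an
illustration: the marker file name .wc-charmap and the function names
are made up, and Subversion itself knows nothing about them. It records
the output of "locale charmap" when the working copy is created and
checks it again before running svn.

```shell
# Hypothetical marker file stored at the top of the working copy.
CHARMAP_FILE=.wc-charmap

# Charmap of the current locale, e.g. UTF-8 or ISO-8859-1.
current_charmap() {
    locale charmap
}

# Record the charmap in use; meant to be run right after checkout.
record_charmap() {
    current_charmap > "$1/$CHARMAP_FILE"
}

# Return 0 if the current charmap matches the recorded one,
# 1 if it differs, 2 if nothing was recorded.
check_charmap() {
    recorded=$(cat "$1/$CHARMAP_FILE" 2>/dev/null) || return 2
    now=$(current_charmap)
    if [ "$recorded" != "$now" ]; then
        echo "charmap is $now, working copy was created with $recorded" >&2
        return 1
    fi
    return 0
}
```

A wrapper could then run something like "check_charmap . && svn up".
This only detects the mismatch; it does not make Subversion use the
recorded encoding, which is what the proposed option would do.
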
FYI, some other VCS such as git or Mercurial don't have such problems
of broken working copies if the locale changes, at least under Unix,
probably because they regard a filename just as a sequence of bytes.
Interpreting these byte sequences as encoded text under Unix is a
problem introduced by Subversion; it is currently outside the scope
of the POSIX API.

> I'll also point out that if you /need/ consistent, parseable output in
> scripts, the command-line client already provides an --xml flag.

Not all svn commands have a --xml flag. For instance:

$ svn up --xml
Subcommand 'update' doesn't accept option '--xml'
Type 'svn help update' for usage.

Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Received on 2013-07-31 15:50:19 CEST

This is an archived mail posted to the Subversion Users mailing list.
