
Re: Check-out fails with LANG=C

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Fri, 19 Jul 2013 15:22:33 +0200

On 2013-07-09 20:21:33 +0200, Branko Čibej wrote:
> Unlike on Windows and Mac OS (the latter at least with HFS+), there is no
> notion of native filesystem encoding on other Unix-like platforms. The
> best we can do is look at the locale settings, specifically, LC_CTYPE.

No, the best you can do is to let the user choose. LC_CTYPE typically
specifies the encoding used by the *terminal*, and this encoding may
change when the user connects by SSH from a terminal with a different
encoding.
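To make concrete what "looking at LC_CTYPE" amounts to, here is a small Python sketch. It is Unix-only, since `locale.nl_langinfo` does not exist on Windows, and the function name `codeset_for` is hypothetical. It asks the C library which codeset a given LC_CTYPE value implies. The answer depends only on the process's locale setting, not on anything recorded about the filesystem or the names already stored on it.

```python
import locale


def codeset_for(ctype: str) -> str:
    """Return the codeset the C library associates with an LC_CTYPE value
    such as "C" or "en_US.UTF-8".

    Hypothetical helper. Raises locale.Error if that locale is not
    installed on this system.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, ctype)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # setlocale() is process-wide, so restore the previous value.
        locale.setlocale(locale.LC_CTYPE, previous)
```

On a glibc system, `codeset_for("C")` typically returns `"ANSI_X3.4-1968"` (that is, ASCII). `codeset_for("en_US.UTF-8")` returns `"UTF-8"` if that locale is installed. The same program therefore reaches different conclusions about the same directory when it is run from different sessions.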

> I posit that if the "native encoding" is supposed to be UTF-8, then it
> is an error to use LANG=C at all. Instead, one should use LANG=C.UTF-8.

LANG=C.UTF-8 is completely non-portable for scripts. For instance, on
this machine cp prints the same messages as with LANG=C, but in French:

xvii:~> LANG=C.UTF-8 cp
cp: opérande de fichier manquant
Saisissez « cp --help » pour plus d'informations.

xvii:~> LANG=C cp
cp: missing file operand
Try 'cp --help' for more information.

A script that needs to work in some well-defined way, in particular
with English messages (if they need to be parsed), must use the C
(or POSIX) locale. With most tools, this is fine as they don't need
to know how filenames are encoded.
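A script that parses a tool's messages can pin the locale for just that child process. The following is a minimal Python sketch of this approach; the helper name `run_in_c_locale` is hypothetical. It relies on the fact that LC_ALL overrides LANG and all other LC_* variables, so setting that one variable is enough.

```python
import os
import subprocess


def run_in_c_locale(args: list[str]) -> subprocess.CompletedProcess:
    """Run a command in the C locale so that its messages come out in the
    untranslated (English) form that a script can safely parse.

    Hypothetical helper. LC_ALL=C overrides LANG and every LC_* variable;
    the rest of the caller's environment is passed through unchanged.
    """
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return subprocess.run(args, env=env, capture_output=True, text=True)
```

With GNU coreutils, as in the transcript above, `run_in_c_locale(["cp"]).stderr` should start with `cp: missing file operand` regardless of the caller's own locale settings.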

> In a context where, for example, most files were encoded in Big5
> (http://en.wikipedia.org/wiki/Big5) — not a too far-fetched proposition
> — it would be slightly insane, to put it mildly, for Subversion to
> assume it can just write UTF-8 to disk.

Users who want UTF-8 on disk could choose UTF-8 in a config file.
Users who want Big5 on disk could choose Big5 in a config file.
There should also be a way to have ASCII encoding (like what is
done for URLs), for users who want things to work in every context
with the possibly minor drawback of having some filenames that are
hardly readable with basic tools.
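The per-user setting proposed above could work roughly as the following Python sketch shows. This is not an existing Subversion option. The function names and the setting values are hypothetical: any Python codec name (such as `"utf-8"` or `"big5"`), or `"ascii"` for URL-style percent-encoding.

```python
from urllib.parse import quote, unquote


# Hypothetical setting values: a Python codec name such as "utf-8" or
# "big5", or "ascii" for URL-style percent-encoding of the UTF-8 bytes.
def to_disk_name(name: str, setting: str) -> bytes:
    """Map a repository filename (a Unicode string) to on-disk bytes."""
    if setting == "ascii":
        # Percent-encode the UTF-8 bytes, as in URLs. '%' itself becomes
        # "%25", so the mapping can be reversed unambiguously.
        return quote(name, safe="").encode("ascii")
    # Raises UnicodeEncodeError if the name is not representable.
    return name.encode(setting)


def from_disk_name(raw: bytes, setting: str) -> str:
    """Inverse of to_disk_name() for the same setting."""
    if setting == "ascii":
        return unquote(raw.decode("ascii"), errors="strict")
    return raw.decode(setting)
```

For example, `to_disk_name("café.txt", "ascii")` yields `b"caf%C3%A9.txt"`, which any tool in any locale can display. With `"utf-8"`, the same name yields `b"caf\xc3\xa9.txt"`. The choice is made once by the user and does not depend on whichever terminal happens to be in use.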

> So indeed, this state of affairs puts the burden of setting up their
> locale correctly on users, but that's simply the way Unix works.

No, according to POSIX, a filename is just a sequence of bytes (any
bytes except '/' and NUL). How to interpret it is what *you* choose.
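The following Python sketch illustrates this point. The helper names are hypothetical, and the example assumes a Linux filesystem such as ext4 or tmpfs that accepts arbitrary name bytes; macOS APFS, for instance, would reject the name. The sketch stores a name that is not valid UTF-8, reads it back unchanged as bytes, and shows that its meaning depends solely on the encoding the reader chooses.

```python
import os
import tempfile


def store_and_list(raw_name: bytes) -> list[bytes]:
    """Create an empty file named exactly `raw_name` in a fresh temporary
    directory and return that directory's listing as raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        dir_bytes = os.fsencode(tmp)
        # Passing bytes paths to open()/os.listdir() skips any decoding.
        with open(os.path.join(dir_bytes, raw_name), "wb"):
            pass
        return os.listdir(dir_bytes)


def interpret(raw_name: bytes, encoding: str) -> str:
    """Read the same bytes under a chosen encoding; bytes that are invalid
    in that encoding are shown as U+FFFD instead of raising an error."""
    return raw_name.decode(encoding, errors="replace")
```

With `raw = b"caf\xe9.txt"` ("café.txt" in ISO-8859-1), `store_and_list(raw)` returns `[raw]` unchanged. `interpret(raw, "iso-8859-1")` gives `"café.txt"`, while `interpret(raw, "utf-8")` gives `"caf\ufffd.txt"`.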

Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Received on 2013-07-19 15:23:10 CEST

This is an archived mail posted to the Subversion Users mailing list.