Re: Check-out fails with LANG=C

From: Branko Čibej <brane_at_wandisco.com>
Date: Wed, 24 Jul 2013 05:57:41 +0200

On 19.07.2013 15:22, Vincent Lefevre wrote:
> On 2013-07-09 20:21:33 +0200, Branko Čibej wrote:
>> Unlike on Windows and Mac OS (the latter at least with HFS+), there is no
>> notion of native filesystem encoding on other Unix-like platforms. The
>> best we can do is look at the locale settings, specifically, LC_CTYPE.
> No, the best you can do is to let the user choose. LC_CTYPE typically
> specifies the encoding used by the *terminal*, and this encoding may
> change when the user connects by SSH from a terminal with a different
> encoding.
>
>> I posit that if the "native encoding" is supposed to be UTF-8, then it
>> is an error to use LANG=C at all. Instead, one should use LANG=C.UTF-8.
> LANG=C.UTF-8 is completely non-portable for scripts. For instance:
>
> xvii:~> LANG=C.UTF-8 cp
> cp: opérande de fichier manquant
> Saisissez « cp --help » pour plus d'informations.
>
> xvii:~> LANG=C cp
> cp: missing file operand
> Try 'cp --help' for more information.
>
> A script that needs to work in some well-defined way, in particular
> with English messages (if they need to be parsed), must use the C
> (or POSIX) locale. With most tools, this is fine as they don't need
> to know how filenames are encoded.

Frankly I'm not interested in portable scripts. All you're showing above
is that on your particular system, setting LANG=C.UTF-8 doesn't do
anything. So perhaps you'll have to use LC_CTYPE=UTF-8,
LANG=en_US.UTF-8, or whatever happens to work on your particular flavour
of Unix-like OS.
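Which of these variables actually decides the file-name encoding follows the POSIX precedence rule: a non-empty LC_ALL overrides LC_CTYPE, which in turn overrides LANG. The Python sketch below models only that lookup. `effective_ctype` is a hypothetical helper, and it does not check whether the named locale is installed.

```python
import os


def effective_ctype(env):
    """Return the locale name that governs LC_CTYPE for the given environment.

    Models the POSIX precedence rule only: a non-empty LC_ALL wins,
    then a non-empty LC_CTYPE, then a non-empty LANG. It does not check
    whether the named locale is actually installed.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # implementation-defined default; the C/POSIX locale on typical systems


if __name__ == "__main__":
    print(effective_ctype(os.environ))
```

For instance, running svn with LANG=C and LC_CTYPE=en_US.UTF-8 keeps messages in the C locale (LC_MESSAGES still falls back to LANG), while the character encoding comes from the UTF-8 locale, provided that locale is installed.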

All this is beside the point. The point is that it is not up to
Subversion to invent a new way of dealing with file-name encodings. We
use setlocale(LC_ALL, ""); this is the API that POSIX gives us, and there
is no other that I'm aware of. And we're certainly not going to break
every working copy in existence by changing the way we transcode file
names on Unix (except Mac OS, which is always UTF-8 anyway).
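Roughly, the sequence described above can be expressed in Python, whose `locale` module wraps the same C library calls. This is an illustration, not Subversion's code: `native_codeset` and `to_native_name` are hypothetical helpers. The sketch also shows why a check-out can fail under LANG=C: with glibc, that locale's codeset is ASCII (reported as "ANSI_X3.4-1968"), which cannot represent a name such as `café.txt`.

```python
import locale


def native_codeset():
    """Return the codeset of the locale taken from the environment.

    Mirrors the C sequence setlocale(LC_ALL, "") followed by
    nl_langinfo(CODESET); Unix-only, like nl_langinfo itself.
    """
    locale.setlocale(locale.LC_ALL, "")  # adopt LC_ALL / LC_CTYPE / LANG from the environment
    return locale.nl_langinfo(locale.CODESET)


def to_native_name(name, codeset):
    """Transcode a repository path (Unicode text) into bytes in the given codeset.

    Raises UnicodeEncodeError when the codeset cannot represent the name,
    which is what a non-ASCII path runs into under an ASCII-only locale.
    """
    return name.encode(codeset)
```

Under a UTF-8 locale, `to_native_name("café.txt", native_codeset())` yields `b"caf\xc3\xa9.txt"`. Under LANG=C the same call raises UnicodeEncodeError.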

I'll also point out that if you /need/ consistent, parseable output in
scripts, the command-line client already provides an --xml flag.
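As a sketch of that approach, a script can request XML and read fields by element and attribute name instead of matching localized messages. The code assumes the `svn info --xml` layout of an `<info>` root holding an `<entry>` with `kind`, `path` and `revision` attributes and a `<url>` child. `svn_info`, `parse_info_xml` and the sample document (including its URL) are hypothetical and abbreviated.

```python
import subprocess
import xml.etree.ElementTree as ET

# Abbreviated, illustrative `svn info --xml` output (real output has more elements).
SAMPLE_INFO_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<info>
<entry kind="dir" path="." revision="1504">
<url>https://svn.example.org/repos/project/trunk</url>
</entry>
</info>
"""


def parse_info_xml(xml_bytes):
    """Extract a few fields from `svn info --xml` output, independent of the message locale."""
    entry = ET.fromstring(xml_bytes).find("entry")
    return {
        "kind": entry.get("kind"),
        "path": entry.get("path"),
        "revision": int(entry.get("revision")),
        "url": entry.findtext("url"),
    }


def svn_info(target="."):
    """Run `svn info --xml` on a working-copy path or URL and parse it (needs svn on PATH)."""
    result = subprocess.run(["svn", "info", "--xml", target],
                            capture_output=True, check=True)
    return parse_info_xml(result.stdout)
```

Because the element and attribute names do not change with LANG, the script can run under whatever locale the file names need.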

Sure, it would be nice if POSIX defined a portable way to consistently
determine file-name encoding, or even if there were reliable,
non-portable, OS-specific ways that we could use. But I'm not aware of any.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane_at_wandisco.com
Received on 2013-07-24 05:58:44 CEST

This is an archived mail posted to the Subversion Users mailing list.