
Re: Evil UTF-8 Character in filename in repo causing issues on my wc

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Thu, 23 Jun 2011 03:34:02 +0200

On 2011-06-22 19:34:08 +0200, Stefan Sperling wrote:
> On Wed, Jun 22, 2011 at 07:09:22PM +0200, Andreas Krey wrote:
> > In my opinion it would be saner nowadays to assume file names
> > are in UTF-8 and to warn if they are not, using the setting in
> > LANG for console I/O only.
>
> This strategy may work well for applications starting out today,
> but it won't work for Subversion.
>
> Not all operating systems have switched to UTF-8 as the default
> character set yet. ASCII is still the only encoding that works
> everywhere out of the box (especially on the console!).
> E.g. Debian switched to UTF-8 by default for the Etch release in 2007.
> http://www.debian.org/releases/etch/i386/release-notes/ch-whats-new.en.html
> Many unixy systems that aren't Linux have not switched to UTF-8 by
> default, and it is possible that some never will.

Debian still supports non-UTF-8 locales. That's useful when one
connects via SSH from a non-UTF-8 terminal, and it is precisely why
a user may need different locales on the same machine (just to get
consistent terminal I/O).

> Subversion is supposed to be portable across all these platforms and
> more.

Tracking the filename encoding or letting the user choose the filename
encoding wouldn't be against portability.

Also, portable scripts need to use LC_ALL=C. And again, this breaks
svn as soon as a versioned filename has non-ASCII characters (even
if that filename doesn't appear anywhere in the svn arguments).
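A minimal Python sketch of that failure, assuming (as a simplification of what the client does) that repository paths are kept in UTF-8 and each one is converted to the charset of the current locale before the file is written to disk. Under LC_ALL=C that charset is typically plain ASCII, so a single non-ASCII entry makes the conversion, and with it the whole operation, fail. The helpers `to_native` and `update_names` are hypothetical, and the charset is passed in explicitly rather than read from the locale.

```python
from typing import List


def to_native(path_utf8: bytes, native_charset: str) -> bytes:
    """Convert a UTF-8 repository path to the local charset (hypothetical helper).

    Raises UnicodeEncodeError if the local charset cannot represent the name.
    """
    return path_utf8.decode("utf-8").encode(native_charset)


def update_names(repo_paths: List[bytes], native_charset: str) -> List[bytes]:
    """Convert every path touched by an update; one bad name aborts them all."""
    return [to_native(p, native_charset) for p in repo_paths]


if __name__ == "__main__":
    # The user only names the directory; "café.txt" just happens to be in it.
    paths = ["README".encode("utf-8"), "café.txt".encode("utf-8")]
    print(update_names(paths, "utf-8"))  # [b'README', b'caf\xc3\xa9.txt']
    try:
        update_names(paths, "ascii")  # the charset a C locale typically offers
    except UnicodeEncodeError as err:
        print("conversion failed:", err.reason)  # ordinal not in range(128)
```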

> I agree that locales aren't the ideal solution to this problem.
> But at least they are standardized by POSIX and can be expected
> to behave the same way everywhere.

Not everything is standardized (e.g. locale names and what they provide
are system-specific). And under POSIX, each process can have its own
locale (and change it).

> And they allow Subversion users to say "yes, my system supports
> UTF-8, please use it".

But what if the system supports UTF-8, but the terminal doesn't?
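One way to keep the two concerns apart, in the spirit of the proposal quoted at the top of this message (filenames in UTF-8, locale for console I/O only), is to convert names for display only and escape whatever the terminal cannot show, so the file on disk keeps its real name and nothing aborts. This is a hedged Python sketch; `display_name` is a hypothetical helper, and the terminal charset is passed in explicitly instead of being derived from the locale.

```python
def display_name(name_utf8: bytes, terminal_charset: str) -> bytes:
    """Render a UTF-8 filename for a terminal that may not support UTF-8.

    Only console output is converted; characters the terminal charset cannot
    represent are shown as escapes (e.g. \\xe9) instead of raising an error.
    Raises UnicodeDecodeError if the name itself is not valid UTF-8 (the case
    where the quoted proposal would print a warning).
    """
    name = name_utf8.decode("utf-8")  # filesystem side: assumed to be UTF-8
    return name.encode(terminal_charset, errors="backslashreplace")


if __name__ == "__main__":
    raw = "café.txt".encode("utf-8")
    print(display_name(raw, "utf-8"))    # b'caf\xc3\xa9.txt'
    print(display_name(raw, "latin-1"))  # b'caf\xe9.txt'
    print(display_name(raw, "ascii"))    # b'caf\\xe9.txt'
```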

> The best solution would be a standardised way of specifying
> filename encoding that works the same on all filesystem types in
> all operating systems. Alas, that doesn't exist :(

When there's no standard, let the user choose (with a good compromise
for the default behavior).

> I don't think the current solution is perfect. But it's a good
> compromise given the circumstances.

It really isn't. Tracking the filename encoding is a must.
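A hedged Python sketch of what tracking the encoding could look like, combined with a user choice and a locale-based default as suggested earlier in this message; it is an illustration, not Subversion's design. The `WorkingCopy` record and the `checkout` function are hypothetical. The encoding is chosen once (explicitly, or from the locale at checkout time) and stored with the working copy, so a later run under a different locale, such as a script using LC_ALL=C, still produces the same on-disk names.

```python
import codecs
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkingCopy:
    """Hypothetical working-copy metadata that records the filename encoding."""
    filename_encoding: str

    def disk_name(self, repo_path_utf8: bytes) -> bytes:
        # Repository paths are UTF-8; on-disk names use the recorded encoding,
        # independent of the locale of the process doing the work.
        return repo_path_utf8.decode("utf-8").encode(self.filename_encoding)


def checkout(user_choice: Optional[str], locale_charset: str) -> WorkingCopy:
    """Let the user pick the encoding; otherwise default to the locale's charset."""
    chosen = user_choice or locale_charset
    # codecs.lookup normalizes aliases ("UTF8" -> "utf-8") and rejects unknown names.
    return WorkingCopy(codecs.lookup(chosen).name)


if __name__ == "__main__":
    wc = checkout(None, "UTF-8")  # checked out from a UTF-8 session
    # A later script may run with LC_ALL=C; the recorded encoding still applies.
    print(wc.filename_encoding)                      # utf-8
    print(wc.disk_name("café.txt".encode("utf-8")))  # b'caf\xc3\xa9.txt'
    print(checkout("ascii", "UTF-8").filename_encoding)  # ascii
```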

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Received on 2011-06-23 03:34:34 CEST

This is an archived mail posted to the Subversion Users mailing list.
