
Re: Evil UTF-8 Character in filename in repo causing issues on my wc

From: Vincent Lefevre <vincent-svn_at_vinc17.net>
Date: Thu, 23 Jun 2011 03:13:47 +0200

On 2011-06-22 16:28:31 +0200, Stefan Sperling wrote:
> On Wed, Jun 22, 2011 at 03:42:42PM +0200, Vincent Lefevre wrote:
> > On 2011-06-15 12:29:37 +0200, Stefan Sperling wrote:
> > > Unicode, and its quirk of allowing the *same* character to be encoded
> > > in *different* ways, came much later.
> > >
> > > I think it is unfortunate that Apple broke with the concept that a
> > > filename is just a string of bytes.
> >
> > It's also unfortunate that Subversion breaks this concept too. :)
> >
> > I mean: do a checkout of a repository containing non-ASCII characters
> > under Linux. Then change the locales (e.g. ISO-8859-1 -> UTF-8). Do
> > an update. And see the errors...
> I don't agree that this is the same problem. It's a different problem.

I'm not saying that's the same problem, but that Subversion doesn't
regard a filename as a string of bytes.

> Subversion is internally converting path names from the native encoding

If you regard a filename as a string of bytes, there isn't a concept
of native encoding.

> into UTF-8 and sends them to the repository because they are UTF-8-encoded
> there. This way, all encodings used on client systems can be represented
> in the repository. It works fine with client systems that do not support
> UTF-8 natively at all, as long as they use some encoding that iconv
> understands. And this is all happening *within* the application.
> The rules that svn uses to create filenames are clear and consistent.

They aren't consistent, because svn doesn't track the encoding used
to create the filenames. GNOME rules are consistent: the encoding is
always UTF-8.

> They require users not to flip locales willy-nilly, but that's the
> tradeoff with relying on the locale. Locales only support one encoding
> at a time.

Yes, but different processes can use different locales, and this breaks
svn. There's a good reason why locales are set via environment variables
(on POSIX systems) and not globally.
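
To make the locale problem concrete, here is a minimal Python sketch. It is not Subversion's code: the helper to_repo_path and the file name are hypothetical, and the only thing taken from the discussion is the rule that the native byte string is decoded with the current locale's encoding and re-encoded as UTF-8. A name written under an ISO-8859-1 locale converts fine under that locale, but the same bytes cannot be decoded once the process runs under a UTF-8 locale.

```python
# Hypothetical helper: models "convert from the native encoding to UTF-8".
def to_repo_path(raw: bytes, locale_encoding: str) -> bytes:
    # Decode the on-disk bytes with the locale's encoding, then store as UTF-8.
    return raw.decode(locale_encoding).encode("utf-8")

# A file name created by a process running under an ISO-8859-1 locale.
on_disk = "caf\u00e9.txt".encode("iso-8859-1")

# Same locale as at creation time: the conversion succeeds.
in_repo = to_repo_path(on_disk, "iso-8859-1")

# Another process, now under a UTF-8 locale, sees the same bytes on disk.
try:
    to_repo_path(on_disk, "utf-8")
    failed_under_utf8 = False
except UnicodeDecodeError:
    # 0xE9 followed by "." is not a valid UTF-8 sequence.
    failed_under_utf8 = True

print(on_disk, in_repo, failed_under_utf8)
```

The bytes on disk never changed; only the interpretation did, which is why the encoding would have to be tracked for the rules to be consistent.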

> What Apple does is transform the byte sequence behind the
> application's back.

This is not behind the application's back, because this is documented in
the API. The application writer should follow the API.

What's more important is that both Mac OS X and svn (e.g. under Linux)
can transform the byte sequence behind the *user's* back. For Mac OS X,
this is related to the normalization form, and for svn, this is related
to the locales.
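
For the normalization side, a short Python sketch shows how the same character can end up as two different byte strings. It only uses the standard unicodedata module. Treating the composed form as what the user typed and the decomposed form as what the file system hands back is an assumption made for illustration; the exact form a given file system stores is not specified here.

```python
import unicodedata

# The same visible name in composed (NFC) and decomposed (NFD) form.
typed = unicodedata.normalize("NFC", "caf\u00e9.txt")
read_back = unicodedata.normalize("NFD", typed)  # assumed file system behaviour

typed_bytes = typed.encode("utf-8")
read_back_bytes = read_back.encode("utf-8")

# Byte-wise comparison says these are two different names.
same_bytes = typed_bytes == read_back_bytes

# Comparing after normalizing both sides to one form says they are equal.
same_after_nfc = unicodedata.normalize("NFC", read_back) == typed

print(typed_bytes, read_back_bytes, same_bytes, same_after_nfc)
```

An application that compares names as plain byte strings sees a mismatch here, while one that normalizes before comparing does not.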

> So the application itself cannot rely on its *own* rules it was using to
> create filenames when it runs again and reads the names back from disk
> because the OS is interfering with these rules.

I think it's great to have standards, system-wide conventions and things
like that to avoid applications choosing their own rules. So, I wouldn't
blame Mac OS X for that.

Because there are drawbacks to any choice regarding the filenames,
it would be better to make things configurable at the user level;
hardcoding choices in applications is bad, IMHO.

Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)
Received on 2011-06-23 03:14:19 CEST

This is an archived mail posted to the Subversion Users mailing list.