
Re: Problems with accents in filenames

From: Vincent Lefevre <vincent+svn_at_vinc17.org>
Date: 2003-11-24 16:03:39 CET

On 2003-11-24 14:22:49 +0000, Philip Martin wrote:
> Vincent Lefevre <vincent+svn@vinc17.org> writes:
> I note you avoided my question again:
> You keep avoiding my question. I have "foo\xe9" on disk. I want to
> use "foo\xe9" as a filename. How do I do that if you impose UTF-8?

Only the user can answer and decide what to do. With a UTF-8 encoding,
you shouldn't have a "foo\xe9" filename. A solution is to convert it
to UTF-8 (ROX-Filer can do that automatically, AFAIK).
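To make the conversion concrete, here is a minimal sketch in Python. It assumes the name on disk was written under an ISO-8859-1 locale (that is my assumption, the byte 0xE9 alone does not prove it), and the function name is hypothetical. Names that already decode as UTF-8 are left alone, which is only a heuristic.

```python
def latin1_name_to_utf8(raw: bytes) -> bytes:
    """Return a UTF-8 version of a filename given as raw bytes.

    Assumption: any name that is not valid UTF-8 was written in
    ISO-8859-1. Other legacy encodings would need a different codec.
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8 (plain ASCII included)
        return raw
    except UnicodeDecodeError:
        # 0xE9 is "e acute" in ISO-8859-1; in UTF-8 it becomes 0xC3 0xA9.
        return raw.decode("iso-8859-1").encode("utf-8")


converted = latin1_name_to_utf8(b"foo\xe9")
```

Renaming the file on disk would then be a matter of passing the old and new byte strings to os.rename, which accepts bytes paths on Unix.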

In the same way, if you want to use "foo\xe9" as a filename in a
Subversion repository, you'll have a problem, if I've understood

> > the G_BROKEN_FILENAMES option. See
> >
> > http://www.gtk.org/gtk-2.2.0-notes.html
> >
> > I quote:
> >
> > * The assumption of GLib and GTK+ by default is that filenames on the
> > filesystem are encoded in UTF-8 rather than the encoding of the locale;
> > The GTK+ developers consider that having filenames whose interpretation
> > depends on the current locale is fundamentally a bad idea.
> That must be it, although it appears to contradict the API documentation
> http://developer.gnome.org/doc/API/2.2/gtk/GtkFileSelection.html#gtk-file-selection-get-filename
> http://developer.gnome.org/doc/API/2.2/gtk/GtkFileSelection.html#gtk-file-selection-set-filename

I don't see any contradiction. In general, it will be an identity
conversion (i.e. no conversion). I think that this is mainly for
filesystems that use another fixed encoding (some CD-ROMs?). Or
perhaps when there are "broken" filenames.

> I guess the API documentation is out of date? Does KDE do the same,
> or are GNOME and KDE now incompatible (perhaps all the developers
> speak ASCII and/or UTF-8 and haven't noticed)?

I don't know what KDE does, but a quick search on Google gave me:

  I'm not aware of any recent-file-spec or KDE implementing it, but
  in general KDE converts filenames from locale-encoding to 16-bit
  unicode which is used internally, and typically stored on disk as

> Doesn't having UTF-8 filenames with file contents in some other
> encoding cause lots of problems on the command line?

Well, I don't know. I think users will work in a way that doesn't
break things (this is what I do).

> What happens when one wants to put filenames inside files?

Files can have multiple encodings, completely independent from the
current locale (e.g. in XML files). And XSLT processors will output
the results in an encoding that doesn't depend on the locale. So,
the problem isn't simple and depends on the context.
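As an illustration of the XML case, here is a small Python sketch. The two documents and the "file"/"name" markup are invented for the example. Each document declares its own encoding, and the parser follows that declaration rather than the locale, so the same filename comes out of both.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same filename stored under two encodings.
LATIN1_DOC = b'<?xml version="1.0" encoding="ISO-8859-1"?><file name="foo\xe9"/>'
UTF8_DOC = b'<?xml version="1.0" encoding="UTF-8"?><file name="foo\xc3\xa9"/>'


def stored_name(document: bytes) -> str:
    """Parse the document and return the 'name' attribute as text."""
    return ET.fromstring(document).get("name")


same = stored_name(LATIN1_DOC) == stored_name(UTF8_DOC)
```

A plain-text file has no such declaration, which is why the answer depends on the context.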

> I'm a native English speaker and I rarely stray outside ASCII, but
> what about others, CJK users say? Do Chinese users want UTF-8 encoded
> filenames or do they want a GB18030 encoding? Why is respecting their
> choice by following their locale "fundamentally a bad idea"? It's
> possible that most western languages are going to be UTF-8 encoded in
> the future (on Unix anyway) but it's not clear that all languages will
> be UTF-8.

We need to make a choice. Subversion uses UTF-8 internally, for a good

Some languages can be encoded in UTF-16, which converts losslessly to
and from UTF-8 (both encode the same Unicode code points), so no
problem here.
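A short Python sketch of how the two encodings relate: the byte sequences differ, but they carry the same code points, so a round trip loses nothing. The sample string and function names are only for illustration.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Convert little-endian UTF-16 bytes (no BOM) to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(data: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian UTF-16 bytes (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


sample = "\u4e2d\u6587"  # two CJK characters
as_utf16 = sample.encode("utf-16-le")  # 2 bytes per character here
as_utf8 = utf16_to_utf8(as_utf16)      # 3 bytes per character here
round_trip = utf8_to_utf16(as_utf8)
```

The size difference (4 bytes against 6 for this sample) is the usual reason some CJK users prefer UTF-16 for storage.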

Vincent Lefèvre <vincent_at_vinc17.org> - Web: <http://www.vinc17.org/> - 100%
validated (X)HTML - Acorn Risc PC, Yellow Pig 17, Championnat International
des Jeux Mathématiques et Logiques, TETRHEX, etc.
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Received on Mon Nov 24 16:04:28 2003

This is an archived mail posted to the Subversion Dev mailing list.
