Re: UTF-8 support for Unix with APR?

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: Tue, 12 Feb 2008 14:02:33 +0100

On 2/12/08, Erik Huelsmann <ehuels_at_gmail.com> wrote:
> On 2/12/08, B. Blodau <b_blodau_at_hamburg.de> wrote:
> > I wrote a similar request on this topic earlier, but now it has become
> > a more general issue.
> >
> > My questions are:
> > - Can Subversion support UTF-8 filenames on Unix systems when using
> > the APR libraries?
> > - Has anybody used the C libraries on a Unix system (including Mac OS
> > X) and successfully used international pathnames?
> >
> > I'm writing a C++ application which uses the svn libraries and
> > therefore inherits the APR libraries too.
> > When working with international filenames I'm getting errors that
> > characters could not be converted from utf8 to the local encoding.
> >
> > Since my app is a Unicode-safe application, I don't want the filenames
> > to be converted, because I want to keep the whole Unicode character set.
> > Even a successful conversion to the encoding of the current user's
> > locale would result in a limited character set.
> >
> > When debugging this a bit further, I came across the following code snippet in
> > ".../apr/file_io/unix/filepath.c":
> >
> > APR_DECLARE(apr_status_t) apr_filepath_encoding(int *style,
> >                                                  apr_pool_t *p)
> > {
> >     *style = APR_FILEPATH_ENCODING_LOCALE;
> >     return APR_SUCCESS;
> > }
> >
> > This looks as if, at least on Unix, no UTF-8 support is intended.
> > Otherwise this function would return APR_FILEPATH_ENCODING_UTF8.
>
> The APR libraries handle file paths in the encoding of the system
> locale. This means they *may* be encoded in UTF-8, but not necessarily.
> Whether they are interpreted as UTF-8 depends on the LANG or LC_CTYPE
> settings in the host environment.
>
> LANG=en_US.UTF-8
>
> will indicate UTF-8 pathnames. OTOH,
>
> LANG=en_US.iso8859-1
>
> will indicate "latin1" pathnames. The application is free to do
> whatever it wants with that information. Subversion uses the returned
> value to determine whether any "locale"->"UTF8" conversion is
> necessary, since internally it uses UTF-8-encoded pathnames exclusively.
>
> > Can anybody confirm my concern? I just don't want to search for a
> > solution where there is no chance of finding one.
>
> If you encounter conversion problems, chances are you didn't provide
> your application with any information about which locale settings to
> use: only once you do that will Subversion know what the source
> encoding is (and it won't convert at all if the source is already
> considered UTF-8). Did you call setlocale() (the C library function),
> for example setlocale(LC_ALL, "")?
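A minimal sketch of that missing step, assuming a POSIX system that provides nl_langinfo(CODESET). The helper names init_locale_from_env() and locale_is_utf8() are hypothetical and are not part of APR or Subversion. The first adopts the locale described by the environment via setlocale(LC_ALL, ""); the second checks whether the resulting codeset is UTF-8, which is the case in which no conversion should be needed.

```c
#define _POSIX_C_SOURCE 200809L  /* expose nl_langinfo() and strcasecmp() */

#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Adopt the locale described by the environment (LC_ALL, LC_CTYPE, LANG).
   Until this is called, a C program runs in the "C" locale, whose codeset
   is typically plain ASCII, so non-ASCII paths cannot be converted.
   Returns 0 on success, -1 if the environment names an unsupported locale. */
static int init_locale_from_env(void)
{
    return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}

/* Returns 1 if the current LC_CTYPE codeset is UTF-8, meaning
   locale-encoded paths are already UTF-8; returns 0 otherwise.
   Both common spellings of the codeset name are accepted. */
static int locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL
        && (strcasecmp(codeset, "UTF-8") == 0
            || strcasecmp(codeset, "UTF8") == 0);
}
```

The intent is to call init_locale_from_env() once at the top of main(), before initializing APR or any Subversion library, so that the libraries see the user's actual codeset. With a UTF-8 locale such as en_US.UTF-8, no conversion should then take place.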

BTW, APR treats UTF-8 as the standard filepath encoding on Mac OS X
starting with 0.9.15 (if you're on the 0.9 branch) or, I believe, the
first 1.2 release after August 13th, 2007.
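Where the locale codeset is not UTF-8, as in the iso8859-1 example quoted above, the "locale"->"UTF8" step is a character-set conversion. Below is a rough sketch of that step using POSIX iconv(3), assuming the codeset name comes from something like nl_langinfo(CODESET). to_utf8() is a hypothetical helper. Subversion itself performs this conversion through APR's translation routines rather than with this code.

```c
#include <iconv.h>
#include <string.h>

/* Convert the NUL-terminated string `in`, encoded in `from_codeset`, to
   UTF-8 in `out` (capacity `out_size`, including the terminating NUL).
   Returns 0 on success, or -1 if the codeset is unknown, the input holds a
   byte sequence invalid in that codeset, or `out` is too small. */
static int to_utf8(const char *from_codeset, const char *in,
                   char *out, size_t out_size)
{
    if (out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;        /* iconv() takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outbuf = out;
    size_t outleft = out_size - 1;   /* reserve room for the NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *outbuf = '\0';
    return 0;
}
```

For example, converting the Latin-1 bytes "caf\xe9" with to_utf8("ISO-8859-1", ...) yields the UTF-8 bytes "caf\xc3\xa9". This direction cannot lose characters, because UTF-8 covers all of Unicode. The reverse direction, from UTF-8 to a Latin-1 or ASCII locale, is where "could not be converted" errors like the ones in the original question arise.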

HTH,

Erik.

Received on 2008-02-12 14:02:54 CET

This is an archived mail posted to the Subversion Users mailing list.