
Re: UTF-8 support for Unix with APR?

From: Erik Huelsmann <ehuels_at_gmail.com>
Date: Tue, 12 Feb 2008 13:52:41 +0100

On 2/12/08, B. Blodau <b_blodau_at_hamburg.de> wrote:
> I wrote a similar request for this topic earlier, but now it becomes
> a more general issue.
>
> My questions are:
> - Can subversion support utf8 filenames on Unix systems when using
> the apr libraries?
> - Has anybody used the C-Libraries on a unix system (including MacOS
> X) and successfully used international pathnames?
>
> I'm writing a C++ application which uses the svn libraries and
> therefore inherits the apr libraries too.
> When working with international filenames I'm getting errors that
> characters could not be converted from utf8 to the local encoding.
>
> Since my app is a Unicode-safe application, I don't want the filenames
> to be converted, because I want to keep the whole Unicode character set.
> Even a successful conversion to the encoding of the current user
> locale would result in a limited character set.
>
> When debugging this a bit further I came to the following code snippet in
> ".../apr/file_io/unix/filepath.c":
>
> APR_DECLARE(apr_status_t) apr_filepath_encoding(int *style,
>                                                 apr_pool_t *p)
> {
>     *style = APR_FILEPATH_ENCODING_LOCALE;
>     return APR_SUCCESS;
> }
>
> This looks as if - at least for Unix - no utf8 support is intended.
> Otherwise this function should return APR_FILEPATH_ENCODING_UTF8.

The APR libraries handle file paths in the encoding of the system
locale. This means they *may* be encoded in UTF-8, but not necessarily.
Whether they are interpreted as UTF-8 depends on the LANG or LC_CTYPE
settings in the host environment.

LANG=en_US.UTF-8

will indicate UTF-8 pathnames. OTOH,

LANG=en_US.iso8859-1

will indicate "latin1" (ISO 8859-1) pathnames. The application is free
to do with that information whatever it wants. Subversion uses the
returned value to determine whether any locale-to-UTF-8 conversion is
necessary, since internally it uses UTF-8-encoded pathnames exclusively.
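A minimal sketch, in C, of how a program can find out which encoding its environment implies for pathnames. It relies only on the standard `setlocale()` and POSIX `nl_langinfo(CODESET)` calls; the helper names `ascii_iequal`, `codeset_is_utf8` and `environment_codeset` are hypothetical, and this is not APR's or Subversion's own code.

```c
#include <ctype.h>
#include <langinfo.h>  /* nl_langinfo(), CODESET (POSIX) */
#include <locale.h>    /* setlocale(), LC_CTYPE */
#include <stddef.h>

/* Hypothetical helper: ASCII case-insensitive string equality. */
static int ascii_iequal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Hypothetical helper: nonzero if CODESET names UTF-8. Spellings vary
   between platforms, so accept "UTF-8" and "UTF8" in any case. */
int codeset_is_utf8(const char *codeset)
{
    return codeset != NULL
        && (ascii_iequal(codeset, "UTF-8") || ascii_iequal(codeset, "UTF8"));
}

/* Hypothetical helper: adopt the character-type settings from the
   environment (LC_ALL, then LC_CTYPE, then LANG) and report the codeset
   they imply. Without the setlocale() call a C program stays in the "C"
   locale, whose codeset is plain ASCII on most systems; that is what
   typically produces conversion errors for non-ASCII pathnames. */
const char *environment_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

An application would normally make the `setlocale()` call once at startup, before touching any pathnames. With `LANG=en_US.UTF-8` set and that locale installed, `codeset_is_utf8(environment_codeset())` would be expected to be true, meaning no conversion is needed.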

> Can anybody confirm my concern? I just don't want to search for a
> solution where I don't have any chance.

If you encounter conversion problems, chances are you didn't provide
any information to your application regarding the locale settings to
be used. Only once you do that will Subversion know which source
encoding to use (and it won't convert at all if the source is already
considered UTF-8). Did you call setlocale() (the C library function)?
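The "convert only when the source is not already UTF-8" rule can be sketched with the POSIX `iconv` API. This is an illustration under stated assumptions, not Subversion's or APR's conversion code: the helper `path_to_utf8` is hypothetical, and it expects the caller to pass the codeset obtained from `nl_langinfo(CODESET)` after calling `setlocale()`.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated pathname IN from
   FROM_CODESET into UTF-8, writing at most OUTSIZE bytes (including
   the terminating NUL) to OUT. Returns 0 on success, or -1 with errno
   set (EILSEQ: invalid input, E2BIG: OUT too small, EINVAL: codeset
   not supported by iconv). */
int path_to_utf8(const char *from_codeset, const char *in,
                 char *out, size_t outsize)
{
    size_t inlen = strlen(in);

    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }

    /* Source already UTF-8: copy as-is, no conversion. (Real code should
       also accept other spellings such as "utf8".) */
    if (strcmp(from_codeset, "UTF-8") == 0) {
        if (inlen >= outsize) {
            errno = E2BIG;
            return -1;
        }
        memcpy(out, in, inlen + 1);
        return 0;
    }

    iconv_t cd = iconv_open("UTF-8", from_codeset);
    if (cd == (iconv_t)-1)
        return -1;

    char *inbuf = (char *)in;      /* iconv() takes char **, not const */
    char *outbuf = out;
    size_t inleft = inlen;
    size_t outleft = outsize - 1;  /* reserve room for the NUL */

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t)-1)          /* flush any pending shift state */
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);
    int saved_errno = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    *outbuf = '\0';
    return 0;
}
```

For example, the Latin-1 bytes `"caf\xe9"` converted from `"ISO-8859-1"` become the UTF-8 bytes `"caf\xc3\xa9"`, while input declared as `"UTF-8"` passes through unchanged.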

bye,

Erik.

Received on 2008-02-12 13:53:00 CET

This is an archived mail posted to the Subversion Users mailing list.
