
Re: Can't recode string (locale problem ?)

From: Dimitri Papadopoulos-Orfanos <papadopo_at_shfj.cea.fr>
Date: 2004-06-21 12:10:56 CEST

Hi,

>>>It's a known issue, I don't know if it qualifies as a bug or a feature.
>>>The repository seems to require all filenames to be UTF-8-encoded.
>>>However if I recall correctly, under GNU/Linux filenames are
>>>UTF-8-encoded only under an UTF-8 locale such as fr_CH.UTF-8.
>
>
> This is somewhat misleading. Yes, the repository keeps filenames in UTF-8,
> so does the client internally. This has nothing to do with the encoding of
> the local filesystem.

I really meant what I said. My comment may have been wrong, but it was
not misleading.

First, yes, the repository and the client keep filenames in UTF-8
internally. This is internal to Subversion and works regardless of the
locale.

Secondly, depending on the platform and the locale, the file system may
use UTF-8 or some other locale-dependent encoding.
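To make the second point concrete, here is a minimal Python sketch. The
file name is a made-up example and the two encodings are assumptions
chosen for illustration; it only shows that the same name is stored as
different bytes depending on the encoding in use.

```python
# Illustration only: "café.txt" is a hypothetical file name.
name = "caf\u00e9.txt"

# Bytes as they would appear under an ISO-8859-1 locale (assumed).
latin1_bytes = name.encode("iso-8859-1")

# Bytes as they would appear under a UTF-8 locale such as fr_CH.UTF-8.
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'caf\xe9.txt'
print(utf8_bytes)    # b'caf\xc3\xa9.txt'
```

The accented character is one byte in ISO-8859-1 and two bytes in UTF-8,
so whoever reads the name has to know which encoding was used.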

In any case, when accessing filenames, the system API may return
filenames encoded in UTF-8 or in some other locale-dependent encoding.
As far as I understand, apr is supposed to decode these filenames, and
the apr API should always return UTF-8-encoded filenames, whatever the
locale. See for example:

        APR treats all resource identifiers (files, etc) by their UTF-8
        name, to provide access to all named identifiers.

http://docx.webperf.org/win32_2apr__arch__file__io_8h-source.html

What I meant to say is that in some cases apr may not handle the
conversion to UTF-8 correctly. It will handle the (non-)conversion
correctly under UTF-8 locales, but it may fail under some non-UTF-8
locales. This answer is based on previous threads on this mailing list.
I can't find those threads anymore because the archive is completely
broken, sorry. The information in those threads may also be wrong or
obsolete.

In this case, it seems there may be a problem when recoding filenames
from the locale-dependent encoding returned by the system API to
Subversion's internal encoding (UTF-8). Or maybe not: the filenames may
be plain wrong in the first place. I'll reply to a later message in this
thread in order to investigate that further.
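The two possibilities above can be sketched in Python. This is not how
Subversion or apr implement the conversion; recode_to_utf8 is a
hypothetical helper, and the file name and encodings are assumptions
chosen only to show when a recode succeeds and when it fails.

```python
def recode_to_utf8(raw_name: bytes, locale_encoding: str) -> bytes:
    """Decode raw_name using locale_encoding, then re-encode it as UTF-8."""
    return raw_name.decode(locale_encoding).encode("utf-8")


# Hypothetical file name as stored by a system using ISO-8859-1.
raw = "caf\u00e9.txt".encode("iso-8859-1")

# Case 1: the declared encoding matches the bytes, so recoding works.
recoded = recode_to_utf8(raw, "iso-8859-1")

# Case 2: the declared encoding cannot represent the bytes
# (for example a plain ASCII locale), so recoding fails.
try:
    recode_to_utf8(raw, "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Case 3: the name is "wrong in the first place": the locale claims
# UTF-8, but the bytes on disk are ISO-8859-1 and are not valid UTF-8.
try:
    recode_to_utf8(raw, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
```

In both failing cases the error is the same from the caller's point of
view, which is why it is hard to tell from the message alone whether the
conversion or the file name is at fault.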

Dimitri

Received on Mon Jun 21 12:33:03 2004

This is an archived mail posted to the Subversion Users mailing list.