Re: UTF-8 support on Mac?

From: B. Blodau <b_blodau_at_hamburg.de>
Date: Tue, 12 Feb 2008 10:38:57 +0100

On 06.02.2008 at 23:30, Ryan Schmidt wrote:

>
> On Feb 6, 2008, at 04:25, B. Blodau wrote:
>
>> On 06.02.2008 at 02:17, Ryan Schmidt wrote:
>>
>>> On Feb 5, 2008, at 10:59, B. Blodau wrote:
>>>
>>>> I'm encountering a problem when calling svn_client_commit3() on
>>>> the Mac.
>>>>
>>>> The name of the file to be committed contains a non-ASCII
>>>> character, in this case an 'ä' (a-umlaut). The name is valid
>>>> UTF-8, but the 'ä' is not encoded as a precomposed character
>>>> (0xc3, 0xa4). Instead it is in decomposed form: the base
>>>> character 'a' followed by the combining character '¨'
>>>> (COMBINING DIAERESIS).
>>>> So the UTF-8 byte sequence is: 0x61, 0xcc, 0x88.
>>>>
>>>> When calling svn_client_commit3(), I'm getting the error message:
>>>> "Can't convert string from 'UTF-8' to native encoding." from
>>>> "subversion/libsvn_subr/utf.c".
>>>>
>>>> I'm a bit confused because the documentation for
>>>> svn_client_commit3() says:
>>>> "targets is an array of const char* paths to commit. They need
>>>> not be canonicalized nor condensed; this function will take care
>>>> of that."
>>>>
>>>> Does svn support non-ASCII characters on the Mac?
>>>> Does svn support non-ASCII characters in their normalized form
>>>> on the Mac?
>>>> Can I do anything to get this working? ;)
>>>>
>>>> Since other clients can handle such an umlaut, it might be that
>>>> svn expects precomposed characters?
>>>
>>> I'm not sure about the error message you encountered
>>> specifically. However, there is a known problem: Subversion does
>>> not seem to canonicalize the UTF-8 representation of file names
>>> in any way. It just stores the UTF-8 bytes the way they come in
>>> from the client. On other operating systems this seems not to be
>>> a problem because they don't care whether the UTF-8 sequences are
>>> composed or decomposed. But Mac OS X (or rather HFS+) does
>>> canonicalize UTF-8 representations to the decomposed form, and
>>> other operating systems seem to create filenames in composed
>>> form, so Mac users using Subversion repositories containing
>>> non-ASCII characters tend to run into this problem:
>>>
>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>>
>> Thanks a lot for your answer,
>> but precomposing the characters did not help. :(
>>
>> This problem also occurs with non-ASCII characters that
>> normalization leaves unchanged (they cannot be decomposed), like
>> ideographs or the German sz 'ß'.
>> So for me the current questions are:
>> What is the "native encoding" mentioned in the error message?
>> Why is svn trying to convert it into this "native encoding"
>> instead of leaving it as UTF-8, which would cover all the
>> characters in use today?
>>
>> BTW: This error message occurs before my commit callback is
>> called. So I assume this issue is independent of the
>> internationalization settings/capabilities of the repository
>> server, because no server access seems to happen at that point.
>>
>> Since other clients can do it, there must be a way.... :)
>
> It sounds like you're calling the Subversion libraries, with which
> I have no experience. But on the command line, you need to make
> sure that your LANG environment variable is set to something
> reasonable for your system, or else Subversion won't know how to
> convert its internal UTF-8 data into something your shell can
> understand.
>

Yes,
I'm calling the libraries directly, and since my app is Unicode-aware,
I would like to avoid any conversion to anything other than UTF-8 or
UTF-16, because this would limit the number of valid characters.

I investigated this a bit further and finally found the following
code snippet in "filepath.c" of the APR libs for Unix:

APR_DECLARE(apr_status_t) apr_filepath_encoding(int *style, apr_pool_t *p)
{
    *style = APR_FILEPATH_ENCODING_LOCALE;
    return APR_SUCCESS;
}

It seems that APR is not intended to support UTF-8 on Unix; otherwise
it would not be hard-coded to always return
APR_FILEPATH_ENCODING_LOCALE instead of APR_FILEPATH_ENCODING_UTF8.

So the question now is: can Subversion ever support UTF-8 filenames on
Unix if it is based on APR?

Thanks to everybody who tried to help.
Bert

Received on 2008-02-12 10:39:21 CET

This is an archived mail posted to the Subversion Users mailing list.
