
Re: UTF-8 support on Mac?

From: Ryan Schmidt <subversion-2007b_at_ryandesign.com>
Date: Wed, 6 Feb 2008 16:30:15 -0600

On Feb 6, 2008, at 04:25, B. Blodau wrote:

> On 06.02.2008 at 02:17, Ryan Schmidt wrote:
>
>> On Feb 5, 2008, at 10:59, B. Blodau wrote:
>>
>>> I'm encountering a problem when calling svn_client_commit3() on
>>> the Mac.
>>>
>>> The name of the file to be committed contains a non-ASCII
>>> character, in this case an 'ä' (a-umlaut). It is validly encoded
>>> as UTF-8, not as the precomposed character (0xc3, 0xa4) but in
>>> normalized (decomposed) form: the base character 'a' followed by
>>> the combining character '¨' (COMBINING DIAERESIS).
>>> So the UTF-8 byte sequence is: 0x61, 0xcc, 0x88.
>>>
>>> When calling svn_client_commit3(), I'm getting the error message:
>>> "Can't convert string from 'UTF-8' to native encoding." from
>>> "subversion/libsvn_subr/utf.c".
>>>
>>> I'm a bit confused because the documentation for
>>> svn_client_commit3() says:
>>> "targets is an array of const char* paths to commit. They need
>>> not be canonicalized nor condensed; this function will take care
>>> of that."
>>>
>>> Does svn support non-ASCII characters on the Mac?
>>> Does svn support non-ASCII characters in their normalized form on
>>> the Mac?
>>> Can I do anything to get this working? ;)
>>>
>>> Since other clients can handle such an umlaut, it might be that
>>> svn expects precomposed characters?
>>
>> I'm not sure about the error message you encountered specifically.
>> However, there is a problem: Subversion does not seem to
>> canonicalize the UTF-8 representation of file names in any way. It
>> just stores the UTF-8 bytes the way they come in from the client.
>> On other operating systems this seems not to be a problem, because
>> they don't care whether the UTF-8 sequences are composed or
>> decomposed. But Mac OS X (or rather HFS+) does canonicalize UTF-8
>> representations to the decomposed form, and other operating
>> systems seem to create filenames in composed form, so Mac users
>> using Subversion repositories containing non-ASCII characters
>> tend to run into this problem:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>
> Thanks a lot for your answer,
> but precomposing the characters did not help. :(
>
> This problem also occurs with non-ASCII characters which are
> usually not normalized (because they can't be decomposed), like
> ideographs or the German sz 'ß'.
> So for me the current questions are:
> What is the "native encoding" mentioned in the error message?
> Why is svn trying to convert it into this "native encoding" instead
> of leaving it as UTF-8, which would cover all the characters in use
> today?
>
> BTW: This error message occurs before my commit-callback is getting
> called. So I assume this issue is independent from the
> international settings/capabilities of the repository server,
> because no server access seems to happen at that point.
>
> Since other clients can do it, there must be a way.... :)
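
For reference, the two encodings of 'ä' discussed above can be
reproduced with a short Python sketch. It is only an illustration
using Python's standard unicodedata module and is not part of
Subversion; the helper name utf8_hex is made up for this example.

```python
import unicodedata

def utf8_hex(text):
    """Return the UTF-8 bytes of text as a list of hex strings."""
    return [f"0x{b:02x}" for b in text.encode("utf-8")]

composed = "\u00e4"     # LATIN SMALL LETTER A WITH DIAERESIS (precomposed)
decomposed = "a\u0308"  # 'a' followed by COMBINING DIAERESIS

print(utf8_hex(composed))    # ['0xc3', '0xa4']
print(utf8_hex(decomposed))  # ['0x61', '0xcc', '0x88']

# The byte sequences differ, but normalization maps one form to the other.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# 'ß' has no canonical decomposition, so NFD leaves it unchanged.
print(unicodedata.normalize("NFD", "\u00df") == "\u00df")    # True
```

Both forms are valid UTF-8, so a failure that also affects 'ß'
suggests that composed versus decomposed is not the deciding factor.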

It sounds like you're calling the Subversion libraries, with which I
have no experience. But on the command line, you need to make sure
that your LANG environment variable is set to something reasonable
for your system, or else Subversion won't know how to convert its
internal UTF-8 data into something your shell can understand.
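
As a rough illustration of what "native encoding" means here, the
following Python sketch adopts the locale from the environment and
then checks whether a file name fits a given encoding. It does not
call Subversion at all; the function names current_native_encoding
and fits_encoding are hypothetical, and the assumption that a
process without a usable locale ends up with an ASCII-like native
encoding is mine, not something I have verified in the Subversion
source.

```python
import locale

def current_native_encoding():
    """Adopt the locale named by the environment (LANG, LC_ALL, LC_CTYPE)
    and report the character encoding it implies."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment names a locale this system does not have
    return locale.getpreferredencoding(False)

def fits_encoding(name, encoding):
    """Return True if every character of name is representable in encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    print("native encoding:", current_native_encoding())
    for enc in ("ascii", "utf-8"):
        # Decomposed a-umlaut and German sz, checked against each encoding.
        print(enc, fits_encoding("a\u0308.txt", enc), fits_encoding("\u00df.txt", enc))
```

If the native encoding turns out to be ASCII, neither name can be
converted, composed or not, which would match the symptom described
above. Setting LANG to a UTF-8 locale before starting the program
is the usual way to change that.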

Received on 2008-02-06 23:30:52 CET

This is an archived mail posted to the Subversion Users mailing list.
