Re: International Characters & Subversion 1.1.0 Problems

From: Erich Enke <epte_at_ruffdogs.com>
Date: 2004-10-04 22:00:15 CEST

I have a little more information now than previously.

The byte that svn complains about as a bad UTF-8 sequence is e4. That
makes some sense, since the Unicode value for the a with diaeresis (ä)
is indeed 0x00e4.

0x00e4 in octal is \344, the one that the en_US commit says is missing.

When manually encoding 0x00e4 into UTF-8, I come up with:
0x00e4 = 000 1110 0100 ==> 110 00011  10 100100 = 0xc3 0xa4 =
\303 \244
with the standard 110 and 10 prefixes.
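
The hand encoding above can be double-checked with a short Python
sketch. The function name encode_utf8_two_byte is made up for this
illustration, and it only handles the two-byte range (U+0080 to
U+07FF) that 0x00e4 falls into:

```python
def encode_utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range U+0080..U+07FF is handled")
    # Lead byte: prefix 110, then the top 5 of the 11 payload bits.
    lead = 0b11000000 | (code_point >> 6)
    # Continuation byte: prefix 10, then the low 6 payload bits.
    cont = 0b10000000 | (code_point & 0b00111111)
    return bytes([lead, cont])


encoded = encode_utf8_two_byte(0x00E4)
print(encoded.hex())                          # c3a4
print(" ".join(f"\\{b:o}" for b in encoded))  # \303 \244
```

The result agrees with Python's built-in encoder, i.e.
"\u00e4".encode("utf-8") gives the same two bytes.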

However, even though `locale charmap` says 'UTF-8', if I do:
echo ab | tr 'a' '\303' | tr 'b' '\244'
I get Ã¤ (Cap. A + superscript tilde, and then something that looks
like a misfigured pound sign). That's not right. I should get a
lower-case a with diaeresis, I would think.
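
One possible explanation, not confirmed anywhere in this thread, is
that the terminal displays those two bytes as ISO-8859-1 rather than
UTF-8. A small Python sketch shows what the same bytes become under
each interpretation; the variable names are only for illustration:

```python
# The two bytes produced by the tr pipeline: octal \303 \244.
raw = bytes([0o303, 0o244])

as_utf8 = raw.decode("utf-8")         # one character
as_latin1 = raw.decode("iso-8859-1")  # two characters, one per byte

for label, text in (("utf-8", as_utf8), ("iso-8859-1", as_latin1)):
    # Print the code points so the result does not depend on the terminal.
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

Decoded as UTF-8 the bytes are the single character U+00E4. Decoded as
ISO-8859-1 they are U+00C3 (capital A with tilde) followed by U+00A4
(the currency sign), which matches the description of the output
above.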

I tried checking in a file with that name, but when committing the merge,
it doesn't recognize it as an a-with-diaeresis, even though I'm pretty
sure I got the octal right. However, now I can't even remove that extra
file! It says:

followed by invalid UTF-8 sequence
(hex: e4 73 74 65)

It seems like I should have enough information to piece together what's
going on if I could just put it all together...

Trying svn remove on both (with cosmetic spaces) 'G 0xe4 steBuch' and
'G 0xc3a4 steBuch' yields the above 'invalid UTF-8 sequence' error,
including the 'e4' byte. (I can hexdump the contents of the variables I
am using to hold these characters and confirm that I am indeed holding
0xe4 and 0xc3a4.) So both the UTF-16 (I think??) and the UTF-8 forms are
being converted to UTF-16 (?) somewhere along the way, but that UTF-16
(?) char is then being interpreted as UTF-8 (a lone 0xe4 is indeed
invalid UTF-8), which shouldn't be happening. This is sounding more and
more like a bug to me.
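
Why the byte sequence e4 73 74 65 is rejected can be checked outside
of svn with a Python sketch. The helper is_valid_utf8 is hypothetical,
and the two byte strings are reconstructed from the hex values quoted
above. Note that 0xe4 is also the single-byte ISO-8859-1 encoding of
the same character; whether that is where the byte svn reports comes
from is an assumption, not something established here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


single_byte_name = b"G\xe4steBuch"    # 47 e4 73 74 65 ...
utf8_name = b"G\xc3\xa4steBuch"       # 47 c3 a4 73 74 65 ...

# 0xe4 = 1110 0100 announces a three-byte sequence, but the next byte,
# 0x73 ('s'), does not carry the required 10 prefix.
print(is_valid_utf8(single_byte_name))  # False
print(is_valid_utf8(utf8_name))         # True

# Assuming the single-byte form is ISO-8859-1, it can be transcoded.
transcoded = single_byte_name.decode("iso-8859-1").encode("utf-8")
print(transcoded == utf8_name)          # True
```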

Rats. I was hoping figuring this out would give me ideas for a
workaround. Oh well.

Erich
Ruffdogs.com

Erich Enke wrote:

> Patrick Smears wrote:
>
>> On Mon, 4 Oct 2004, Erich Enke wrote:
>>
>>
>>
>>> Note that any time I have done operations with UTF above, I have done:
>>> export LANG=UTF-8
>>> export LC_CLANG=UTF-8
>>> export LC_CTYPE=UTF-8
>>>
>>
>>
>> On my system at least, I have to set "LANG" etc to something ending
>> in ".utf8", for example....
>>
>> export LANG=en_GB.utf8 # United Kingdom
>> export LANG=de_DE.utf8 # Germany
>>
>>
>>
> Thanks for the hint. I wasn't quite doing that correctly.
>
> However, exporting LANG, LC_CLANG, LC_CTYPE, and LC_ALL as en_US.UTF-8
> gives the same results as I had previously.
>
> Erich
> Ruffdogs.com
>

Received on Mon Oct 4 22:08:10 2004

This is an archived mail posted to the Subversion Users mailing list.