I have a little more information now than I did before.
The byte that svn complains is a bad UTF-8 sequence is e4. That makes
some sense, since the Unicode code point for a with diaeresis (ä) is
indeed 0x00e4.
0x00e4 in octal is \344, the one that the en_US commit says is missing.
When I encode 0x00e4 into UTF-8 by hand, I come up with:
    0x00e4 = 00011 100100  (11 significant bits, split 5 + 6)
       ==> 110 00011  10 100100 = 0xc3 0xa4 = \303 \244
with the standard 110 and 10 prefixes.
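The hand encoding above can be checked mechanically. This is a minimal POSIX shell sketch: `utf8_two_byte` is a hypothetical helper name used only for illustration, it handles only code points in the two-byte range U+0080..U+07FF, and the last line assumes an `iconv` that knows the name ISO-8859-1 (glibc's does).

```shell
#!/bin/sh
# Hypothetical helper: encode a code point in U+0080..U+07FF as two UTF-8 bytes.
# Byte 1 is 110xxxxx (top 5 of the 11 bits), byte 2 is 10xxxxxx (low 6 bits).
utf8_two_byte() {
    code=$1
    b1=$(( 0xC0 | (code >> 6) ))
    b2=$(( 0x80 | (code & 0x3F) ))
    printf '%02x %02x\n' "$b1" "$b2"
}

utf8_two_byte $((0xE4))                      # c3 a4
printf '\\%o\n' $((0xE4))                    # \344 (the single Latin-1 byte)
printf '\\%o \\%o\n' $((0xC3)) $((0xA4))     # \303 \244 (the UTF-8 pair)

# Cross-check with the system's converter (assumes iconv is installed).
# Should show the bytes c3 a4.
printf '\344' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```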
However, even though `locale charmap` says 'UTF-8', if I do:
echo ab | tr 'a' '\303' | tr 'b' '\244'
I get Ã¤ (capital A with a tilde, then something that looks like a
misshapen pound sign). That's not right. I should get a lower-case a
with diaeresis, I would think.
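For what it's worth, Ã¤ is exactly what the bytes c3 a4 look like when something reads them as ISO-8859-1 (Latin-1) instead of UTF-8: 0xc3 is Ã and 0xa4 is ¤, the generic currency sign. The sketch below shows both readings of the same two bytes, assuming an `iconv` that supports these encoding names (glibc's does). It suggests, though it does not prove, that whatever displayed the `tr` output was not decoding UTF-8.

```shell
#!/bin/sh
# Read as UTF-8, the bytes c3 a4 are one character, U+00E4 (a with diaeresis).
# Converting to UTF-16BE exposes the code point directly; should show: 00 e4
printf '\303\244' | iconv -f UTF-8 -t UTF-16BE | od -An -tx1

# Read as Latin-1, each byte is its own character:
# 0xc3 -> U+00C3 (A with tilde), 0xa4 -> U+00A4 (currency sign).
# Re-encoded as UTF-8 that is four bytes; should show: c3 83 c2 a4
printf '\303\244' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```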
I tried checking in a file with that name, but when committing the
merge, svn doesn't recognize it as an a with diaeresis, even though I'm
pretty sure I got the octal right. Worse, now I can't even remove that
extra file! It says:
followed by invalid UTF-8 sequence
(hex: e4 73 74 65)
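The four bytes in that message can be decoded by hand. Read as Latin-1 they are the text "äste", which is consistent with (but does not prove) a name that stores ä as the single byte 0xe4 rather than as the UTF-8 pair c3 a4. A minimal sketch, again assuming glibc-style `iconv`:

```shell
#!/bin/sh
# The bytes quoted in svn's error message: e4 73 74 65 (octal \344 \163 \164 \145).
# Decoded as Latin-1 they are the text "äste"; re-encoded as UTF-8 the
# leading 0xe4 becomes the pair c3 a4. Should show: c3 a4 73 74 65
printf '\344\163\164\145' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1

# Decoded as UTF-8 they are rejected: 0xe4 (1110 0100) opens a three-byte
# sequence, but 0x73 ('s') is not a 10xxxxxx continuation byte.
if printf '\344\163\164\145' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    echo "valid UTF-8"
else
    echo "invalid UTF-8"
fi
```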
It seems like I should have enough information to piece together what's
going on if I could just put it all together...
Running svn remove on 'G 0xe4 steBuch' and on 'G 0xc3a4 steBuch'
(spaces added for readability) yields the same 'invalid UTF-8 sequence'
error in both cases, including the 'e4' byte. I hexdumped the variables
I use to hold these characters and confirmed that they really do
contain 0xe4 and 0xc3a4. So both the single-byte form (Latin-1, I
think?) and the UTF-8 form seem to end up as the single byte 0xe4
somewhere along the way, and that byte is then interpreted as UTF-8
(0xe4 on its own is indeed invalid UTF-8), which shouldn't be
happening. This is sounding more and more like a bug to me.
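A quick way to confirm which of the two spellings is well-formed UTF-8 is to push each through a strict UTF-8 to UTF-8 conversion. In this sketch `is_utf8` is a hypothetical helper, the two names are illustrative byte strings built from the spellings above (the rest of the real filename is not known), and `iconv` is assumed to exit non-zero on malformed input, as glibc's does. Since the UTF-8 spelling passes, the e4 in svn's error presumably does not come from that argument itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if stdin is well-formed UTF-8.
# Relies on iconv exiting non-zero when it hits an illegal sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Illustrative spellings of the name: Latin-1 (one byte for a-umlaut)
# and UTF-8 (two bytes). The escapes are expanded by printf.
for name in 'G\344steBuch' 'G\303\244steBuch'; do
    if printf "$name" | is_utf8; then
        printf '%s: valid UTF-8\n' "$name"
    else
        printf '%s: invalid UTF-8\n' "$name"
    fi
done
```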
Rats. I was hoping figuring this out would give me ideas for a
workaround. Oh well.
Erich
Ruffdogs.com
Erich Enke wrote:
> Patrick Smears wrote:
>
>> On Mon, 4 Oct 2004, Erich Enke wrote:
>>
>>> Note that any time I have done operations with UTF above, I have done:
>>> export LANG=UTF-8
>>> export LC_CLANG=UTF-8
>>> export LC_CTYPE=UTF-8
>>>
>>
>> On my system at least, I have to set "LANG" etc to something ending
>> in ".utf8", for example....
>>
>> export LANG=en_GB.utf8 # United Kingdom
>> export LANG=de_DE.utf8 # Germany
>>
> Thanks for the hint. I wasn't quite doing that correctly.
>
> However, exporting LANG, LC_CLANG, LC_CTYPE, and LC_ALL as en_US.UTF-8
> gives the same results as I had previously.
>
> Erich
> Ruffdogs.com
>
Received on Mon Oct 4 22:08:10 2004