
Re: Locale problem: Can't convert string from native encoding to 'UTF-8'

From: Ryan Schmidt <subversion-2005_at_ryandesign.com>
Date: 2005-09-08 13:15:21 CEST

On Sep 7, 2005, at 17:29, David Kramer wrote:

> Is there a way to convert the dump to UTF-8 (it's all US ASCII,
> AFAIK)?

Well, ASCII is a subset of UTF-8. If you have a collection of ASCII
characters, then you also by definition have a collection of UTF-8
characters. There is no conversion to be done.

So the fact that you have an error signifies that you do not have
exclusively ASCII characters.
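
To make that concrete, here is a minimal Python sketch (not something
svnadmin itself does) that decodes a byte string as UTF-8 and reports
where decoding first fails. The function name and both sample strings
are made up for illustration; they are not taken from the actual dump.

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) where UTF-8 decoding first fails, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        return error.start, data[error.start]
    return None


# Hypothetical sample data, not taken from the actual dump file.
ascii_only = b"Node-path: trunk/README\n"
with_stray_bytes = b"Node-path: trunk/a\xa0\xd1\x07\x08\n"

# Pure ASCII decodes identically as ASCII and as UTF-8.
same_text = ascii_only.decode("ascii") == ascii_only.decode("utf-8")

print(first_invalid_utf8(ascii_only))
print(first_invalid_utf8(with_stray_bytes))
```

Any byte of 0x80 or above that does not form part of a well-formed
multi-byte sequence will trip the decoder, which is the situation the
error message describes.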

> svnadmin: Valid UTF-8 data
> (hex:)
> followed by invalid UTF-8 sequence
> (hex: a0 d1 07 08)

Let's see here... In the ISO-8859 character sets (all of them), A0
is a non-breaking space. D1 is a capital N with tilde ("Ñ") in
ISO-8859-1, -3 and -9; in other ISO-8859 sets it is a different
character or unassigned. 07 is a bell, and 08 is a backspace. I don't
know svnadmin well enough to know what kind of data it's talking
about here. If that data is the contents of a binary file, then that
sequence of bytes is conceivable, though it certainly won't conform
to UTF-8, so I'm not sure why svnadmin would expect it to. If it's
part of a text file's contents, or part of a filename or some
properties, then that's a strange sequence of characters indeed. It
almost sounds like something is corrupted somewhere. Perhaps you can
open the dump file in a good editor (one that doesn't have to load
the entire file into memory all at once), search for the Ñ, and see
where it is. If you have binary files in your repository, this may
be more complicated, since the search will probably turn up many
false positives.
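
If no suitable editor is at hand, a short script can do the same
search. The following is only a sketch under my own assumptions: the
function name, the chunk size and the demo file are hypothetical, and
it looks for the exact four bytes from the error message rather than
for the Ñ alone. It reads the file in chunks, so the whole dump never
has to fit in memory.

```python
import os
import tempfile

NEEDLE = bytes.fromhex("a0d10708")  # the bytes quoted in the svnadmin error


def find_byte_sequence(path, needle, chunk_size=1 << 20):
    """Return the file offsets of every occurrence of `needle` in `path`."""
    if not needle:
        raise ValueError("needle must not be empty")
    offsets = []
    overlap = len(needle) - 1  # bytes carried over so boundary matches are seen
    tail = b""
    position = 0  # file offset of the first byte of `tail + chunk`
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = 0
            while True:
                index = window.find(needle, start)
                if index == -1:
                    break
                offsets.append(position + index)
                start = index + 1
            keep = window[-overlap:] if overlap else b""
            position += len(window) - len(keep)
            tail = keep
    return offsets


# Demo on a small throwaway file (hypothetical data, not a real dump).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10 + NEEDLE + b"y" * 5 + NEEDLE)
    demo_path = tmp.name
demo_offsets = find_byte_sequence(demo_path, NEEDLE, chunk_size=4)
os.remove(demo_path)
print(demo_offsets)
```

With the offsets in hand, you can look at the surrounding bytes to
tell whether the hit is in a path, a property, or file contents.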

> svn: Can't convert string from native encoding to 'UTF-8':
> @?\217?d?\14?\184?\174

Reading those escapes as decimal byte values: 217 is hex D9, which is
a capital U with grave accent ("Ù"). 14 is apparently the shift-out
control code, which I had never heard of until now. 184 is B8, which
is a cedilla ("¸", the hook that usually appears below a c). 174 is
AE, which is the registered trademark symbol ("®"). Same comment and
advice as above.
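
The lookups above can be reproduced with a few lines of Python. This
is a sketch that assumes, as I did, that the numbers after the
backslashes are decimal byte values and that the bytes are meant to be
read as ISO-8859-1; the helper name is made up.

```python
import unicodedata


def describe_decimal_escapes(codes):
    """Map decimal byte values to (decimal, hex, ISO-8859-1 character name)."""
    rows = []
    for code in codes:
        char = bytes([code]).decode("latin-1")
        # Control characters have no Unicode name, so supply a fallback label.
        name = unicodedata.name(char, "control character")
        rows.append((code, f"{code:02X}", name))
    return rows


rows = describe_decimal_escapes([217, 14, 184, 174])
for decimal, hexadecimal, name in rows:
    print(decimal, hexadecimal, name)
```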

Received on Thu Sep 8 13:17:24 2005

This is an archived mail posted to the Subversion Users mailing list.
