
Re: Locale problem: Can't convert string from native encoding to 'UTF-8'

From: Ryan Schmidt <subversion-2005_at_ryandesign.com>
Date: 2005-09-08 13:15:21 CEST

On Sep 7, 2005, at 17:29, David Kramer wrote:

> Is there a way to convert the dump to UTF-8 (it's all US ASCII,
> AFAIK)?

Well, ASCII is a subset of UTF-8. If you have a collection of ASCII
characters, then you also by definition have a collection of UTF-8
characters. There is no conversion to be done.

So the fact that you have an error signifies that you do not have
exclusively ASCII characters.
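
If you want to check that for yourself, here is a minimal Python sketch
(not anything svnadmin does internally). The helper name
first_non_ascii is made up for illustration. It reports where the
first byte outside the ASCII range sits, and the assertion shows that
pure ASCII bytes decode to the same text under either codec.

```python
def first_non_ascii(data: bytes) -> int:
    """Return the offset of the first byte >= 0x80, or -1 if all bytes are ASCII."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return -1


sample = b"plain ASCII text"

# Pure ASCII decodes to the same string as ASCII and as UTF-8,
# which is why no conversion step is needed.
assert sample.decode("ascii") == sample.decode("utf-8")
assert first_non_ascii(sample) == -1
```

Any result other than -1 means the data is not exclusively ASCII.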

> svnadmin: Valid UTF-8 data
> (hex:)
> followed by invalid UTF-8 sequence
> (hex: a0 d1 07 08)

Let's see here.... In the ISO-8859 character sets (all of them), A0
is a non-breaking space. D1 is (in ISO-8859-1, -3 and -9) a capital
N with tilde ("Ñ"); in other ISO-8859 sets, it's a different
character or undefined. 07 is a bell, and 08 is a backspace. I don't
know svnadmin well enough to know what kind of data it's talking
about here. If that data is the contents of a binary file, then that
sequence of bytes is conceivable, though it certainly won't conform
to UTF-8, so I'm not sure why svnadmin would think it should. If
that's part of a text file's contents, or part of a filename or some
properties, then that's a strange sequence of characters indeed. It
almost sounds like something is corrupted somewhere. Perhaps you can
open the dump file in a good editor (as in one that doesn't have to
load the entire file into memory all at once), search for those
bytes, and see where they are. If you have binary files in your
repository, then this may be more complicated (it will probably give
you many false positives).
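
If no suitable editor is at hand, the same search can be sketched in
Python. This is only an illustration: find_bytes is a hypothetical
helper, not part of Subversion. It reads the file in fixed-size chunks
so the whole dump never has to fit in memory, and it carries a few
bytes over between chunks so that a match straddling a chunk boundary
is not missed.

```python
def find_bytes(path, needle, chunk_size=1 << 20):
    """Yield the absolute file offset of every occurrence of needle in path."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    with open(path, "rb") as handle:
        tail = b""   # bytes carried over from the previous chunk
        base = 0     # file offset of the first byte of tail + chunk
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            start = 0
            while True:
                hit = window.find(needle, start)
                if hit == -1:
                    break
                yield base + hit
                start = hit + 1
            # Keep the last len(needle) - 1 bytes: too short to hold a full
            # match on its own, long enough to complete one in the next chunk.
            keep = window[-overlap:] if overlap else b""
            base += len(window) - len(keep)
            tail = keep
```

Calling list(find_bytes("repo.dump", bytes.fromhex("a0d10708"))) would
list the offsets of the sequence from the error message; the file name
repo.dump is an assumption. As noted, hits inside binary file contents
may be false positives.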

> svn: Can't convert string from native encoding to 'UTF-8':
> @?\217?d?\14?\184?\174

217 is D9 which is a capital U with grave accent ("Ù"). 14 is
apparently the shift-out code, which I had never heard of until now.
184 is B8 which is a cedilla ("¸", the hook that usually appears
below a c). 174 is AE which is the registered trademark symbol ("®").
Same comment and advice as above.
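
The lookups above can be reproduced with a short Python sketch. It
assumes, as I did, that the numbers after the backslashes are decimal
byte values and that the bytes are meant to be read as ISO-8859-1;
neither is confirmed by the error message itself. The describe helper
is made up for illustration.

```python
import unicodedata


def describe(byte_value, encoding="latin-1"):
    """Return a line giving the decimal value, hex value and Unicode name of a byte."""
    char = bytes([byte_value]).decode(encoding)
    # Control characters such as 0x0E have no Unicode name, so supply a fallback.
    name = unicodedata.name(char, "<control character>")
    return f"{byte_value} = 0x{byte_value:02X} = {name}"


for value in (217, 14, 184, 174):
    print(describe(value))
```

Passing a different encoding, for example "iso8859_2", shows how the
same byte maps to another character under another ISO-8859 set.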

Received on Thu Sep 8 13:17:24 2005

This is an archived mail posted to the Subversion Users mailing list.