
RE: [Error conversion UTF-8]

From: Paul Koning <Paul_Koning_at_dell.com>
Date: Tue, 7 Apr 2009 12:21:36 -0400

>>>>> "Bert" == Bert Huijben <rhuijben_at_sharpsvn.net> writes:

 Bert> On the Mac and on Windows the filesystem always uses Unicode to
 Bert> represent filenames (Mac as UTF-8, Windows as UCS-2/UTF-16), so
 Bert> the LANG setting only applies to the client I/O there and never
 Bert> to the paths. On the Unixes, paths don't have a specific
 Bert> encoding (a path is just a sequence of bytes), so the LANG
 Bert> setting applies to path names too.

 Bert> In this case a file on disk has a path that can't be
 Bert> interpreted under the current LANG setting. (E.g. UTF-8 uses
 Bert> lead and follow bytes for multibyte characters; if the first
 Bert> byte of a multibyte character is a follow byte, its encoding is
 Bert> invalid.) This is probably caused by saving files with names in
 Bert> one encoding (e.g. ISO-8859-1) and then reading them back with
 Bert> another encoding (probably UTF-8).
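
To make the lead/follow byte point concrete, here is a minimal Python
sketch. It is not Subversion code; the file names and the helper
decodes_as_utf8 are made up for illustration. It shows names written as
ISO-8859-1 bytes being rejected when they are read back as UTF-8.

```python
# Hypothetical file names, chosen only for illustration.
accented = "caf\u00e9.txt"     # contains U+00E9 (e with acute accent)
copyright = "\u00a9notes.txt"  # starts with U+00A9 (copyright sign)

# The bytes a client would write when its locale encoding is ISO-8859-1.
accented_latin1 = accented.encode("iso-8859-1")    # b'caf\xe9.txt'
copyright_latin1 = copyright.encode("iso-8859-1")  # b'\xa9notes.txt'


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE9 is a UTF-8 lead byte, but the byte after it ('.') is not a follow byte.
print(decodes_as_utf8(accented_latin1))
# 0xA9 is a UTF-8 follow byte with no lead byte in front of it.
print(decodes_as_utf8(copyright_latin1))
# The same name encoded as UTF-8 in the first place decodes without trouble.
print(decodes_as_utf8(accented.encode("utf-8")))
```

Both ISO-8859-1 names fail to decode, which would match the kind of
conversion error discussed in this thread, assuming the names on disk
were written under a different locale than the one used to read them.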

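Another possible problem, which I've run into on the Mac, is that some
characters can be represented in more than one way in Unicode (for
example, as a single precomposed character or as a base letter plus a
combining accent), so they have more than one UTF-8 encoding. The Mac
filesystem converts them all to a single preferred form. So if you
read back a filename, it may not match, byte for byte, what you
originally supplied.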

The solution to this is to normalize all Unicode strings. If you ever
need to compare strings, you have to normalize both sides first; if
you don't, names that look identical can compare as different. I'm
pretty sure Subversion didn't do this; I don't know if it does now.
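
A small Python sketch of what I mean, using the standard unicodedata
module. The helper same_name and the choice of NFC as the common form
are my assumptions for the example; this is not what Subversion does,
and any one normalization form would do as long as both sides use it.

```python
import unicodedata

# Two spellings of the same visible name (hypothetical example).
composed = "caf\u00e9"     # e-acute as the single code point U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A plain comparison sees different code points and different UTF-8 bytes.
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# After normalizing both sides the names compare equal.
print(same_name(composed, decomposed))
```

The plain comparison reports a mismatch while the normalized one
reports a match, which is the situation you get when a filesystem
hands back a different form from the one you supplied.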

      paul

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1579507

Received on 2009-04-07 18:22:36 CEST

This is an archived mail posted to the Subversion Users mailing list.