>>>>> "Bert" == Bert Huijben <rhuijben_at_sharpsvn.net> writes:
Bert> On the Mac and on Windows the filesystem always uses Unicode to
Bert> represent filenames (Mac as UTF-8, Windows as UCS-2/UTF-16), so
Bert> the LANG setting only applies to the client IO there and never
Bert> to the paths. On the unixes paths don't have a specific
Bert> encoding (paths consist of bytes), so the LANG setting applies
Bert> to path names too.
Bert> In this case a file on disk has a path that can't be
Bert> interpreted by the current LANG setting. (E.g. UTF-8 uses
Bert> lead and follow bytes for multibyte characters; if the first
Bert> byte of a multibyte character is a follow byte, its encoding is
Bert> invalid.) This is probably caused by saving files with names in
Bert> one encoding (e.g. ISO-8859-1) and then reading them back with
Bert> another encoding (probably UTF-8).
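To illustrate the quoted point, here is a minimal Python sketch. The
filename and the helper decodes_as_utf8 are made up for the example;
nothing here is taken from Subversion itself. It encodes a name as
ISO-8859-1 and then checks whether the resulting bytes are valid UTF-8.

```python
# Hypothetical filename, chosen only for illustration ("café.txt").
name = "caf\u00e9.txt"

# The bytes a program would write as the path under an ISO-8859-1 locale.
latin1_bytes = name.encode("iso-8859-1")   # b'caf\xe9.txt'

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xE9 is a UTF-8 lead byte, but the next byte ('.') is not a follow byte.
print(decodes_as_utf8(latin1_bytes))

# A follow byte with no lead byte in front of it is invalid as well.
print(decodes_as_utf8(b"\xa9"))

# The same name written under a UTF-8 locale reads back fine.
print(decodes_as_utf8(name.encode("utf-8")))
```

The first two checks fail and the third succeeds, which is the
"written in one encoding, read in another" situation described above.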
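Another possible problem, which I've run into on the Mac, is that some
characters can be represented in more than one way in Unicode (for
example, an accented letter as a single precomposed code point or as a
base letter plus a combining accent), so they have more than one UTF-8
byte sequence. The Mac filesystem converts them all to a single
preferred form. So if you read back a filename, it may not match what
you originally supplied.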
The solution to this is to normalize all Unicode strings. If you ever
need to compare strings, you have to normalize both to the same form
first; if you don't, the comparison can fail even though the strings
look identical. I'm pretty sure Subversion didn't do this; I don't
know if it does now.
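A rough sketch of what "normalize first" means, using Python's standard
unicodedata module. The strings and the helper same_name are
hypothetical, and the choice of NFC is an assumption made for the
example; any form works as long as both sides use the same one. This
says nothing about what Subversion actually does.

```python
import unicodedata

# Two spellings of "café": both display the same, but differ in code points.
composed = "caf\u00e9"      # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent U+0301

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A plain comparison sees two different strings (and different UTF-8 bytes).
print(composed == decomposed)
print(composed.encode("utf-8") == decomposed.encode("utf-8"))

# After normalization the two names compare equal.
print(same_name(composed, decomposed))
```

The plain comparisons report a mismatch, while same_name treats the two
spellings as the same name.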
paul
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1579507
Received on 2009-04-07 18:22:36 CEST