
Re: Unversioned files with invalid UTF-8 sequence in name confuse svn

From: Branko Čibej <brane_at_apache.org>
Date: Mon, 29 Feb 2016 19:57:04 +0100

On 29.02.2016 19:30, Vincent Lefevre wrote:
> On 2016-02-29 17:00:01 +0100, Bert Huijben wrote:
>> The problem is most likely not that they have an invalid utf-8 sequence in
>> their name, but that your settings report that filenames are encoded in one
>> way, while there is a file whose name can't be expressed by that format.
>>
>> You get this error when Subversion isn't able to convert the filename to its
>> internal utf-8 format, which should be capable of expressing any valid
>> filename. (If you declare that all filenames are utf-8, there wouldn't be a
>> conversion, so in most cases no error.)
>>
>> To just handle it as unversioned as you suggest we need to at least be able
>> to express its name.
> There are two ways to express a filename:
> 1. The one from the OS (e.g., in POSIX, this is just a sequence
> of bytes).

This isn't entirely correct. It's true as far as most (but certainly not
all) filesystem implementations are concerned; but applications expect
to interpret those bytes in the context of the active locale.
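
To illustrate, here is a minimal Python sketch of the same bytes read under two different encodings. The helper name `interpret` and the sample file name are made up for the example; the encodings stand in for whatever the active locale declares.

```python
def interpret(raw_name: bytes, encoding: str) -> str:
    """Decode on-disk file name bytes the way an application would
    under a locale that declares the given encoding."""
    return raw_name.decode(encoding)


# The same byte sequence on disk (hypothetical file name).
raw = b"caf\xc3\xa9.txt"

# Under a UTF-8 locale, the two bytes 0xC3 0xA9 form one character.
as_utf8 = interpret(raw, "utf-8")

# Under a Latin-1 locale, each byte is a separate character.
as_latin1 = interpret(raw, "latin-1")

# The byte sequence is identical, but the names the application sees differ.
names_differ = as_utf8 != as_latin1
```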

> 2. The one used by Subversion internally.
>
> (2) is necessary for versioned files, but for unversioned files,
> you do not need to do the (1) -> (2) conversion.

Sure you do. How else are you going to know that the file is
unversioned? (The working copy database stores paths encoded as UTF-8.)
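
A rough sketch of that lookup in Python. This is not Subversion's actual code: `to_internal_utf8`, `is_versioned`, and the `known_paths` set are hypothetical stand-ins for the conversion step and the working copy database.

```python
def to_internal_utf8(raw_name: bytes, locale_encoding: str) -> bytes:
    """Convert on-disk bytes to UTF-8 via the declared locale encoding.

    Decoding is strict, so bytes that are invalid in the declared
    encoding raise UnicodeDecodeError instead of being guessed at.
    """
    return raw_name.decode(locale_encoding).encode("utf-8")


def is_versioned(raw_name: bytes, locale_encoding: str, known_paths: set) -> bool:
    """Look the name up among the UTF-8 paths recorded as versioned.

    The comparison is only meaningful after the conversion, because
    the recorded paths are UTF-8 and the on-disk bytes may not be.
    """
    return to_internal_utf8(raw_name, locale_encoding) in known_paths


# Hypothetical stand-in for the paths stored in the working copy database.
known_paths = {"caf\u00e9.txt".encode("utf-8")}
```

With a Latin-1 locale, the on-disk name b"caf\xe9.txt" converts cleanly and is found in `known_paths`. With a UTF-8 locale, the same bytes cannot be converted at all, so the question "is this file versioned?" cannot even be asked.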

...

> The problem is that it is too easy to create files with a name using
> invalid UTF-8 sequences

File names on disk DO NOT have to be represented in UTF-8. They do have
to be represented consistently with the current locale settings.

A fairly plausible cause for getting the wrong representation is
changing the locale for the duration of a script invocation. Another
plausible way is to create files based on the contents of some script
that is not encoded as expected by the current locale.
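
One way to find such files is to list the directory as raw bytes and try to decode each name with the encoding the current locale declares. The sketch below assumes Python; the function names `undecodable_names` and `scan_directory` are made up for the example.

```python
import locale
import os


def undecodable_names(names, encoding):
    """Return the raw names that are not valid in the given encoding."""
    bad = []
    for raw in names:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


def scan_directory(path=b"."):
    """Check a directory against the encoding of the current locale.

    Passing a bytes path makes os.listdir() return names as raw bytes,
    with no decoding applied.
    """
    encoding = locale.getpreferredencoding(False)
    return undecodable_names(os.listdir(path), encoding)
```

A name written while a Latin-1 locale was active, such as b"caf\xe9.txt", is reported when the check runs with UTF-8. Nothing is reported when the check runs with Latin-1, since every byte sequence is valid there.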

> (in my case, it seems just to be due to a bug in Automake or Libtool).

Or the way you're using them, perhaps?

> But the user should not be required to find them and delete manually.

It's also too easy to ignore (or delete) files because someone managed
to misconfigure their locale. I'd really, really strongly suggest not
making such a thing the default in Subversion.

-- Brane
Received on 2016-02-29 19:57:11 CET

This is an archived mail posted to the Subversion Dev mailing list.