
Re: svnadmin: Path '....' is not in UTF-8 - svnadmin load fails

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 31 May 2011 01:07:02 +0200

On Tue, May 31, 2011 at 01:41:54AM +0300, Daniel Shahaf wrote:
> How would you handle a repository that contains the following
> nodes/fspaths:
>
> /foo/bår (in UTF-8)
> /foo/bår (in latin1)
>
> ?
>
>
> How would you handle a repository that contains:
> /foo/barÉ (in latin1)
> /foo/barŠ (in latin2)
>
> ?

All the ISO-8859 (latin) encodings are single-byte encodings.
If paths in different ISO-8859 encodings entered the repository,
it's not possible to tell from the bytes alone which encoding was
intended. The same bytes decode to different but valid strings of
characters under each encoding.
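
To illustrate the ambiguity, here is a small Python sketch (not Subversion
code). The path and the byte 0xA9 are picked purely for illustration: that
byte is the copyright sign in latin1 and S-with-caron in latin2, and both
decodes succeed without error.

```python
# One byte sequence, two equally valid readings.
# The path below is a made-up example, not taken from a real repository.
raw = b"/foo/bar\xa9"

as_latin1 = raw.decode("iso8859-1")  # 0xA9 -> U+00A9 (copyright sign)
as_latin2 = raw.decode("iso8859-2")  # 0xA9 -> U+0160 (S with caron)

# Neither decode raises, so the bytes alone cannot tell us which
# encoding the committer was using.
print(ascii(as_latin1))
print(ascii(as_latin2))
print(as_latin1 == as_latin2)
```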

In the first iteration of this feature I would simply assume one
user-specified source encoding and try to convert data that isn't
UTF-8 from the source encoding to UTF-8.
If multiple single-byte encodings are present, this means that some
characters will be wrong, but the repository will work again without
manual intervention. If multiple multi-byte encodings other than
UTF-8 are present, this approach can fail and might require manual fixing
(no worse than the current situation).
This could still be improved upon if necessary.
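
A rough Python sketch of that first iteration, only to make the idea
concrete. The function name path_to_utf8 and its parameters are
hypothetical; this is not how svnadmin is implemented, and no such option
is implied to exist.

```python
def path_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Return raw unchanged if it is valid UTF-8; otherwise assume it is
    in the single user-specified source_encoding and convert it.

    Hypothetical helper for illustration only.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # May itself raise UnicodeDecodeError if the bytes are not valid
        # in source_encoding (possible with multi-byte encodings); that
        # is the case that would still need manual fixing.
        return raw.decode(source_encoding).encode("utf-8")


# A path already in UTF-8 passes through as-is.
utf8_path = "/foo/b\u00e5r".encode("utf-8")
print(path_to_utf8(utf8_path, "iso8859-1") == utf8_path)

# The same path stored in latin1 gets converted to UTF-8.
latin1_path = "/foo/b\u00e5r".encode("iso8859-1")
print(path_to_utf8(latin1_path, "iso8859-1") == utf8_path)

# A latin2 path converted with the wrong (latin1) assumption still
# yields valid UTF-8, just with the wrong character in it.
latin2_path = "/foo/bar\u0160".encode("iso8859-2")
print(ascii(path_to_utf8(latin2_path, "iso8859-1").decode("utf-8")))
```

The last call shows the trade-off described above: the result is usable
UTF-8, but the character is not the one the committer meant.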
 
> > We should also make svnadmin verify complain if paths are not in UTF-8.
>
> +1.
>
> The validation that 'load' and 'commit' trigger is path_valid() in
> fs_loader.c.

Thanks for the hint. I'm now running tests on a patch for this.
Received on 2011-05-31 01:07:42 CEST

This is an archived mail posted to the Subversion Users mailing list.
