Re: [RFC] - Proper encoding for patch file?

From: Branko Čibej <brane_at_xbc.nu>
Date: Sun, 02 Oct 2011 21:03:20 +0200

On 08.09.2011 20:07, Mark Phippard wrote:
> This is a JavaHL issue. See the attached patch which resolves the
> problem I face.
>
> If I use the JavaHL diff API to produce a patch it fails if there are
> paths in the patch with UTF8 characters in the name. Here is an
> example of the Exception:
>
> Invalid argument
> svn: Can't convert string from 'UTF-8' to native encoding:
> svn: Index: ?\230?\181?\139?\232?\175?\149?\230?\150?\135?\228?\187?\182.txt
> ===================================================================
>
> RA layer request failed
> svn: Error reading spooled REPORT request response
>
>
> The problem seems to be that JavaHL creates the output file for the
> patch with the encoding of SVN_APR_LOCALE_CHARSET. If I change this
> to "utf-8" as shown in the patch then the method works.
>
> The command line client from the same system works fine.
>
> How do people feel about this? Does it make sense that JavaHL should
> create the patch file with UTF-8 encoding? I tend to think it does,
> but thought I would raise the question here.
>

Unfortunately, on Linux (and other *ix), the filename encoding is just a
convention. So there's no guarantee that the filename is in fact UTF-8,
even if the locale says it should be. Therefore, just writing the file
names to the patch file unchanged ("in UTF-8") will not in fact do the
right thing in exactly the kind of corner case that's triggering this error.
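
To illustrate the point, here is a minimal Python sketch (not Subversion or JavaHL code). The helper `is_valid_utf8` and the Latin-1 example name are my own assumptions for the demonstration; the first byte string is the file name from the quoted error message, where the `\NNN` escapes are decimal byte values. A strict decode is the only way to tell whether a given name really is UTF-8, whatever the locale claims:

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# Bytes taken from the error message quoted above (decimal escapes).
utf8_name = bytes([230, 181, 139, 232, 175, 149,
                   230, 150, 135, 228, 187, 182]) + b".txt"

# Hypothetical name created by a program that wrote Latin-1 bytes,
# even though the locale may advertise UTF-8.
latin1_name = "caf\u00e9.txt".encode("latin-1")

for raw in (utf8_name, latin1_name):
    print(raw, "valid UTF-8" if is_valid_utf8(raw) else "NOT valid UTF-8")
```

On a POSIX filesystem both byte strings are legal names, so a patch writer that copies names through "as UTF-8" would emit invalid output for the second one.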

The only marginally sane solution is to include complete Unicode
normalization and transliteration libraries in Subversion ... and use
them correctly. I expect that'd mean storing the actual transliterated
filename in the WC database alongside the original UTF-8 value that came
from the repository, because transliteration is in general not reversible.
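
A rough Python sketch of why both values would have to be stored. The function `naive_transliterate` is a deliberately crude stand-in that I made up for illustration (decompose, then drop everything outside ASCII); a real transliteration library would be far more careful, but the loss of information is the same in kind:

```python
import unicodedata


def naive_transliterate(name: str) -> str:
    """Crude ASCII transliteration: NFKD-decompose, drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ord(ch) < 128)


# Two distinct repository names...
original_a = "caf\u00e9.txt"   # precomposed e-acute
original_b = "cafe.txt"        # plain ASCII

# ...collapse to the same on-disk name, so the mapping cannot be inverted.
on_disk_a = naive_transliterate(original_a)
on_disk_b = naive_transliterate(original_b)
print(on_disk_a == on_disk_b)

# Normalization matters as well: the decomposed spelling is a different
# code point sequence that NFC maps back to the precomposed one.
decomposed_a = "cafe\u0301.txt"
print(unicodedata.normalize("NFC", decomposed_a) == original_a)
```

Since `on_disk_a` alone cannot tell you which repository name it came from, the working copy would need to keep the (original, transliterated) pair.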

-- Brane

P.S.: As an added bonus, that would allow us to "transliterate"
characters that are invalid on some particular filesystem, if they
happen to appear in names in the repository.
Received on 2011-10-02 21:03:59 CEST
