
Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Tue, 7 Feb 2012 06:26:54 +0900

2012/2/6 Stefan Sperling <stsp_at_elego.de>:
> On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote:
>> On 06.02.2012 14:10, Hiroaki Nakamura wrote:
>> > Hi, all.
>> >
>> > It seems there is no further discussion.
>> >
>> > I think the conclusion for the short term solution is:
>> > We convert unnormalized paths to NFC normalized paths on clients only,
>> > that is, svn_path_cstring_to_utf8.
>> >
>> > It is the same approach as utf8precompose_macosx_2.patch in
>> > http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>> >
>> > It is proven to work as it is included in MacPorts unicode_path variant
>> > and Homebrew --unicode-path option.
>> You'll note that MacPorts also warns you that using this option may
>> cause interoperability issues with other clients that aren't using it,
>> right? So this is hardly a universal solution that will not affect
>> existing users and repositories.
> Exactly. This is what I meant when I said that we cannot apply the
> submitted patch as it is, at the very beginning of this thread.
> The submitted patch simply copies the MacPorts solution and has
> the same compatibility problems.
> I think the discussion made clear that there are two ways
> to move forward:
>  1) Implement a client-side mapping table which maps server-provided
>    paths to local filesystem paths. It translates between one or more
>    server-side and local representations of the same path. This could
>    be done only on Mac OS X (or, preferably, only on HFS+ filesystems)
>    because only Mac OS X has problems.
>    The idea here is to not change existing paths in repositories at all,
>    no matter which way they are encoded, and to teach Mac OS X clients
>    to cope with the problem locally. This way, other existing clients
>    won't notice a difference. The only thing that won't work is to create
>    a working copy on Mac OS X which contains the same name multiple times,
>    in NFD and in some other normalised or non-normalised form.
>    This approach was suggested by Peter.

The Unicode Standard says canonically equivalent sequences should be
interpreted the same way.
* 1.1 Canonical and Compatibility Equivalence
* 2.12 Equivalent Sequences and Normalization
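
To make the equivalence concrete, here is a small Python sketch (not Subversion code; the file names and the helper canonically_equal are made up for illustration). It shows one name spelled with a precomposed character and with a base letter plus combining mark, which differ as code point sequences but match once both are normalized to NFC:

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# "é" as one precomposed code point, and as "e" + U+0301 COMBINING ACUTE ACCENT.
nfc_name = "caf\u00e9.txt"
nfd_name = "cafe\u0301.txt"

# A plain comparison sees two different code point sequences.
print(nfc_name == nfd_name)
# Comparing the normalized forms treats them as the same name.
print(canonically_equal(nfc_name, nfd_name))
```

A plain string comparison, which is what a byte-oriented path comparison amounts to, treats these as two distinct names.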

So we should not have the same name multiple times in repositories
and working copies. Therefore Subversion servers and clients do
not need to handle that case. Rather, I think we should fix Subversion to
reject a name that duplicates an existing one in a different form.

To handle existing repositories and working copies, maybe we should
create a tool which checks whether repositories and working copies
contain the same name multiple times in different forms.

If they do, users must rename the files manually. In practice, I think this
is extremely rare.
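
A rough sketch of what the core of such a checking tool might look like, in Python rather than as Subversion code. It assumes the list of paths has already been collected from a repository or working copy by some other means; the function name find_normalization_collisions and the sample paths are hypothetical:

```python
import unicodedata
from collections import defaultdict

def find_normalization_collisions(paths):
    """Group paths that are spelled differently but are identical after NFC."""
    groups = defaultdict(set)
    for path in paths:
        groups[unicodedata.normalize("NFC", path)].add(path)
    # Report only NFC keys that were reached by more than one distinct spelling.
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}

# Sample input: the first two entries are the same name in NFC and in NFD.
paths = ["trunk/caf\u00e9.txt", "trunk/cafe\u0301.txt", "trunk/README"]
for key, spellings in find_normalization_collisions(paths).items():
    # Print the UTF-8 bytes so the differing spellings are visible.
    print(key, [s.encode("utf-8") for s in spellings])
```

Any group it reports is a set of names that a user would have to rename by hand.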

>    We'd need either a working patch or a more detailed implementation
>    design document to move forward here.

OK. Peter, or somebody else, please give us either one of them.

>  2) Do something else that affects repositories, too, and provide
>    a clean upgrade path for everyone (servers and clients).
>    AFAIK nobody has made a suggestion as to what could be done here.

What do you mean by a clean upgrade?
Is it clean if we do dump and load for repositories and re-checkout for
working copies?

)Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-06 22:27:27 CET

This is an archived mail posted to the Subversion Dev mailing list.