
Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Tue, 7 Feb 2012 06:26:54 +0900

2012/2/6 Stefan Sperling <stsp_at_elego.de>:
> On Mon, Feb 06, 2012 at 02:28:40PM +0100, Branko Čibej wrote:
>> On 06.02.2012 14:10, Hiroaki Nakamura wrote:
>> > Hi, all.
>> >
>> > It seems there is no further discussion.
>> >
>> > I think the conclusion for the short term solution is:
>> > We convert unnormalized paths to NFC-normalized paths on clients only,
>> > that is, in svn_path_cstring_to_utf8.
>> >
>> > It is the same approach as utf8precompose_macosx_2.patch in
>> > http://subversion.tigris.org/issues/show_bug.cgi?id=2464
>> >
>> > It is proven to work, as it is included in the MacPorts unicode_path
>> > variant and the Homebrew --unicode-path option.
>>
>> You'll note that MacPorts also warns you that using this option may
>> cause interoperability issues with other clients that aren't using it,
>> right? So this is hardly a universal solution that will not affect
>> existing users and repositories.
>
> Exactly. This is what I meant when I said that we cannot apply the
> submitted patch as it is, at the very beginning of this thread.
> The submitted patch simply copies the MacPorts solution and has
> the same compatibility problems.
>
> I think the discussion made clear that there are two ways
> to move forward:
>
>  1) Implement a client-side mapping table which maps server-provided
>    paths to local filesystem paths. It translates between one or more
>    server-side and local representations of the same path. This could
>    be done only on Mac OS X (or, preferably, only on HFS+ filesystems)
>    because only Mac OS X has problems.
>    The idea here is to not change existing paths in repositories at all,
>    no matter which way they are encoded, and to teach Mac OS X clients
>    to cope with the problem locally. This way, other existing clients
>    won't notice a difference. The only thing that won't work is to create
>    a working copy on Mac OS X which contains the same name multiple times,
>    in NFD and in some other normalised or non-normalised form.
>    This approach was suggested by Peter.
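
To make option 1 concrete, here is a rough sketch of what such a mapping
table might look like. It is only an illustration, not Peter's design: the
class name PathMap and its methods are hypothetical, and plain NFD from
Python's unicodedata module stands in for the decomposition HFS+ applies,
which is not exactly standard NFD.

```python
import unicodedata


class PathMap:
    """Hypothetical client-side table: local (decomposed) path -> server path."""

    def __init__(self):
        self._local_to_server = {}

    def add(self, server_path):
        """Register a server-provided path and return its local form."""
        # Assumption: NFD approximates what the local filesystem stores.
        local = unicodedata.normalize("NFD", server_path)
        known = self._local_to_server.setdefault(local, server_path)
        if known != server_path:
            # Two server paths collapse to one local name: the one case
            # that cannot be represented in a working copy on Mac OS X.
            raise ValueError(
                "%r and %r map to the same local name" % (known, server_path)
            )
        return local

    def to_server(self, local_path):
        """Translate a local path back to the exact bytes the server uses."""
        key = unicodedata.normalize("NFD", local_path)
        return self._local_to_server.get(key, local_path)
```

The server-side spelling is never rewritten; the table only remembers it,
so other clients would not see any difference.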

The Unicode Standard says canonically equivalent sequences should be
interpreted the same way.
* 1.1 Canonical and Compatibility Equivalence
  http://unicode.org/reports/tr15/#Canonical_Equivalence
* 2.12 Equivalent Sequences and Normalization
  http://www.unicode.org/versions/Unicode6.0.0/ch02.pdf
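
As a small illustration of canonical equivalence (using Python's standard
unicodedata module; the helper name canonically_equivalent is made up for
this example), the precomposed and decomposed spellings of the same
character differ as code point sequences but compare equal once both are
normalized:

```python
import unicodedata

nfc_name = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE, one code point
nfd_name = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def canonically_equivalent(a, b):
    """True if a and b are the same string after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A byte-wise comparison, which is what happens without normalization, treats
nfc_name and nfd_name as two different names.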

So we should not have the same name multiple times in repositories
and working copies. Therefore Subversion servers and clients do
not need to handle that case. Rather, I think we should fix Subversion to
reject a name that already exists in a different form.

To handle existing repositories and working copies, maybe we should
create a tool which checks whether repositories and working copies
contain the same name multiple times.
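
The core of such a checking tool could be as small as the sketch below. It
is only an outline under assumptions: the function name
find_normalization_collisions is hypothetical, the paths are assumed to be
already decoded to Unicode strings, and how they are collected from a
repository or working copy is left out.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(paths):
    """Group paths by NFC form; return groups with more than one spelling."""
    groups = defaultdict(set)
    for path in paths:
        groups[unicodedata.normalize("NFC", path)].add(path)
    # Keep only names that appear in two or more distinct forms.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

An empty result would mean no path in the input exists in more than one
form, so nothing needs to be renamed.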

If they do, users must rename the files manually. In practice, I think
this is extremely rare.

>    We'd need either a working patch or a more detailed implementation
>    design document to move forward here.

OK. Peter, or somebody else, please give us either one of them.

>
>  2) Do something else that affects repositories, too, and provide
>    a clean upgrade path for everyone (servers and clients).
>    AFAIK nobody has made a suggestion as to what could be done here.

What do you mean by a clean upgrade?
Would it count as clean if we do a dump and load for repositories and a
fresh checkout for working copies?

-- 
)Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-06 22:27:27 CET

This is an archived mail posted to the Subversion Dev mailing list.