
Re: Let's discuss about unicode compositions for filenames!

From: Thomas Åkesson <thomas_at_akesson.cc>
Date: Sun, 12 Feb 2012 16:47:45 +0100

On 11 feb 2012, at 13:10, Hiroaki Nakamura wrote:

> Hi,
>
> 2012/2/9 Thomas Åkesson <thomas_at_akesson.cc>:
>> Hi,
>> I have been interested in this issue for a couple of years and I remember it was discussed briefly at Subconf in Germany a couple of years ago.
>>
>> Branching the thread here because I'd like to propose a different approach than Hiroaki. This proposition is not very different from the note "unicode-composition-for-filenames" or what Peter S, Neels and others suggested, perhaps just combining 2 changes slightly differently.
>>
>> This is based on my limited understanding of WC-NG, please correct me if I make incorrect assumptions.
>>
>> - Server will still accept both NFC and NFD, however, it will no longer accept collisions. Enforced by normalising to NFD before uniqueness checks during add operations (yes, might be more expensive). There will be no unified normalisation, but the subversion server will work like most filesystems; return what was given to it.
>
> For compatibility, we cannot ignore existing repositories and working
> copies that have filename collisions, so we cannot force Subversion
> servers and clients to normalize filenames. We must let users choose,
> per repository, whether filenames are normalized.
>

Perhaps I did not describe this well enough, but I am _not_ suggesting normalized repository storage, just a normalized uniqueness check during add operations. I believe that normalized repository storage would cause too many compatibility issues with historical data (as well as other negative effects noted below).

The proposition I outlined has _no_ issues whatsoever with existing repositories or working copies, even if they do have name collisions (which we all agree is rare). What would change is that it would no longer be possible to create _new_ name collisions (names that are equal after normalization), while old name collisions could be resolved with 'svn mv'.
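
To make the idea concrete, here is a minimal sketch in Python. It is not Subversion code, and the names `collision_key` and `check_add` are made up for illustration. It assumes the existing entries of a directory are available as a plain list of strings. The point is only that normalization is applied to the comparison key, never to the name that gets stored or returned.

```python
import unicodedata


def collision_key(name: str) -> str:
    # Hypothetical helper: the NFD form is used only for comparison.
    # It is never stored and never sent back to the client.
    return unicodedata.normalize("NFD", name)


def check_add(existing_names, new_name: str) -> str:
    # Hypothetical add-time check: refuse a name that equals an existing
    # entry after NFD normalization, otherwise accept it unchanged.
    key = collision_key(new_name)
    for name in existing_names:
        if collision_key(name) == key:
            raise ValueError(
                f"{new_name!r} collides with existing entry {name!r}"
            )
    # Return exactly what was given, whether it was NFC or NFD.
    return new_name
```

Note that the check only compares the new name against existing entries. A directory that already contains a historical NFC/NFD pair is left alone, and unrelated names can still be added to it.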

I am not sure anyone has yet voiced the opinion that Subversion must continue to accept the creation of new name collisions. Anyone? I think Neels came closest to that opinion, but my interpretation is that he only suggested that a Subversion server should not normalize. The more times I read Neels' post (2012-01-30), the more obvious it becomes that what I proposed is very similar.

There is consensus that compatibility is a high priority for Subversion. Introducing normalization, translation or anything similar is risky business for compatibility. The HFS+ file system has been chastised (both here and on other dev lists) for its behaviour. A file system is expected to return exactly what was stored, or refuse up-front.

Would it make sense to formalize the different approaches into a couple of RFCs attempting to summarize the respective implications of each approach? I could try to write one up for the "Non-normalizing approach".

/Thomas Å.
Received on 2012-02-12 16:48:20 CET

This is an archived mail posted to the Subversion Dev mailing list.