
Re: Let's discuss about unicode compositions for filenames!

From: Neels J Hofmeyr <neels_at_elego.de>
Date: Mon, 30 Jan 2012 17:10:52 +0100

On 01/30/2012 02:00 PM, Markus Schaber wrote:
> Maybe the best solution to this issue is a client-only solution, in a similar way the case sensitivity problem is tackled.

Spinning the client-only thought a bit: Imagine a repos with a un*x user
adding a file called "föö". Now an OSX user checks it out and gets the path
normalized to "fo:o:".

1. wc.db on OSX's HFS+ file systems has to be aware that the file "föö" is
stored locally as "fo:o:".

2. Whenever the OSX user types in "fo:o:", the client must remember that the
repos expects the path for this node to be sent as "föö", or the repos will
reply that the node does not exist. It could be solved with a translation
table between the repos and the client, but it remains quite a messy
endeavor, because:

3. New files may be added remotely at any given moment. For example, a path
'föö/bar' is checked out to OSX's fs and becomes 'fo:o:/bar'. Then someone
else adds 'fo:o:/bar' to the repos as well -- we now have two distinct 'bar'
files in the repos that share the same normalized path. Now OSX potentially
mistakes its checked-out 'föö/bar' for the later added 'fo:o:/bar', as that
matches the local path without any de-normalisations... The OSX client
basically has no chance to show "conflicting" files to its user
simultaneously. Data is "hidden".
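To make points 1 to 3 concrete, here is a minimal Python sketch of such a client-side translation table. It is only an illustration, not how Subversion's wc.db works. It assumes that plain Unicode NFD is a good enough stand-in for what HFS+ does to file names (HFS+ actually uses its own variant of decomposition), and `to_local` and `table` are hypothetical names.

```python
import unicodedata

def to_local(repos_path):
    """Approximate the on-disk name an HFS+ volume would use (assumption: plain NFD)."""
    return unicodedata.normalize("NFD", repos_path)

# Two repository paths that differ only in how the umlauts are encoded.
composed = "f\u00f6\u00f6/bar"        # 'föö/bar': precomposed ö (U+00F6)
decomposed = "fo\u0308o\u0308/bar"    # 'fo:o:/bar': o followed by U+0308

# Hypothetical translation table: local path -> repository paths mapping to it.
table = {}
for repos_path in (composed, decomposed):
    table.setdefault(to_local(repos_path), []).append(repos_path)

# The repository sees two distinct nodes, the working copy has room for one.
for local_path, repos_paths in table.items():
    if len(repos_paths) > 1:
        print("collision on", repr(local_path), "<-", repos_paths)
```

The table has a single key with two repository paths behind it, which is exactly the "hidden data" situation from point 3.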

Thus, OSX admins will want the repository to be able to disallow having
multiple representations of the same normalized path -- not that easy to
achieve, in fact: before accepting a path name from the client, the repos
needs to either cycle through all possible unicode representations or needs
to normalize and compare all existing paths. Normalizing a client's path
before storing in the repos is a no-go, as the client won't be able to find its
nodes later. Probably the best option is to define a given normalization per
repos and then refuse commits that add non-normalized paths, like a
pre-commit hook.
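A rough sketch of the check such a hook could perform, in Python. The function names are made up for illustration, the list of added paths is assumed to have been extracted from the transaction already (that part is not shown), and NFC as the per-repository form is just an example choice.

```python
import unicodedata

def non_normalized_paths(added_paths, form="NFC"):
    """Return the added paths that are not already in the chosen normalization form."""
    return [p for p in added_paths if unicodedata.normalize(form, p) != p]

def check_commit(added_paths, form="NFC"):
    """Return (ok, message). A real hook would exit non-zero when ok is False."""
    bad = non_normalized_paths(added_paths, form)
    if bad:
        listing = ", ".join(repr(p) for p in bad)
        return False, "Commit refused: paths not in %s: %s" % (form, listing)
    return True, ""

# The precomposed name passes an NFC policy, the decomposed one is refused.
print(check_commit(["f\u00f6\u00f6"]))
print(check_commit(["fo\u0308o\u0308"]))
```

Note that the check only refuses paths and never rewrites them, so the client always finds its nodes under the name it sent.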

On the other hand, an all-un*x shop must be allowed to operate the way they
always did. Their OSs only see byte sequences and don't mess around with
normalization. Say they want to have a folder of differently normalized
representations of the same file for testing *their own* code for unicode
robustness. They should be able to do that. (They obviously can't use OSX's
HFS+ for that, though.)

So, on top of client-only fixes, it would be good to have ways to enforce
certain repository behavior, based on self-imposed policy -- I mean, we
won't have "The Subversion Normalization", each admin decides alone.

On 01/30/2012 01:30 PM, Stefan Sperling wrote:
> I am not convinced that it is impossible to fix.

Nicely put :)


fred@mac $ svn co http://svn/repos
A foo
A bar
*** Warning:
You are checking out to an HFS+ file system. Your WC may not accurately
represent this revision. Consider using a different file system!
Continue? (Y/n) Y
A föö
*** File name collision detected. Skipping 'föo:'
*** File name collision detected. Skipping 'fo:ö'
*** File name collision detected. Skipping 'fo:o:'
A baz
fred@mac $
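
The skipping in the mock session above could be sketched roughly like this in Python. This is a thought experiment, not existing client behavior: `plan_checkout` is a hypothetical helper, and NFD again only approximates what HFS+ does to names.

```python
import unicodedata

def plan_checkout(repos_paths):
    """Return a list of (action, path): 'add' for the first path that claims
    a normalized name, 'skip' for every later path that collides with it."""
    seen = set()
    plan = []
    for path in repos_paths:
        key = unicodedata.normalize("NFD", path)  # assumed local representation
        if key in seen:
            plan.append(("skip", path))
        else:
            seen.add(key)
            plan.append(("add", path))
    return plan

paths = [
    "foo",
    "bar",
    "f\u00f6\u00f6",        # föö
    "f\u00f6o\u0308",       # föo:
    "fo\u0308\u00f6",       # fo:ö
    "fo\u0308o\u0308",      # fo:o:
    "baz",
]
for action, path in plan_checkout(paths):
    print(action, repr(path))
```

Whichever representation comes first wins the local name, so which node the user actually gets to see depends on the iteration order.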

Received on 2012-01-30 17:11:34 CET

This is an archived mail posted to the Subversion Dev mailing list.