Re: Let's discuss about unicode compositions for filenames!
From: Markus Schaber <m.schaber_at_3s-software.com>
Date: Mon, 30 Jan 2012 16:05:09 +0000
From: Peter Samuelson [mailto:peter_at_p12n.org]
> There are similarities, but there are some important differences:
>- We have to support Mac OS X, which stores all files in NFD. In the
Case preservation does not help much here. A simple approach ("map all names to lower case when accessing the working copy, and search case-insensitively in the database") could solve that problem, but the repository can contain files whose names differ only in case, and in that situation preserving the original case does not help either.
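A minimal Python sketch of the "map to lower case and search case-insensitively" idea, assuming the repository names are available as a plain list. The function names are hypothetical and not Subversion APIs. It uses `str.casefold()`, Python's stronger variant of `lower()` intended for caseless matching, and shows where the approach breaks down.

```python
from collections import defaultdict


def case_key(name):
    # casefold() is a more aggressive lower() meant for caseless comparison.
    return name.casefold()


def find_case_collisions(repo_names):
    """Group repository names that map to the same case-folded key."""
    groups = defaultdict(list)
    for name in repo_names:
        groups[case_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


def lookup_case_insensitive(wc_name, repo_names):
    """Return every repository name matching a working-copy name, ignoring case."""
    key = case_key(wc_name)
    return [name for name in repo_names if case_key(name) == key]
```

With `["README", "readme", "Makefile"]` in the repository, looking up `"makefile"` finds exactly one entry, but looking up `"Readme"` returns both `README` and `readme`, so a case-insensitive working copy cannot tell which one is meant.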
>- Also, the Subversion platform has chosen to support files like README
> Because of those differences, my gut feeling is that we can't treat the two issues in the same way.
There seem to be clients which allow files whose names differ only in their Unicode encoding (for example, precomposed NFC vs. decomposed NFD). So the position on "Unicode encoding collisions" could be the same as on "case-insensitivity collisions": allow in the repository whatever the most capable clients allow. My guess is that the fixes for that scenario are rather similar (mainly client-based, specific to the capabilities of the platform, and "if you have users on Mac, don't do that"). Of course, Mac clients need to map names internally to their normalized encoding, similar to how case sensitivity is handled now, and when an encoding collision occurs, they lose (just as with case collisions on Mac and Windows).
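A sketch of how a client could detect encoding collisions among repository names before materializing them on Mac OS X. It assumes plain NFD as an approximation of what HFS+ stores (HFS+ actually uses its own variant of NFD), and the function name is hypothetical.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(repo_names, form="NFD"):
    """Return groups of distinct names that become identical after normalization."""
    groups = defaultdict(list)
    for name in repo_names:
        groups[unicodedata.normalize(form, name)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

For example, `"caf\u00e9.txt"` (precomposed `é`) and `"cafe\u0301.txt"` (`e` plus a combining acute accent) are different strings, so the repository can hold both, but they land in the same group and would map to a single file on the Mac.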
If the position is instead to disallow, in the repository, files whose names differ only by encoding, things are a little different.
But I think that even this can be solved purely on the client, by sending only normalized names to the server for all new objects (imports, additions, copy targets, ...) and keeping the existing encodings for all existing objects.
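The client-side policy described above can be sketched as follows. The choice of NFC as the normalization form for new names is an assumption (the mail does not name one), and `name_for_server` is a hypothetical helper, not part of Subversion.

```python
import unicodedata


def name_for_server(local_name, existing_repo_names, form="NFC"):
    """Pick the spelling a client would send for a path (hypothetical policy)."""
    key = unicodedata.normalize(form, local_name)
    for repo_name in existing_repo_names:
        if unicodedata.normalize(form, repo_name) == key:
            # Existing object: keep whatever encoding the repository already uses.
            # If the repository already contains a collision, the first match wins.
            return repo_name
    # New object (import, add, copy target): send only the normalized name.
    return key
```

A Mac client adding `"nai\u0308ve.txt"` (decomposed) would send the NFC spelling `"na\u00efve.txt"`, while a client touching an existing NFD name such as `"cafe\u0301.txt"` keeps that spelling, so old working copies keep working.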
For existing collisions, which hamper work on Mac OS X, the usual workarounds apply: rename the colliding files via a repo-browser or in a more capable client. Additionally, we could develop a dump filter tool for name normalization, perhaps with a switch to choose between erroring out and silently renaming on collisions.
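A possible core for such a dump filter, sketched in Python: it maps each distinct node path to its normalized form and applies the chosen collision policy. Parsing and rewriting the dump stream itself is left out; the function name, the `on_collision` switch, the NFC target form, and the `.1`, `.2`, ... rename suffix are all illustrative assumptions.

```python
import unicodedata


class NormalizationCollision(Exception):
    """Raised when two paths normalize to the same name and the policy is 'error'."""


def normalize_paths(paths, on_collision="error", form="NFC"):
    """Map each distinct path to a normalized path (hypothetical dump-filter core)."""
    if on_collision not in ("error", "rename"):
        raise ValueError("on_collision must be 'error' or 'rename'")
    mapping = {}
    used = set()
    for path in paths:
        target = unicodedata.normalize(form, path)
        if target in used:
            if on_collision == "error":
                raise NormalizationCollision(f"{path!r} collides with {target!r}")
            # 'rename': append the first free numeric suffix.
            n = 1
            while f"{target}.{n}" in used:
                n += 1
            target = f"{target}.{n}"
        used.add(target)
        mapping[path] = target
    return mapping
```

Because the mapping is built once over all distinct paths, every later revision that touches the same path gets the same new name, which keeps the filtered history consistent.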
With proper documentation, this will let the problem fade out over time, and, in theory, this approach can be implemented on top of the first one later. I don't see any need to change anything on the server: both implicit conversion and rejection of invalid encodings would break existing clients and working copies. My personal guess is that actual encoding collisions are rather rare and that workarounds exist, so servers can start to reject invalid encodings with version 2.0, or whatever future version is allowed to break compatibility with old clients.
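For that later, compatibility-breaking server release, the rejection rule could look roughly like this hypothetical check, which accepts any spelling for paths that already exist and requires new paths to be normalized already. NFC is again an assumed target form; `unicodedata.is_normalized` needs Python 3.8 or later.

```python
import unicodedata


def server_accepts(path, existing_paths, form="NFC"):
    """Hypothetical future server rule for incoming paths."""
    if path in existing_paths:
        # Existing objects keep their encoding, so old working copies stay valid.
        return True
    # New objects must arrive already normalized.
    return unicodedata.is_normalized(form, path)
```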
Received on 2012-01-30 17:05:49 CET
This is an archived mail posted to the Subversion Dev mailing list.