
Re: Let's discuss about unicode compositions for filenames!

From: Branko Čibej <brane_at_apache.org>
Date: Mon, 30 Jan 2012 13:47:38 +0100

On 30.01.2012 13:30, Stefan Sperling wrote:
> On Sun, Jan 29, 2012 at 07:38:44PM +0900, Hiroaki Nakamura wrote:
>> Hi folks!
>> I read the note about unicode compositions for filenames
>> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>> and would like to drive the discussion.
> Hi,
> I am very happy to hear that you want to work towards getting this
> problem fixed. Thank you for your help!
> I've just re-read the unicode-composition-for-filenames notes.
> I think they are a bit outdated. For instance, they still talk about
> the 1.6 working copy format. They also don't clearly explain the problems
> with backwards compatibility we're facing here.


We have to track two distinct normalizations: the internal form (wc.db,
repository), most likely NFC, and the on-disk form in the working copy.
The latter depends on the host system: most likely NFD on Mac OS and NFC
everywhere else. The on-disk normalization needs to happen before
conversion to the system encoding, of course.
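
To make the two forms concrete, here is a rough sketch in Python using the
standard unicodedata module. The helper names (disk_form_for, to_internal,
to_disk_bytes) are hypothetical and not Subversion APIs, and treating every
non-macOS host as NFC is the same assumption as above.

```python
import sys
import unicodedata

# Internal form (wc.db, repository): assumed to be NFC, as suggested above.
INTERNAL_FORM = "NFC"


def disk_form_for(platform: str = sys.platform) -> str:
    """Guess the on-disk normalization form for a host platform.

    Assumption: macOS ("darwin") stores file names decomposed (NFD-like);
    every other host is treated as NFC.
    """
    return "NFD" if platform == "darwin" else "NFC"


def to_internal(name: str) -> str:
    """Normalize a name the way it would be stored in wc.db or the repository."""
    return unicodedata.normalize(INTERNAL_FORM, name)


def to_disk_bytes(name: str, platform: str = sys.platform,
                  encoding: str = "utf-8") -> bytes:
    """Normalize for the host first, and only then convert to the system encoding."""
    normalized = unicodedata.normalize(disk_form_for(platform), name)
    return normalized.encode(encoding)


if __name__ == "__main__":
    name = "caf\u00e9.txt"  # "café.txt" with a precomposed é
    print(to_disk_bytes(name, "darwin"))  # b'cafe\xcc\x81.txt'
    print(to_disk_bytes(name, "linux"))   # b'caf\xc3\xa9.txt'
```

The order matters because normalization operates on code points, not on
bytes in some legacy encoding.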

libsvn_repos should do its own normalization to NFC, because we can't
trust old clients to do it right. A dump/reload cycle should then be
sufficient to upgrade the repository, and is probably the only viable
way to do so, too.
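
A minimal sketch of the server-side idea, assuming paths arrive as Python
strings: whatever form an old client sends, the stored path ends up in NFC.
normalize_incoming_path and reload_paths are hypothetical names standing in
for a libsvn_repos hook and a dump/reload pass; they are not real APIs.

```python
import unicodedata


def normalize_incoming_path(path: str) -> str:
    """Hypothetical server-side hook: store every path in NFC, regardless of
    the form the (possibly old, non-normalizing) client sent."""
    return unicodedata.normalize("NFC", path)


def reload_paths(dumped_paths):
    """Hypothetical dump/reload pass: re-normalize every path from a dump
    stream before it is committed into the new repository."""
    return [normalize_incoming_path(p) for p in dumped_paths]


if __name__ == "__main__":
    from_mac = "docs/re\u0301sume\u0301.txt"  # NFD, e.g. sent by an old Mac OS client
    from_linux = "docs/r\u00e9sum\u00e9.txt"  # NFC, e.g. sent by a Linux client
    # Both entries come out as the same NFC path.
    print(reload_paths([from_mac, from_linux]))
```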

For working copies, we may want to teach "svn upgrade" to do the on-disk
and wc.db normalization dance. Clearly, client-side normalization
requires a WC format bump, but that bump need not be automatic.
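
One way to picture the "dance" is as a planning step that, for each file
name found in the working copy, computes the NFC key to record in wc.db and
whether the file must be renamed on disk to match the host's form. This is a
sketch under that assumption only; UpgradeStep, plan_upgrade and
renames_needed are hypothetical, and it ignores the format bump itself.

```python
import unicodedata
from typing import List, NamedTuple, Tuple


class UpgradeStep(NamedTuple):
    wc_db_name: str  # value to record in wc.db (internal form, NFC)
    disk_old: str    # name as currently found on disk
    disk_new: str    # name in the host's on-disk form


def plan_upgrade(disk_names: List[str], disk_form: str = "NFC") -> List[UpgradeStep]:
    """Hypothetical planning step for a normalizing 'svn upgrade'."""
    return [
        UpgradeStep(
            wc_db_name=unicodedata.normalize("NFC", name),
            disk_old=name,
            disk_new=unicodedata.normalize(disk_form, name),
        )
        for name in disk_names
    ]


def renames_needed(steps: List[UpgradeStep]) -> List[Tuple[str, str]]:
    """Only names whose on-disk form actually changes need a rename."""
    return [(s.disk_old, s.disk_new) for s in steps if s.disk_old != s.disk_new]


if __name__ == "__main__":
    steps = plan_upgrade(["cafe\u0301.txt", "README"], disk_form="NFC")
    print(renames_needed(steps))
```

Separating the plan from the actual renames would let such an upgrade report
what it intends to change before touching the disk.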

We should probably give serious thought to using the compatibility
normalization forms (NFKC and NFKD) and tell people who want proper
Unicode Roman numerals in their file names to think again. :)
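
What NFKC would do to such names can be seen with Python's unicodedata
module: a compatibility character like U+2161 ROMAN NUMERAL TWO survives NFC
but is folded to plain "II" by NFKC, while an ordinary accented letter is
unaffected. compare_forms is just an illustrative helper.

```python
import unicodedata


def compare_forms(name: str) -> tuple:
    """Return the (NFC, NFKC) forms of a file name."""
    return unicodedata.normalize("NFC", name), unicodedata.normalize("NFKC", name)


if __name__ == "__main__":
    for name in ["chapter-\u2161.txt",  # ROMAN NUMERAL TWO (U+2161)
                 "\ufb01le.txt",        # LATIN SMALL LIGATURE FI (U+FB01)
                 "caf\u00e9.txt"]:      # precomposed é: same in NFC and NFKC
        nfc, nfkc = compare_forms(name)
        print(f"{name!r}: NFC={nfc!r} NFKC={nfkc!r}")
```

The folding is lossy: "chapter-\u2161.txt" and "chapter-II.txt" become the
same name under NFKC, which is exactly the trade-off being joked about.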

-- Brane
Received on 2012-01-30 13:48:16 CET

This is an archived mail posted to the Subversion Dev mailing list.