
Re: Let's discuss about unicode compositions for filenames!

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 3 Feb 2012 14:02:18 +0000 (GMT)

Hiroaki Nakamura wrote:

>>> It would be nice if we could normalize paths in the repository without
>>> having to perform a dump/reload cycle, but I don't know how that
>>> would work in FSFS.
>>
>> It won't.  Changing the encoding increases the length (in bytes) of the
>> string (in the dirents hash, for example), and thus changes the offsets
>> of the node-revs that are later in the file --- to which subsequent
>> revisions, and the id's of those node-revs, refer.
>
> Changing from NFD to NFC does not increase the length.
> The length will be the same or smaller, not larger.

You may well be correct that NFC is never longer than NFD, but that's not the question.  The question is whether the NFC form may be longer than the current paths, which are not normalized to either form C or form D.  And the answer is yes, it may be longer.  See <http://unicode.org/faq/normalization.html#11>.
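
To make that concrete, here is a minimal sketch using Python's standard unicodedata module. The helper names (utf8_len, nfc_growth) and the sample file names are made up for illustration and have nothing to do with Subversion's code. It relies on U+0958 (DEVANAGARI LETTER QA) being a composition exclusion, so NFC replaces it with the two-character sequence U+0915 U+093C.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Length of s in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def nfc_growth(path: str) -> int:
    """Bytes gained (positive) or lost (negative) by converting path to NFC."""
    nfc = unicodedata.normalize("NFC", path)
    return utf8_len(nfc) - utf8_len(path)


# A path that is in neither NFC nor NFD: U+0958 is a single precomposed
# character that NFC decomposes, because it is excluded from composition.
unnormalized = "\u0958.txt"

# A path already in NFD: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301.txt"

print(nfc_growth(unnormalized))  # positive: the NFC form needs more bytes
print(nfc_growth(decomposed))    # negative: NFD -> NFC shrinks this one
```

So a path stored in NFD can shrink when converted to NFC, while a path that was never normalized can grow, and the second case is the one that matters for rewriting paths in place.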

> Here I quote from
> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>   > The proposed internal 'normal form' should be NFC, if only if
>   > it were because it's the most compact form of the two:  when
>   > allocating memory to store a conversion result, it won't be
>   > necessary (ever) to allocate more than the size of the input buffer.

That statement seems to be talking about converting between NFC and NFD, not from un-normalized to normalized.

- Julian
Received on 2012-02-03 15:02:54 CET

This is an archived mail posted to the Subversion Dev mailing list.