
Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Sat, 4 Feb 2012 20:08:51 +0900

2012/2/3 Julian Foad <julianfoad_at_btopenworld.com>:
> You may well be correct that NFC is never longer than NFD, but that's not the question.  The question is whether NFC may be longer than the current paths (which are not normalized to normalization form C or to form D).  And the answer is yes it may be longer.  See <http://unicode.org/faq/normalization.html#11>.

Oh, I didn't know that. Thanks for letting me know.
I also read all the other items in <http://unicode.org/faq/normalization.html#11>
and all of <http://www.unicode.org/reports/tr15/>, and learned more about
normalization.
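
To make the point concrete, here is a minimal sketch in Python (not
Subversion code) that compares the UTF-8 size of a string before and after
NFC normalization. The helper name utf8_sizes is made up for illustration;
unicodedata.normalize is the standard library call.

```python
import unicodedata

def utf8_sizes(text):
    """Return (bytes before, bytes after NFC) for text encoded as UTF-8."""
    nfc = unicodedata.normalize("NFC", text)
    return len(text.encode("utf-8")), len(nfc.encode("utf-8"))

# U+0958 DEVANAGARI LETTER QA is a composition exclusion, so NFC
# replaces it with U+0915 U+093C and the UTF-8 encoding grows.
grows = utf8_sizes("\u0958")

# "e" + U+0301 COMBINING ACUTE ACCENT composes to U+00E9, so it shrinks.
shrinks = utf8_sizes("e\u0301")
```

So an output buffer sized to the un-normalized input is not always large
enough for the NFC result.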

Maybe we should revise the note:
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames

>> Here I quote from
>> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>>   > The proposed internal 'normal form' should be NFC, if only if
>>   > it were because it's the most compact form of the two:  when
>>   > allocating memory to store a conversion result, it won't be
>>   > necessary (ever) to allocate more than the size of the input buffer.
>
> That statement seems to be talking about converting between NFC and NFD, not from un-normalized to normalized.

Yes, indeed.

So we need to normalize input paths before processing them,
and we choose NFC as the normalization form.
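
As a rough sketch of what "normalize before processing" means (Python for
brevity, not the actual Subversion implementation; normalize_path and
same_path are hypothetical names), two spellings of the same filename
compare equal only after both are converted to NFC:

```python
import unicodedata

def normalize_path(path):
    """Convert a path string to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", path)

def same_path(a, b):
    """Compare two paths after normalizing both to NFC."""
    return normalize_path(a) == normalize_path(b)

composed = "caf\u00e9.txt"     # precomposed e-acute (U+00E9)
decomposed = "cafe\u0301.txt"  # "e" followed by combining acute (U+0301)
```

Here composed == decomposed is false, while same_path(composed, decomposed)
is true.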

-- 
)Hiroaki Nakamura) hnakamur_at_gmail.com
Received on 2012-02-04 12:09:27 CET

This is an archived mail posted to the Subversion Dev mailing list.
