Re: Let's discuss about unicode compositions for filenames!

From: Hiroaki Nakamura <hnakamur_at_gmail.com>
Date: Sat, 4 Feb 2012 20:08:51 +0900

2012/2/3 Julian Foad <julianfoad_at_btopenworld.com>:
> You may well be correct that NFC is never longer than NFD, but that's not the question. The question is whether NFC may be longer than the current paths (which are not normalized to normalization form C or to form D). And the answer is yes it may be longer. See <http://unicode.org/faq/normalization.html#11>.

Oh, I didn't know that. Thanks for letting me know.
I also read the other items on that FAQ page and all of
<http://www.unicode.org/reports/tr15/>, and learned more about
normalization.
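
For illustration, here is a small Python sketch of one case where NFC
comes out longer than the un-normalized input. It is not taken from the
FAQ; it assumes Python's standard unicodedata module, and the variable
names are only for this example. U+0958 is a composition exclusion, so
NFC leaves it in decomposed form:

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA is a composition exclusion, so NFC maps it
# to the two-code-point sequence U+0915 U+093C instead of keeping it.
original = "\u0958"
nfc = unicodedata.normalize("NFC", original)

# Compare sizes both in code points and in UTF-8 bytes.
original_bytes = len(original.encode("utf-8"))
nfc_bytes = len(nfc.encode("utf-8"))

print(len(original), original_bytes)  # 1 code point, 3 bytes
print(len(nfc), nfc_bytes)            # 2 code points, 6 bytes
```

So a buffer sized to the un-normalized input would be too small for
the NFC result here.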

Maybe we should revise the note accordingly:
http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames

>
>
>> Here I quote from
>> http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames
>> > The proposed internal 'normal form' should be NFC, if only if
>> > it were because it's the most compact form of the two: when
>> > allocating memory to store a conversion result, it won't be
>> > necessary (ever) to allocate more than the size of the input buffer.
>
> That statement seems to be talking about converting between NFC and NFD, not from un-normalized to normalized.

Yes, indeed.

So we need to normalize input paths before processing them,
and we choose NFC as the normalization form.
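
As a rough sketch of what "normalize before processing" means, here is
a Python example. It is not Subversion code (which is C); the helper
name normalize_path_nfc is hypothetical and it assumes paths are
already decoded to Unicode strings:

```python
import unicodedata


def normalize_path_nfc(path: str) -> str:
    """Return the path normalized to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", path)


# The same file name as produced by two different systems:
precomposed = "caf\u00e9.txt"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301.txt"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The raw strings differ, but they compare equal once both are in NFC.
raw_equal = precomposed == decomposed
normalized_equal = (
    normalize_path_nfc(precomposed) == normalize_path_nfc(decomposed)
)
print(raw_equal, normalized_equal)
```

The point is only that comparisons happen on the normalized form, so
both spellings refer to the same path.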

-- 
)Hiroaki Nakamura) hnakamur_at_gmail.com

Received on 2012-02-04 12:09:27 CET
