
Re: Comments on 'notes/unicode-composition-for-filenames'

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 22 Feb 2011 19:56:42 +0100

On Tue, Feb 22, 2011 at 07:41:12PM +0100, Branko Čibej wrote:
> On 22.02.2011 18:17, Julian Foad wrote:
> >> Proposed Support Library
> >> ========================
> >>
> >> Assumptions
> >> -----------
> >>
> >> The main assumption is that we'll keep using APR for character set
> > s/character set/character encoding/.
> >
> >> conversion, meaning that the recoding solution to choose would not
> >> need to provide any other functionality than recoding.
> > s/recoding/converting between NFD and NFC UTF8 encodings/.
>
> Actually -- you have to go all the way and support complete
> normalization, even if your normalization targets are only NFC and NFD.
> That's because there isn't a sane way to detect whether a string is
> normalized or not -- "sane" in the sense that it should take about as
> long to discover that as to just normalize it.

To put it differently, the only way to figure out whether a given
UTF-8 sequence is valid (or, by extension, uses NFC and/or NFD)
is to parse the entire sequence.
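Both points can be illustrated with a minimal Python sketch that uses only the standard library's `bytes.decode` and `unicodedata.normalize`. This is not anything from Subversion or APR, and the helper names `is_valid_utf8` and `is_nfc_by_normalizing` are hypothetical. Strict UTF-8 decoding has to visit every byte, because a malformed sequence can sit anywhere. The straightforward way to test whether a name is NFC is to normalize it and compare, which costs about as much as normalizing outright.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` is well-formed UTF-8.

    Strict decoding must examine the whole input, since an invalid
    or truncated multi-byte sequence may appear at any position.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def is_nfc_by_normalizing(name: str) -> bool:
    """Hypothetical helper: test NFC-ness by normalizing and comparing.

    The check does the full normalization work, so it is no cheaper
    than simply converting the string to NFC.
    """
    return unicodedata.normalize("NFC", name) == name


# The same visible filename in two normalization forms:
composed = "caf\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT

if __name__ == "__main__":
    # Both are valid UTF-8, yet their byte sequences differ.
    print(composed.encode("utf-8"), decomposed.encode("utf-8"))
    print(is_valid_utf8(b"caf\xc3"))  # truncated multi-byte sequence
    print(is_nfc_by_normalizing(composed), is_nfc_by_normalizing(decomposed))
```

The two spellings of "café" encode to different UTF-8 bytes, so a byte-wise comparison of filenames treats them as different names even though they render identically.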
Received on 2011-02-22 19:57:25 CET

This is an archived mail posted to the Subversion Dev mailing list.
