On Tue, Feb 22, 2011 at 07:41:12PM +0100, Branko Čibej wrote:
> On 22.02.2011 18:17, Julian Foad wrote:
> >> Proposed Support Library
> >> ========================
> >>
> >> Assumptions
> >> -----------
> >>
> >> The main assumption is that we'll keep using APR for character set
> > s/character set/character encoding/.
> >
> >> conversion, meaning that the recoding solution to choose would not
> >> need to provide any other functionality than recoding.
> > s/recoding/converting between NFD and NFC UTF8 encodings/.
>
> Actually -- you have to go all the way and support complete
> normalization, even if your normalization targets are only NFC and NFD.
> That's because there isn't a sane way to detect whether a string is
> normalized or not -- "sane" in the sense that it should take about as
> long to discover that as to just normalize it.
To put it differently, the only way to determine whether a given
UTF-8 sequence is valid (or, by extension, whether it is in NFC
and/or NFD) is to parse the entire sequence.
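
A rough sketch of that point, in Python rather than anything built on
APR. The helper name classify_utf8 is made up for illustration, and the
standard unicodedata module stands in for whatever support library
ends up being chosen. Both the validity check and the normalization
checks have to walk the whole input, because a defect or a
non-normalized character may sit at the very end:

```python
import unicodedata


def classify_utf8(data: bytes) -> dict:
    """Hypothetical helper: report UTF-8 validity and NFC/NFD status."""
    try:
        # Strict decoding has to look at every byte; an invalid byte
        # anywhere (including the last one) makes the sequence invalid.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return {"valid": False, "nfc": False, "nfd": False}

    # The check is done by normalizing and comparing, so it costs about
    # as much as simply normalizing the string in the first place.
    return {
        "valid": True,
        "nfc": unicodedata.normalize("NFC", text) == text,
        "nfd": unicodedata.normalize("NFD", text) == text,
    }
```

For example, classify_utf8(b"caf\xc3\xa9") (precomposed e-acute) should
report NFC but not NFD, and classify_utf8(b"cafe\xcc\x81") (e followed
by a combining acute accent) should report the opposite. Pure ASCII
input is in both forms at once.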
Received on 2011-02-22 19:57:25 CET