On 19.07.2007 17:39, B. Smith-Mannschott wrote:
> On 7/19/07, Matthias Wächter <firstname.lastname@example.org> wrote:
>> 5. What about Unicode code groups that represent one NFC symbol but
>> multiple NFD symbols that _cannot_ be re-translated to NFC? For
>> example, U+3374 SQUARE BAR is a single code to represent the
>> character sequence 'bar' in square format. The given decomposition
>> is U+0062 U+0061 U+0072 which is the ASCII sequence 'bar'.
>> Certainly, re-coding to NFC will result in no change. Do we want to
>> disallow those? BTW: Is this correct, does OS X translate U+3374 to
>> this three-letter sequence?
> This is misleading. It's true for the NFKC and NFKD, the
> "compatibility" normalizations, which are lossy by design. NFD does
> not decompose SQUARE BAR.
Thanks for pointing this out. I just verified it with Python.
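A check along these lines can be reproduced with Python's standard
unicodedata module. This is only a minimal sketch; the variable names
are illustrative and not taken from the original test:

```python
import unicodedata

square_bar = "\u3374"  # SQUARE BAR

# Canonical forms (NFD, NFC) leave the character untouched.
nfd = unicodedata.normalize("NFD", square_bar)
nfc = unicodedata.normalize("NFC", square_bar)

# Compatibility forms (NFKD, NFKC) replace it with three ASCII letters.
nfkd = unicodedata.normalize("NFKD", square_bar)
nfkc = unicodedata.normalize("NFKC", square_bar)

print(nfd == square_bar, nfc == square_bar)  # True True
print(nfkd, nfkc)                            # bar bar
```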
> Do you know of an example where NFD->NFC->NFD is lossy?
Some of my 'knowledge' is from , it states that normalization 
can be one way at least for old and replacement symbols. E.g. this
applies to U+212B ANGSTROM SIGN (formerly ANGSTROM UNIT), which is
converted to U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, which in
turn is decomposed to U+0041 U+030A. So the first normalization ->NFD
results in U+0041 U+030A, and finishing the cycle ->NFC->NFD rebuilds
U+00C5 and then U+0041 U+030A again, but never U+212B. Apparently,
normalizing to NFC already contains normalizing to NFD as a first step.
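The ANGSTROM SIGN cycle can be traced step by step with the standard
unicodedata module. A minimal sketch, where codepoints() is a
hypothetical helper used only for printing:

```python
import unicodedata

def codepoints(s):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

angstrom = "\u212b"  # ANGSTROM SIGN

step1 = unicodedata.normalize("NFD", angstrom)  # original -> NFD
step2 = unicodedata.normalize("NFC", step1)     # NFD -> NFC
step3 = unicodedata.normalize("NFD", step2)     # NFC -> NFD

print(codepoints(step1))  # U+0041 U+030A
print(codepoints(step2))  # U+00C5
print(codepoints(step3))  # U+0041 U+030A

# The information is lost in the very first step; after that the
# NFC <-> NFD cycle is stable, and U+212B never comes back.
print(step3 == step1)     # True
print(angstrom in (step1, step2, step3))  # False
```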
Interestingly, ANGSTROM SIGN, as a unit, should not have a
lower-case representation. But as a Latin capital letter A with ring
above, it certainly no longer carries the unit meaning, so there
is a lower-case variant U+00E5 available. Actually, even for
ANGSTROM SIGN itself, this lower-case representation is given. OTOH,
there is no lower-case representation for U+2103 DEGREE CELSIUS. Weird.
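The case mappings can be inspected with Python's built-in str.lower()
and str.upper(), which follow the Unicode case-mapping data shipped
with the interpreter. A small sketch with illustrative variable names:

```python
angstrom_sign = "\u212b"   # ANGSTROM SIGN
a_ring_lower = "\u00e5"    # LATIN SMALL LETTER A WITH RING ABOVE
degree_celsius = "\u2103"  # DEGREE CELSIUS

# ANGSTROM SIGN has a lower-case mapping, the same one as U+00C5.
print(angstrom_sign.lower() == a_ring_lower)            # True

# Upper-casing again yields U+00C5, not the ANGSTROM SIGN.
print(a_ring_lower.upper() == "\u00c5")                 # True

# DEGREE CELSIUS is a symbol without any case mapping.
print(degree_celsius.lower() == degree_celsius)         # True
```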
Similarly, U+F900 CJK COMPATIBILITY IDEOGRAPH-F900 is NFD-normalized
to U+8C48 'how? what?' which stays NFC-normalized U+8C48. No way
back to U+F900.
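The same kind of check works for the compatibility ideograph. A
minimal sketch using only the standard unicodedata module; the
variable names are illustrative:

```python
import unicodedata

compat = "\uf900"   # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8c48"  # the unified ideograph it maps to

# U+F900 carries a canonical (untagged) decomposition to one code point.
print(unicodedata.decomposition(compat))  # 8C48

nfd = unicodedata.normalize("NFD", compat)
nfc = unicodedata.normalize("NFC", compat)

# Both canonical forms yield U+8C48, so U+F900 cannot be recovered.
print(nfd == unified, nfc == unified)     # True True
print(unicodedata.normalize("NFC", nfd) == compat)  # False
```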
Matthias Wächter - Senior Chip Designer
TTTech Computertechnik AG - Time-Triggered Technology
Commercial Reg. No.: 165 664z, Commercial Court Vienna
Schoenbrunner Strasse 7, A-1040 Vienna, Austria
Received on Thu Jul 19 18:14:42 2007