On 7/19/07, Matthias Wächter <email@example.com> wrote:
> I am neither a Unicode nor Subversion (developer) expert, but let me
> make my (verbose) point anyway.
> On 18.07.2007 16:15, Erik Huelsmann wrote:
> > Unicode has 2 different representations
[... snip ...]
> 5. What about Unicode code groups that represent one NFC symbol but
> multiple NFD symbols that _cannot_ be re-translated to NFC? For
> example, U+3374 SQUARE BAR is a single code to represent the
> character sequence 'bar' in square format. The given decomposition
> is U+0062 U+0061 U+0072 which is the ASCII sequence 'bar'.
> Certainly, re-coding to NFC will result in no change. Do we want to
> disallow those? BTW: Is this correct, does OS X translate U+3374 to
> this three-letter sequence?
This is misleading. It is true only for NFKC and NFKD, the
"compatibility" normalizations, which are lossy by design. NFD does
not decompose SQUARE BAR, because its decomposition mapping is a
compatibility mapping, not a canonical one.
Do you know of an example where NFD->NFC->NFD is lossy?
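
For anyone who wants to check this locally, here is a minimal sketch
using Python's standard unicodedata module. Python is my choice of
tool here, not something Subversion or OS X uses, and the helper names
forms() and nfd_roundtrip_stable() are made up for illustration. It
only shows what the four normalization forms do to U+3374 and tests
the NFD->NFC->NFD round trip on sample strings; it proves nothing
about strings it has not been given.

```python
import unicodedata

SQUARE_BAR = "\u3374"  # U+3374 SQUARE BAR


def forms(s):
    """Return s under each of the four Unicode normalization forms."""
    return {
        form: unicodedata.normalize(form, s)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


def nfd_roundtrip_stable(s):
    """Check that NFD -> NFC -> NFD gives back the same NFD string."""
    nfd = unicodedata.normalize("NFD", s)
    nfc = unicodedata.normalize("NFC", nfd)
    return unicodedata.normalize("NFD", nfc) == nfd


result = forms(SQUARE_BAR)

# Print the code points each form produces for SQUARE BAR.
for form, value in result.items():
    print(form, [f"U+{ord(c):04X}" for c in value])

# Round-trip check on two sample strings (hypothetical test inputs).
for sample in (SQUARE_BAR, "W\u00e4chter"):
    print(repr(sample), nfd_roundtrip_stable(sample))
```

The canonical forms (NFC, NFD) leave U+3374 untouched; only the
compatibility forms (NFKC, NFKD) turn it into the three letters.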
Received on Thu Jul 19 17:39:02 2007