
Re: Proposed resolution: Standardizing on UTF-8 isn't enough

From: B. Smith-Mannschott <benpsm_at_gmail.com>
Date: 2007-07-19 17:39:56 CEST

On 7/19/07, Matthias Wächter <matthias.waechter@tttech.com> wrote:
> Erik,
>
> I am neither a Unicode nor Subversion (developer) expert, but let me
> make my (verbose) point anyway.
>
> On 18.07.2007 16:15, Erik Huelsmann wrote:
> > Unicode has 2 different representations
>

[... snip ...]

> 5. What about Unicode code groups that represent one NFC symbol but
> multiple NFD symbols that _cannot_ be re-translated to NFC? For
> example, U+3374 SQUARE BAR [2] is a single code to represent the
> character sequence 'bar' in square format. The given decomposition
> is U+0062 U+0061 U+0072 which is the ASCII sequence 'bar'.
> Certainly, re-coding to NFC will result in no change. Do we want to
> disallow those? BTW: Is this correct, does OS X translate U+3374 to
> this three-letter sequence?

This is misleading. What you describe is true only of NFKC and NFKD,
the "compatibility" normalizations, which are lossy by design. NFD
applies canonical decompositions only, so it does not decompose
SQUARE BAR.
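
To make the distinction concrete, here is a minimal sketch that
normalizes U+3374 under all four forms. It assumes Python's standard
unicodedata module, which is not something used elsewhere in this
thread; the helper name forms() is made up for illustration.

```python
import unicodedata

# U+3374 SQUARE BAR carries only a *compatibility* decomposition
# (<square> 0062 0061 0072), not a canonical one.
SQUARE_BAR = "\u3374"


def forms(s):
    """Return s normalized under each of the four Unicode forms."""
    return {
        form: unicodedata.normalize(form, s)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }


result = forms(SQUARE_BAR)
for form, value in result.items():
    # Show code points rather than glyphs so the difference is visible.
    print(form, [f"U+{ord(c):04X}" for c in value])
```

NFC and NFD should leave the single code point alone, while NFKC and
NFKD should both yield the three letters 'b', 'a', 'r'.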

Do you know of an example where NFD->NFC->NFD is lossy?
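
One way to go looking for such an example is to probe candidate
strings directly. The following is only a sketch, again assuming
Python's unicodedata module; the function name and the sample list are
hypothetical, chosen to include a compatibility character, a singleton,
a precomposed letter, and a composition exclusion.

```python
import unicodedata


def nfd_roundtrip_is_stable(s):
    """Check whether NFD -> NFC -> NFD gives back the same NFD string."""
    nfd = unicodedata.normalize("NFD", s)
    nfc = unicodedata.normalize("NFC", nfd)
    return unicodedata.normalize("NFD", nfc) == nfd


# Hypothetical probe set, not an exhaustive test.
SAMPLES = [
    "\u3374",    # SQUARE BAR (compatibility decomposition only)
    "\u212b",    # ANGSTROM SIGN (singleton canonical decomposition)
    "\u00e9",    # LATIN SMALL LETTER E WITH ACUTE (precomposed)
    "e\u0301",   # 'e' followed by COMBINING ACUTE ACCENT
    "\u0958",    # DEVANAGARI LETTER QA (composition exclusion)
]

results = {s: nfd_roundtrip_is_stable(s) for s in SAMPLES}
for s, ok in results.items():
    codepoints = " ".join(f"U+{ord(c):04X}" for c in s)
    print(codepoints, "stable" if ok else "LOSSY")
```

A probe like this cannot prove that no counterexample exists; it only
shows whether the strings you feed it survive the round trip.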

// bsmith@occs
Received on Thu Jul 19 17:39:02 2007

This is an archived mail posted to the Subversion Dev mailing list.