Re: Proposed resolution: Standardizing on UTF-8 isn't enough

From: Stefan Küng <tortoisesvn_at_gmail.com>
Date: 2007-07-19 17:49:57 CEST

B. Smith-Mannschott wrote:
> On 7/19/07, Matthias Wächter <matthias.waechter@tttech.com> wrote:
>> Eric,
>>
>> I am neither a Unicode nor Subversion (developer) expert, but let me
>> make my (verbose) point anyway.
>>
>> On 18.07.2007 16:15, Erik Huelsmann wrote:
>> > Unicode has 2 different representations
>>
>
> [... snip ...]
>
>> 5. What about Unicode code groups that represent one NFC symbol but
>> multiple NFD symbols that _cannot_ be re-translated to NFC? For
>> example, U+3374 SQUARE BAR [2] is a single code to represent the
>> character sequence 'bar' in square format. The given decomposition
>> is U+0062 U+0061 U+0072 which is the ASCII sequence 'bar'.
>> Certainly, re-coding to NFC will result in no change. Do we want to
>> disallow those? BTW: Is this correct, does OS X translate U+3374 to
>> this three-letter sequence?
>
> This is misleading. It's true for the NFKC and NFKD, the
> "compatibility" normalizations, which are lossy by design. NFD does
> not decompose SQUARE BAR.
>
> Do you know of an example where NFD->NFC->NFD is lossy?

I think this might be one:
http://www.unicode.org/review/pr-29.html
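The canonical vs. compatibility distinction in the quoted exchange can be checked with Python's standard unicodedata module. The sketch below is illustrative and not part of the original mail; the helper names (describe, nfd_round_trip_ok, lossy_code_points) are hypothetical. It shows NFD leaving U+3374 SQUARE BAR untouched while NFKD expands it to ASCII "bar". It then scans every single code point for an NFD -> NFC -> NFD round trip that fails to return the starting NFD form. Because NFC output is canonically equivalent to its input, that scan is expected to come back empty. Results reflect the Unicode version bundled with the interpreter (unicodedata.unidata_version), and the scan does not exercise multi-character sequences.

```python
import sys
import unicodedata

SQUARE_BAR = "\u3374"  # U+3374 SQUARE BAR, the example from the quoted mail


def describe(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


def nfd_round_trip_ok(s: str) -> bool:
    """True if NFD -> NFC -> NFD yields the same sequence as the first NFD."""
    nfd = unicodedata.normalize("NFD", s)
    back = unicodedata.normalize("NFD", unicodedata.normalize("NFC", nfd))
    return back == nfd


def lossy_code_points() -> list[int]:
    """Scan every single code point (surrogates skipped) for a lossy round trip."""
    bad = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates are not Unicode scalar values
        if not nfd_round_trip_ok(chr(cp)):
            bad.append(cp)
    return bad


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Canonical decomposition (NFD) leaves SQUARE BAR as a single code point ...
    print("NFD :", describe(unicodedata.normalize("NFD", SQUARE_BAR)))
    # ... while compatibility decomposition (NFKD) turns it into ASCII 'bar'.
    print("NFKD:", describe(unicodedata.normalize("NFKD", SQUARE_BAR)))
    print("lossy single code points:", lossy_code_points())
```

A round-trip failure for an individual character would contradict canonical equivalence itself, so any lossy cases worth discussing would have to involve sequences of several characters rather than single code points.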

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.net
Received on Thu Jul 19 17:49:11 2007

This is an archived mail posted to the Subversion Dev mailing list.