
Fwd: Standardizing on UTF8 internally isn't enough

From: B. Smith-Mannschott <benpsm_at_gmail.com>
Date: 2007-07-19 11:13:04 CEST

[initially sent just to Justin. reposting to list.]

On 7/19/07, Justin Erenkrantz <justin@erenkrantz.com> wrote:

> Why can't we just do input validation ourselves? This is clearly a
> very specific corner case and one we can detect quite trivially (i.e.
> look for the chars that only exist in normalization form D).

Just to clear up any potential misunderstanding:

NFC also uses combining characters. This means you can't just look for
the presence of combining characters to decide whether a string is NFC.

NFC uses composed code points, where available, otherwise it falls
back on a decomposed representation. (There aren't composed code
points defined for every possible application of combining
characters.)
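
To make that concrete, here is a minimal sketch in Python using the
standard unicodedata module. The helper names has_combining_marks and
is_nfc are made up for illustration, and the sketch assumes that
"q + COMBINING DOT BELOW" has no composed code point. It shows one
sequence that NFC composes and one that it has to leave decomposed:

```python
import unicodedata


def has_combining_marks(text):
    # Hypothetical helper: true if any code point has a nonzero canonical
    # combining class. This is the naive "look for combining chars" test.
    return any(unicodedata.combining(ch) != 0 for ch in text)


def is_nfc(text):
    # Hypothetical helper: a string is in NFC exactly when normalizing
    # it to NFC leaves it unchanged.
    return unicodedata.normalize("NFC", text) == text


# "e" + COMBINING ACUTE ACCENT has a composed code point (U+00E9).
e_acute = unicodedata.normalize("NFC", "e\u0301")

# "q" + COMBINING DOT BELOW is assumed to have no composed code point,
# so NFC keeps the combining character.
q_dot = unicodedata.normalize("NFC", "q\u0323")

print(len(e_acute), has_combining_marks(e_acute), is_nfc(e_acute))
print(len(q_dot), has_combining_marks(q_dot), is_nfc(q_dot))
```

The second string is valid NFC and still contains a combining
character, so the naive test would wrongly reject it. Comparing a
string against its own NFC normalization avoids that.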

(BTW, this also means that going NFC internally doesn't save you from
having to deal with combining characters. There's no reduction in
complexity by going NFC over NFD internally.)

// bsmith@occs

Received on Thu Jul 19 11:12:13 2007

This is an archived mail posted to the Subversion Dev mailing list.