
re: Unicode directory entries

From: Bill Tutt <rassilon_at_lyra.org>
Date: 2001-02-16 01:05:13 CET

Karl quoted svn_fs.h:
> Below is Jim's comment from svn_fs.h:

The [directory] name should be in Unicode canonical decomposition and
ordering. No directory entry may be named '.', '..', or the empty
string. Given a directory entry name which fails to meet these
requirements, a filesystem function returns an SVN_ERR_FS_PATH_SYNTAX
[end snippet]
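
To make sure I'm reading that comment correctly, here is a rough sketch of the check it seems to describe, in Python rather than C. The names `check_entry_name` and `PathSyntaxError` are made up for illustration and are not part of the Subversion API, and the sketch assumes that "canonical decomposition and ordering" means Normalization Form D, which is exactly the point I ask about below.

```python
import unicodedata


class PathSyntaxError(ValueError):
    """Hypothetical stand-in for SVN_ERR_FS_PATH_SYNTAX."""


def check_entry_name(name: str) -> str:
    """Return name unchanged if it is acceptable, else raise PathSyntaxError."""
    # The comment forbids these three names outright.
    if name in ("", ".", ".."):
        raise PathSyntaxError(f"invalid directory entry name: {name!r}")
    # Assumption: the required form is NFD. A name is in NFD exactly when
    # normalizing it to NFD leaves it unchanged.
    if unicodedata.normalize("NFD", name) != name:
        raise PathSyntaxError(f"directory entry name is not in NFD: {name!r}")
    return name
```

Under that assumption, a name containing a precomposed character such as U+00FC would be rejected, while the same name spelled with U+0075 U+0308 would be accepted.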

Several comments/questions:
* By Unicode canonical decomposition, do you mean Normalization Form D
as noted in TR15? (http://www.unicode.org/unicode/reports/tr15/)

I ask because canonical decomposition expands every precomposed
character into its component parts. For example, a precomposed
lowercase u with an umlaut turns into two characters: a lowercase u
followed by a combining umlaut (diaeresis). You really wouldn't want to
implement the wrong normalization algorithm. :) TR15 also states the
following:

"The W3C Character Model for the World Wide Web [CharMod] requires the
use of Normalization Form C for XML and related standards (this document
is not yet final, but this requirement is not expected to change). See
the W3C Requirements for String Identity, Matching, and String Indexing
[CharReq] for more background."
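
To make the difference between the two forms concrete, here is a small Python sketch using the standard `unicodedata` module. It is only an illustration of Form D versus Form C on one character, not a suggestion for how Subversion should implement this; the helper `code_points` is made up for the example.

```python
import unicodedata


def code_points(s: str) -> list[str]:
    """Render each character of s as a U+XXXX code point label."""
    return [f"U+{ord(c):04X}" for c in s]


precomposed = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS

# Form D: canonical decomposition, base letter first, then the combining mark.
nfd = unicodedata.normalize("NFD", precomposed)

# Form C: canonical decomposition followed by canonical composition.
nfc = unicodedata.normalize("NFC", nfd)

print(code_points(nfd))  # ['U+0075', 'U+0308']
print(code_points(nfc))  # ['U+00FC']
```

The two strings render identically but compare unequal byte for byte, which is why it matters which form the filesystem insists on.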

* What do you mean by ordering? It didn't sound like you were talking
about a sorting order...

Received on Sat Oct 21 14:36:22 2006

This is an archived mail posted to the Subversion Dev mailing list.
