"Bill Tutt" <rassilon@lyra.org> writes:
> Several comments/questions:
> * By Unicode canonical decomposition, do you mean Normalization Form D
> as noted in TR15? (http://www.unicode.org/unicode/reports/tr15/)
>
> I ask because canonical decomposition expands every precomposed
> character into its component parts; e.g., a precomposed lowercase u
> with umlaut becomes two characters: a lowercase u followed by a
> combining diaeresis. You really wouldn't want to implement the
> wrong normalization algorithm. :) TR15 also states the following:
>
> "The W3C Character Model for the World Wide Web [CharMod] requires the
> use of Normalization Form C for XML and related standards (this document
> is not yet final, but this requirement is not expected to change). See
> the W3C Requirements for String Identity, Matching, and String Indexing
> [CharReq] for more background."

I was punting. I knew that there were several ways to represent
composite characters, and assumed that there was some form recommended
for use in names that needed to be matched. From what you say, it
sounds like there are several. (Joy.)

> * What do you mean by ordering? It didn't sound like you were talking
> about a sorting order...

No --- I was trying to refer to the ordering of the modifiers. It
sounds like that is subsumed by the normalization form requirements
you mention above.

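
To make the modifier-ordering point concrete, here is a rough sketch in
Python using the standard unicodedata module. The choice of Python, the
example string, and the helper name canonical() are mine, purely for
illustration. As far as I understand TR15, every normalization form
includes canonical reordering of combining marks, so two names that
differ only in the order of their modifiers should come out identical:

```python
import unicodedata

# Same base letter and the same two combining marks, entered in
# different orders by two hypothetical users.
a = "q\u0323\u0307"  # q + COMBINING DOT BELOW + COMBINING DOT ABOVE
b = "q\u0307\u0323"  # q + COMBINING DOT ABOVE + COMBINING DOT BELOW

def canonical(s):
    # Hypothetical helper: NFD is used here only to show the reordering;
    # NFC reorders the marks the same way before recomposing.
    return unicodedata.normalize("NFD", s)

raw_equal = (a == b)                                # compared as typed
normalized_equal = (canonical(a) == canonical(b))   # compared after normalizing

print(raw_equal, normalized_equal)
```

The raw strings differ, but after normalization both carry the marks in
the same canonical order, so they compare equal.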
What I'm trying to do is put directory entries in some canonical form,
so that directory entries don't become mysteriously invisible because
different users chose different compositions/decompositions. What
would you recommend that I say?
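
For what it's worth, the behaviour I'm after looks roughly like the
following sketch (Python again; entry_key(), the dict standing in for a
directory, and the "node-1" value are all hypothetical). I've assumed
Normalization Form C only because that is what the passage you quote
says the W3C requires; I'm not claiming it is the right choice here:

```python
import unicodedata

def entry_key(name, form="NFC"):
    # Hypothetical helper: normalize a name before storing it or
    # looking it up. The default form is an assumption, not a decision.
    return unicodedata.normalize(form, name)

composed = "f\u00fcr.txt"      # u-umlaut as one precomposed code point
decomposed = "fu\u0308r.txt"   # u followed by COMBINING DIAERESIS

directory = {}                 # stand-in for a real directory structure
directory[entry_key(composed)] = "node-1"

# A second user types the same visible name in decomposed form.
found = directory.get(entry_key(decomposed))
missed = directory.get(decomposed)   # lookup without normalizing

print(found, missed)
```

Without the normalization step the second lookup misses, which is
exactly the "mysteriously invisible" entry I want to rule out.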
Received on Sat Oct 21 14:36:22 2006