Re: log --search test failures on trunk and 1.8.x

From: Mattias Engdegård <mattiase_at_bredband.net>
Date: Sun, 21 Apr 2013 22:18:38 +0200

On 21 Apr 2013, at 20:07, Branko Čibej wrote:

> Yes, the obvious ones are German (ß == SS) equivalence and Turkic (i ==
> İ) and (ı == I) equivalences (and that's already three characters); but
> then in French, lowercase accented letters are equivalent to uppercase
> unaccented letters, whereas for example in Spanish that's not the case.
> And that's just looking at European and West Asian Latin scripts. There
> are at least 7 distinct Cyrillic scripts in roughly the same area that
> I'm aware of, and I certainly don't know the case-folding rules for all
> of them.

Not only is the above true, one should also be careful to distinguish
case conversion from case-insensitive matching; these follow different
rules.

For instance, converting lower-case letters to upper case in French
will retain the accents (most of the time - this is locale-dependent),
but they are generally expected to be ignored when searching. By
contrast, it would be an error to match "a" with "ä" in Swedish when
searching, or to drop the dots in a case conversion.
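
To make the distinction concrete, here is a rough Python sketch. It is
not anything Subversion does; the helper names strip_accents and
loose_key are made up for illustration, and Python's str methods are
locale-independent, so this only approximates the per-language rules
discussed above.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose to NFD and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_key(text: str) -> str:
    """Hypothetical search key that ignores both accents and case."""
    return strip_accents(text).casefold()

# Case conversion keeps the accent...
assert "é".upper() == "É"
# ...while the loose search key discards both case and accent.
assert loose_key("Élève") == loose_key("ELEVE")
# The same key also equates "a" and "ä", which would be wrong in Swedish.
assert loose_key("a") == loose_key("ä")
```

The last assertion is the point: a key that suits French searching is
wrong for Swedish, so the matching rule cannot simply be derived from
the case-conversion rule.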

Clearly a case- and accent-sensitive search is much easier to
implement, but would benefit from normalisation. Bytewise matching is
on the lowest rung.
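
As a minimal sketch of why normalisation helps even a case- and
accent-sensitive search (Python again; the contains function is a
hypothetical name, and NFC is just one reasonable choice of form):

```python
import unicodedata

composed = "\u00e4"      # "ä" as a single code point
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

def contains(haystack: str, needle: str) -> bool:
    """Case- and accent-sensitive substring search after NFC normalisation."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)
    return nfc(needle) in nfc(haystack)

haystack = "l" + decomposed + "gg till"   # "lägg till", decomposed form
needle = "l" + composed + "gg"            # "lägg", precomposed form

# Bytewise matching misses text that looks identical to the reader.
assert needle.encode("utf-8") not in haystack.encode("utf-8")
# Normalising both sides first finds it.
assert contains(haystack, needle)
# The search stays accent- and case-sensitive.
assert not contains(haystack, "lagg")
assert not contains(haystack, "Lägg")
```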
Received on 2013-04-21 22:19:15 CEST

This is an archived mail posted to the Subversion Dev mailing list.