
Re: log --search test failures on trunk and 1.8.x

From: Mattias Engdegård <mattiase_at_bredband.net>
Date: Sun, 21 Apr 2013 22:18:38 +0200

On 21 Apr 2013, at 20:07, Branko Čibej wrote:

> Yes, the obvious ones are German (ß == SS) equivalence and Turkic
> (i == İ) and (ı == I) equivalences (and that's already three
> characters); but then in French, lowercase accented letters are
> equivalent to uppercase unaccented letters, whereas for example in
> Spanish that's not the case. And that's just looking at European and
> West Asian Latin scripts. There are at least 7 distinct Cyrillic
> scripts in roughly the same area that I'm aware of, and I certainly
> don't know the case-folding rules for all of them.

Not only is the above true, one should also be careful to distinguish
case conversion from case-insensitive matching; these follow different
rules.

For instance, converting lower-case letters to upper case in French
will retain the accents (most of the time - this is locale-dependent),
but they are generally expected to be ignored when searching. By
contrast, it would be an error to match "a" with "ä" in Swedish when
searching, or to drop the dots in a case conversion.
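
To make the distinction concrete, here is a rough Python sketch (not
Subversion code; strip_accents and loose_key are hypothetical helper
names made up for illustration). It uses only the standard str and
unicodedata facilities, which are locale-independent, so it cannot
express the French/Swedish difference by itself; it only shows that
conversion and matching are separate operations.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def loose_key(text):
    """Hypothetical search key that ignores both case and accents.

    Acceptable for French-style searching, but wrong for Swedish,
    where "a" and "ä" are distinct letters.
    """
    return strip_accents(text.casefold())

# Case conversion: the accent is retained.
print("é".upper())

# Case-insensitive matching: folding, not conversion, handles ß == SS.
print("Straße".lower() == "STRASSE".lower())        # lower() misses the match
print("Straße".casefold() == "STRASSE".casefold())  # casefold() finds it

# Accent-insensitive matching: fine for French...
print(loose_key("ÉLÈVE") == loose_key("eleve"))
# ...but the same key also equates "ä" with "a", an error in Swedish.
print(loose_key("ä") == loose_key("a"))
```

A locale-aware collation library would be needed to pick the right
behaviour per language; the standard library calls above are
deliberately locale-blind.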

Clearly a case- and accent-sensitive search is much easier to
implement, but it would still benefit from Unicode normalisation, so
that canonically equivalent sequences compare equal. Plain bytewise
matching is the lowest rung.
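
As an illustration of why normalisation matters even for an exact
search, here is a small Python sketch (again not Subversion code; the
function names are hypothetical and NFC is merely one reasonable choice
of normalisation form). The same visible word can be encoded with a
precomposed "ä" or with "a" followed by a combining diaeresis.

```python
import unicodedata

def contains_bytewise(haystack, needle):
    """Lowest rung: compare the raw UTF-8 bytes."""
    return needle.encode("utf-8") in haystack.encode("utf-8")

def contains_normalised(haystack, needle, form="NFC"):
    """Case- and accent-sensitive search after normalising both sides."""
    return (unicodedata.normalize(form, needle)
            in unicodedata.normalize(form, haystack))

precomposed = "f\u00e4rg"   # "ä" as a single code point
decomposed = "fa\u0308rg"   # "a" followed by a combining diaeresis

print(contains_bytewise(precomposed, decomposed))    # False
print(contains_normalised(precomposed, decomposed))  # True

# The normalised search is still case- and accent-sensitive.
print(contains_normalised(precomposed, "farg"))      # False
print(contains_normalised(precomposed, "F\u00c4RG")) # False
```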
Received on 2013-04-21 22:19:15 CEST

This is an archived mail posted to the Subversion Dev mailing list.
