Re: log --search test failures on trunk and 1.8.x

From: Mattias Engdegård <mattiase_at_bredband.net>
Date: Sun, 21 Apr 2013 22:18:38 +0200

On 21 Apr 2013, at 20:07, Branko Čibej wrote:

> Yes, the obvious ones are German (ß == SS) equivalence and Turkic
> (i == İ) and (ı == I) equivalences (and that's already three characters);
> but then in French, lowercase accented letters are equivalent to uppercase
> unaccented letters, whereas for example in Spanish that's not the case.
> And that's just looking at European and West Asian Latin scripts. There
> are at least 7 distinct Cyrillic scripts in roughly the same area that
> I'm aware of, and I certainly don't know the case-folding rules for all
> of them.

Not only is the above true, one should also be careful to distinguish
case conversion from case-insensitive matching; these follow different
rules.

For instance, converting lower-case letters to upper case in French
will retain the accents (most of the time - this is locale-dependent),
but they are generally expected to be ignored when searching. By
contrast, it would be an error to match "a" with "ä" in Swedish when
searching, or to drop the dots in a case conversion.
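
To illustrate the difference, here is a minimal sketch in Python (not
Subversion code). The function names and the ignore_accents flag are
hypothetical, and Python's locale-independent upper() and casefold()
stand in for proper locale-aware rules, which they are not.

```python
import unicodedata


def convert_to_upper(text):
    # Case conversion: the accents (and the Swedish dots) are retained.
    return text.upper()


def match_key(text, ignore_accents):
    # Key for case-insensitive matching. Case folding is not the same
    # operation as lower- or upper-casing (e.g. it maps sharp s to "ss").
    decomposed = unicodedata.normalize("NFD", text.casefold())
    if ignore_accents:
        # French-style search: drop the combining marks after decomposition.
        decomposed = "".join(
            ch for ch in decomposed if not unicodedata.combining(ch)
        )
    # Recompose so that keys compare equal regardless of the input form.
    return unicodedata.normalize("NFC", decomposed)


def matches(a, b, ignore_accents):
    # Two strings match when their keys are identical.
    return match_key(a, ignore_accents) == match_key(b, ignore_accents)
```

With ignore_accents=True the key for "été" is the same as for "ETE",
while convert_to_upper() keeps the accents. With ignore_accents=False,
as one would want for Swedish, "a" and "ä" stay distinct but "Ä" and "ä"
still match.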

Clearly a case- and accent-sensitive search is much easier to
implement, but would benefit from normalisation. Bytewise matching is
on the lowest rung.
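
As a rough sketch of why normalisation helps even a case- and
accent-sensitive search, again in Python with hypothetical function
names: the same visible character can be encoded precomposed or as a
base letter plus a combining mark, and a bytewise comparison treats the
two as different. The choice of NFC here is an assumption; any single
normalisation form applied to both sides would do.

```python
import unicodedata


def bytewise_contains(haystack, needle):
    # Lowest rung: compare the encoded bytes with no interpretation.
    return needle.encode("utf-8") in haystack.encode("utf-8")


def normalized_contains(haystack, needle):
    # Case- and accent-sensitive search, but both sides are first brought
    # to the same normalisation form (NFC) before comparing.
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_needle in nfc_haystack
```

A log message containing a precomposed "å" (U+00E5) is not found by
bytewise_contains() when the pattern is typed as "a" followed by
U+030A, but normalized_contains() finds it. An unaccented "a" still
does not match in either case.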
Received on 2013-04-21 22:19:15 CEST
