On 21.04.2013 14:05, Stefan Sperling wrote:
> On Sun, Apr 21, 2013 at 01:53:43PM +0200, Bert Huijben wrote:
>> I'd rather pull the case insensitive search part of this new in 1.8 search feature and do it right in 1.9.
> What's the issue with the current implementation apart from the
> test failures on Windows?
>
> The behaviour of 'svn log --search' regarding case-sensitivity
> isn't even documented, so we're not really promising anything.
>
> It is possible that some users who are using languages other than
> English will complain, since ASCII is being matched case-insensitively,
> and all other characters are being matched case-sensitively.
> But this is due to a missing feature in APR's implementation of fnmatch().
>
> Provided we can fix the 1.8.x tests on Windows I see no reason to
> change our implementation of log --search. We can simply wait for
> APR to grow the necessary support for multibyte strings.
The wc-collate-path branch has an svn_utf__glob function that's mainly
intended for use by SQLite; however, it can also serve as a replacement
for apr_fnmatch. It uses apr_fnmatch internally, but first decomposes
its inputs to Unicode normalization form D, which keeps diacritical
marks separate from the base letters. That means we could easily extend
it to do matching that is both case-folding and diacritic-agnostic for
Latin alphabets (and probably also for Cyrillic scripts).
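To illustrate the idea (this is only a rough sketch in Python, not the
code on the branch): decompose both strings to NFD, drop the combining
marks, case-fold, and then hand the result to an ordinary glob matcher.
The names fold() and glob_match() are made up for this example, and
Python's fnmatch stands in for apr_fnmatch.

```python
import fnmatch
import unicodedata


def fold(text: str) -> str:
    """Decompose to NFD, drop combining marks, then case-fold.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # In NFD, a letter such as U+010C (C with caron) becomes "C" followed
    # by the combining caron U+030C; combining() is non-zero for the mark.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is locale-independent, so locale-specific rules
    # (for example Turkish dotted/dotless i) are not handled here.
    return stripped.casefold()


def glob_match(pattern: str, subject: str) -> bool:
    """Glob-match subject against pattern, ignoring case and diacritics."""
    # The glob metacharacters (*, ?, [, ]) are ASCII and pass through
    # fold() unchanged, so the pattern can be folded the same way.
    return fnmatch.fnmatchcase(fold(subject), fold(pattern))
```

With this, a pattern like "*cibej*" matches "Branko Čibej", and the same
folding applies to Cyrillic letters that decompose in NFD.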
The idea to manually hack things to work with western Latin alphabets
seems completely wrong-headed to me.
But yes, in general, case folding is locale-specific. If we wanted to
support that, we'd need ICU instead of utf8proc. I can imagine that
eventually being an option, but not a mandatory dependency.
-- Brane
--
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
Received on 2013-04-21 14:48:40 CEST