
Re: log --search test failures on trunk and 1.8.x

From: Ivan Zhakov <ivan_at_visualsvn.com>
Date: Sun, 21 Apr 2013 19:11:14 +0400

On Sun, Apr 21, 2013 at 4:48 PM, Branko Čibej <brane_at_wandisco.com> wrote:
> On 21.04.2013 14:05, Stefan Sperling wrote:
>> On Sun, Apr 21, 2013 at 01:53:43PM +0200, Bert Huijben wrote:
>>> I'd rather pull the case-insensitive part of this search feature (new in 1.8) and do it right in 1.9.
>> What's the issue with the current implementation apart from the
>> test failures on Windows?
>>
>> The behaviour of 'svn log --search' regarding case-sensitivity
>> isn't even documented, so we're not really promising anything.
>>
>> It is possible that some users who are using languages other than
>> English will complain, since ASCII is being matched case-insensitively,
>> and all other characters are being matched case-sensitively.
>> But this is due to a missing feature in APR's implementation of fnmatch().
>>
>> Provided we can fix the 1.8.x tests on Windows I see no reason to
>> change our implementation of log --search. We can simply wait for
>> APR to grow the necessary support for multibyte strings.
>
> The wc-collate-path branch has an svn_utf__glob function that's mainly
> intended for use by SQLite; however, it can also serve as a replacement for
> apr_fnmatch. It uses apr_fnmatch internally, but decomposes the inputs
> to Unicode normalization form D, which keeps diacriticals separate from
> the base letters. In other words, we could easily extend that to do
> completely diacritical-agnostic case-folding matching for Latin
> alphabets (and probably also for Cyrillic scripts).
>
> The idea to manually hack things to work with western Latin alphabets
> seems completely wrong-headed to me.
>
> But yes; in general, case folding is locale-specific. If we wanted to
> support that, we'd need ICU instead of utf8proc. I can imagine that
> eventually being an option, but not a mandatory dependency.
>
According to the Unicode case folding data [1], only two characters
use locale-specific case folding (the Turkic mappings for U+0049 and U+0130).

So I propose the following plan:
1. Make 'log --search' case-sensitive in trunk and 1.8.x.
2. Merge the utf8proc code to trunk.
3. Implement svn_utf__casefold() using utf8proc.
4. Implement 'log --isearch' using
   apr_fnmatch(svn_utf__casefold(pattern), svn_utf__casefold(string))
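A rough Python sketch of steps 3 and 4, contrasted with the ASCII-only folding Stefan describes for the current implementation. Python's `str.casefold()` (Unicode default full case folding, with no locale tailoring) stands in for the proposed svn_utf__casefold(), and `fnmatch.fnmatchcase()` stands in for apr_fnmatch(). The names `ascii_only_match` and `casefold_match` are hypothetical and do not exist in Subversion.

```python
import fnmatch


def ascii_only_match(pattern: str, string: str) -> bool:
    """Approximates the described 1.8 behaviour: only ASCII letters are
    folded; every other character must match exactly."""
    def fold(s: str) -> str:
        return "".join(ch.lower() if ch.isascii() else ch for ch in s)
    return fnmatch.fnmatchcase(fold(string), fold(pattern))


def casefold_match(pattern: str, string: str) -> bool:
    """Step 4 of the plan: case-fold both sides, then do an exact glob match."""
    return fnmatch.fnmatchcase(string.casefold(), pattern.casefold())
```

With these, `ascii_only_match("*ошибка*", "Исправлена ОШИБКА")` is false while `casefold_match` returns true. Full case folding can also change string length ('ß' folds to 'ss'), and because it is locale-independent, the Turkic dotted/dotless I distinction is not honoured.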

[1] http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com
Received on 2013-04-21 17:12:05 CEST

This is an archived mail posted to the Subversion Dev mailing list.
