RE: log --search test failures on trunk and 1.8.x

From: Bert Huijben <bert_at_qqmail.nl>
Date: Sun, 21 Apr 2013 15:37:13 +0200

> -----Original Message-----
> From: Branko Čibej [mailto:brane_at_wandisco.com]
> Sent: zondag 21 april 2013 14:48
> To: dev_at_subversion.apache.org
> Subject: Re: log --search test failures on trunk and 1.8.x
>
> On 21.04.2013 14:05, Stefan Sperling wrote:
> > On Sun, Apr 21, 2013 at 01:53:43PM +0200, Bert Huijben wrote:
> >> I'd rather pull the case insensitive search part of this new in 1.8 search
> >> feature and do it right in 1.9.
> > What's the issue with the current implementation apart from the
> > test failures on Windows?
> >
> > The behaviour of 'svn log --search' regarding case-sensitivity
> > isn't even documented, so we're not really promising anything.
> >
> > It is possible that some users who are using languages other than
> > English will complain, since ASCII is being matched case-insensitively,
> > and all other characters are being matched case-sensitively.
> > But this is due to a missing feature in APR's implementation of fnmatch().
> >
> > Provided we can fix the 1.8.x tests on Windows I see no reason to
> > change our implementation of log --search. We can simply wait for
> > APR to grow the necessary support for multibyte strings.
>
> The wc-collate-path branch has an svn_utf__glob function that's mainly
> intended for use by SQLite, however, it can be a replacement for
> apr_fnmatch. It uses apr_fnmatch internally, but decomposes the inputs
> to Unicode normalization form D, which keeps diacriticals separate from
> the base letters. In other words, we could easily extend that to do
> completely diacritical-agnostic case-folding matching for Latin
> alphabets (and probably also for Cyrillic scripts).
>
> The idea to manually hack things to work with western Latin alphabets
> seems completely wrong-headed to me.
>
> But yes; in general, case folding is locale-specific. If we wanted to
> support that, we'd need ICU instead of utf8proc. I can imagine that
> eventually being an option, but not a mandatory dependency.
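
For illustration, the decompose-then-match idea quoted above can be sketched in Python. This is not svn_utf__glob, and it uses neither utf8proc nor apr_fnmatch: the standard unicodedata and fnmatch modules stand in for them, and the helper names (fold, glob_match, ascii_only_fold) are hypothetical. The folding here is locale-independent, so it does not address the locale-specific cases mentioned above.

```python
import fnmatch
import unicodedata


def fold(text: str) -> str:
    """Decompose to NFD, drop combining marks, then case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def glob_match(pattern: str, text: str) -> bool:
    """Hypothetical helper: glob match that ignores case and diacritics."""
    return fnmatch.fnmatchcase(fold(text), fold(pattern))


def ascii_only_fold(text: str) -> str:
    """Lower-case only A-Z and leave every other character untouched.

    Assumed stand-in for the ASCII-only case-insensitivity described
    earlier in the thread.
    """
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


if __name__ == "__main__":
    # Diacritics and case are both ignored after NFD decomposition.
    print(glob_match("*cibej*", "Branko Čibej"))
    # With ASCII-only folding, "Č" and "č" remain distinct characters.
    print(fnmatch.fnmatchcase(ascii_only_fold("ČIBEJ"), ascii_only_fold("*čibej*")))
```

The second call shows why ASCII-only folding surprises users of non-English alphabets: the ASCII letters match regardless of case, but the accented letter does not.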

Summarizing: how would including utf8proc on trunk now help with this issue?

Your conclusion is (similar to mine) that we need more for case folding than what we have now and/or what utf8proc will offer us.

Do we want case folding (or at least case variant compare) support in our libraries for 1.8?

Or is this 1.9+ scope?

        Bert
Received on 2013-04-21 15:38:16 CEST

This is an archived mail posted to the Subversion Dev mailing list.