Re: 'svn log --search': forcing case sensitivity?

From: Daniel Shahaf <d.s_at_daniel.shahaf.name>
Date: Wed, 16 Mar 2016 02:54:40 +0000

Branko Čibej wrote on Tue, Mar 15, 2016 at 06:13:28 +0100:
> On 15.03.2016 01:08, Daniel Shahaf wrote:
> > kotkov_at_apache.org wrote on Fri, Feb 19, 2016 at 22:11:11 -0000:
> >> +/* Return TRUE if STR matches PATTERN. Else, return FALSE. Assumes that
> >> + * PATTERN is a UTF-8 string normalized to form C with case folding
> >> + * applied. Use BUF for temporary allocations. */
> >> +static svn_boolean_t
> >> +match(const char *pattern, const char *str, svn_membuf_t *buf)
> >> +{
> >> +  svn_error_t *err;
> >> +
> >> +  err = svn_utf__normalize(&str, str, strlen(str), TRUE /* casefold */, buf);
> >> +  if (err)
> >> +    {
> >> +      /* Can't match invalid data. */
> >> +      svn_error_clear(err);
> >> +      return FALSE;
> >> +    }
> >> +
> >> +  return apr_fnmatch(pattern, str, 0) == APR_SUCCESS;
> > Should there be a command-line flag to disable casefolding?
> >
> > E.g., to allow users to grep for identifiers (function/variable/file
> > names) using their exact case? Do people who use 'log --search' need it
> > to be case-sensitive? (I don't use 'log --search' often.)
>
> I'd prefer to keep things simple.
>

Fair enough. I was concerned that users might perceive removing
case-sensitive search as a regression.

(I don't like having new knobs any more than you do.)

> And as I recall, this whole discussion began because apr_fnmatch
> doesn't like non-ASCII characters?

s/non-ASCII/multibyte/, but yes.

> > Even if casefolding is disabled, we should still apply Unicode
> > normalization to form C.
>
> There's no particular reason it has to be form C, as long as both the
> pattern and the string are normalized to the same form.

Agreed, that's what I meant to say.
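
To make the "same form on both sides" point concrete, here is a rough sketch. It is not the Subversion code: it uses POSIX fnmatch() as a stand-in for apr_fnmatch(), spells the two normalization forms of "é" out by hand as UTF-8 bytes instead of calling a normalizer, and the names glob_match() and demo() are made up for illustration.

```c
#include <assert.h>
#include <fnmatch.h>

/* "é" in form C (one code point, U+00E9) and in form D ('e' followed by
 * the combining acute accent U+0301), written out as UTF-8 bytes. */
#define E_ACUTE_NFC "\xC3\xA9"
#define E_ACUTE_NFD "e\xCC\x81"

/* Hypothetical helper: return 1 if STR matches the glob PATTERN, else 0.
 * POSIX fnmatch() stands in for apr_fnmatch() here (an assumption). */
static int
glob_match(const char *pattern, const char *str)
{
  return fnmatch(pattern, str, 0) == 0;
}

/* The same word matches only when pattern and string share a form. */
static void
demo(void)
{
  /* Both sides in form C: match. */
  assert(glob_match("caf" E_ACUTE_NFC "*", "caf" E_ACUTE_NFC " au lait"));

  /* Both sides in form D: match as well, so the choice of form is free. */
  assert(glob_match("caf" E_ACUTE_NFD "*", "caf" E_ACUTE_NFD " au lait"));

  /* Pattern in form C, string in form D: the bytes differ, so no match. */
  assert(!glob_match("caf" E_ACUTE_NFC "*", "caf" E_ACUTE_NFD " au lait"));
}
```

The matcher only ever compares bytes, so which form is used does not matter; what matters is that the pattern and the log message agree.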

> Using form D is possibly even a bit faster, since that's the internal
> 32-bit representation used by utf8proc. It's a pity we don't have
> a 32-bit-char fnmatch implementation.
>
> Still, as you note below, normalizing a glob pattern isn't entirely
> trivial to do correctly.
>
> > P.S. This patch introduces a minor behaviour change: before this patch,
> > the search pattern «foo[A-z]bar» would match the log message «foo_bar»,
> > whereas after this change it would not. (This is because the pattern is
> > now casefolded before being passed to APR, and '_' is between 'A'
> > and 'z' but not between 'a' and 'z', when compared as C chars.) I doubt
> > anyone will notice this behaviour change; I'm just mentioning it for
> > completeness.
>
> Mmhh ... this is what comes of 'obviously trivial' solutions. :)

The important thing is that there is no _other_ apr_fnmatch() syntax
that changes meaning through case-folding the pattern, at least in
apr-1.5 with flags==0.
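
For anyone who wants to see the bracket-range effect in isolation, a small sketch follows. It makes two assumptions that are not in the patch: POSIX fnmatch() stands in for apr_fnmatch(), and an ASCII-only tolower() loop stands in for the Unicode casefolding done by svn_utf__normalize(). The helper names fold_ascii(), match_raw() and match_folded() are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical helper: copy SRC into DST (capacity N bytes), lowercasing
 * ASCII letters.  A crude stand-in for real Unicode casefolding. */
static void
fold_ascii(char *dst, const char *src, size_t n)
{
  size_t i;

  assert(n > 0);
  for (i = 0; i + 1 < n && src[i] != '\0'; i++)
    dst[i] = (char)tolower((unsigned char)src[i]);
  dst[i] = '\0';
}

/* Match STR against PATTERN without folding either of them. */
static int
match_raw(const char *pattern, const char *str)
{
  return fnmatch(pattern, str, 0) == 0;
}

/* Fold both PATTERN and STR first, then match.  Folding the pattern
 * turns the range [A-z] into [a-z]. */
static int
match_folded(const char *pattern, const char *str)
{
  char p[256], s[256];

  fold_ascii(p, pattern, sizeof(p));
  fold_ascii(s, str, sizeof(s));
  return fnmatch(p, s, 0) == 0;
}
```

In the default "C" locale, match_raw("foo[A-z]bar", "foo_bar") should succeed because '_' (0x5F) lies between 'A' (0x41) and 'z' (0x7A), while match_folded() with the same arguments should fail because the folded range starts at 'a' (0x61).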

Cheers,

Daniel
Received on 2016-03-16 03:54:45 CET

This is an archived mail posted to the Subversion Dev mailing list.