
Re: 'svn log --search': forcing case sensitivity?

From: Branko Čibej <brane_at_apache.org>
Date: Tue, 15 Mar 2016 06:13:28 +0100

On 15.03.2016 01:08, Daniel Shahaf wrote:
> kotkov_at_apache.org wrote on Fri, Feb 19, 2016 at 22:11:11 -0000:
>> Author: kotkov
>> Date: Fri Feb 19 22:11:11 2016
>> New Revision: 1731300
>>
>> URL: http://svn.apache.org/viewvc?rev=1731300&view=rev
>> Log:
>> Make svn log --search case-insensitive.
>>
>> Use utf8proc to do the normalization and locale-independent case folding
>> (UTF8PROC_CASEFOLD) for both the search pattern and the input strings.
>>
>> Related discussion is in http://svn.haxx.se/dev/archive-2013-04/0374.shtml
>> (Subject: "log --search test failures on trunk and 1.8.x").
>>
>> +++ subversion/trunk/subversion/svn/log-cmd.c Fri Feb 19 22:11:11 2016
>> @@ -38,6 +38,7 @@
>> @@ -110,6 +111,24 @@
>> +/* Return TRUE if STR matches PATTERN. Else, return FALSE. Assumes that
>> + * PATTERN is a UTF-8 string normalized to form C with case folding
>> + * applied. Use BUF for temporary allocations. */
>> +static svn_boolean_t
>> +match(const char *pattern, const char *str, svn_membuf_t *buf)
>> +{
>> + svn_error_t *err;
>> +
>> + err = svn_utf__normalize(&str, str, strlen(str), TRUE /* casefold */, buf);
>> + if (err)
>> + {
>> + /* Can't match invalid data. */
>> + svn_error_clear(err);
>> + return FALSE;
>> + }
>> +
>> + return apr_fnmatch(pattern, str, 0) == APR_SUCCESS;
> Should there be a command-line flag to disable casefolding?
>
> E.g., to allow users to grep for identifiers (function/variable/file
> names) using their exact case? Do people who use 'log --search' need it
> to be case-sensitive? (I don't use 'log --search' often.)

I'd prefer to keep things simple. And as I recall, this whole discussion
began because apr_fnmatch doesn't like non-ASCII characters?

> Even if casefolding is disabled, we should still apply Unicode
> normalization to form C.

There's no particular reason it has to be form C, as long as both the
pattern and the string are normalized to the same form. Using form D is
possibly even a bit faster, since that's the internal 32-bit
representation used by utf8proc. It's a pity we don't have a 32-bit-char
fnmatch implementation.

Still, as you note below, normalizing a glob pattern isn't entirely
trivial to do correctly.
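
To illustrate why both sides have to be in the same normalization form, here is a minimal sketch. It assumes POSIX fnmatch() as a stand-in for apr_fnmatch(), and it hard-codes the two byte sequences instead of calling a normalizer; the names cafe_nfc, cafe_nfd and glob_matches are made up for this example.

```c
#include <fnmatch.h>

/* "café" in two canonically equivalent UTF-8 encodings.  These byte
 * sequences are hard-coded for illustration; real code would obtain
 * them from a normalizer. */
static const char cafe_nfc[] = "caf\xC3\xA9";   /* U+00E9, precomposed */
static const char cafe_nfd[] = "cafe\xCC\x81";  /* 'e' + U+0301, decomposed */

/* Return non-zero if STR matches the glob PATTERN.  POSIX fnmatch()
 * stands in for apr_fnmatch() here; in the default "C" locale it
 * compares the strings byte by byte. */
static int
glob_matches(const char *pattern, const char *str)
{
  return fnmatch(pattern, str, 0) == 0;
}
```

With these definitions, glob_matches(cafe_nfc, cafe_nfc) and glob_matches(cafe_nfd, cafe_nfd) both succeed, while glob_matches(cafe_nfc, cafe_nfd) fails even though the two strings render identically. Which form is chosen does not matter for the match, only that it is the same on both sides.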

> P.S. This patch introduces a minor behaviour change: before this patch,
> the search pattern «foo[A-z]bar» would match the log message «foo_bar»,
> whereas after this change it would not. (This is because the pattern is
> now casefolded before being passed to APR, and '_' is between 'A'
> and 'z' but not between 'A' and 'Z', when compared as C chars.) I doubt
> anyone will notice this behaviour change; I'm just mentioning it for
> completeness.

Mmhh ... this is what comes of 'obviously trivial' solutions. :)
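
The range change described in the P.S. can be reproduced with a small sketch. It makes two assumptions: POSIX fnmatch() stands in for apr_fnmatch(), and plain ASCII tolower() stands in for the utf8proc case folding that the commit actually uses. The helper names (ascii_fold, match_exact, match_folded) are hypothetical and not part of Subversion.

```c
#include <ctype.h>
#include <fnmatch.h>
#include <stddef.h>

/* Lower-case the ASCII string IN into OUT (capacity OUT_SIZE bytes),
 * truncating if necessary.  This is only a stand-in for the Unicode
 * case folding done via utf8proc in the real code. */
static void
ascii_fold(char *out, size_t out_size, const char *in)
{
  size_t i;

  if (out_size == 0)
    return;
  for (i = 0; i + 1 < out_size && in[i] != '\0'; i++)
    out[i] = (char)tolower((unsigned char)in[i]);
  out[i] = '\0';
}

/* Match STR against PATTERN as given (behaviour before the change). */
static int
match_exact(const char *pattern, const char *str)
{
  return fnmatch(pattern, str, 0) == 0;
}

/* Fold both PATTERN and STR, then match (behaviour after the change).
 * Inputs longer than 255 bytes are truncated in this sketch. */
static int
match_folded(const char *pattern, const char *str)
{
  char p[256];
  char s[256];

  ascii_fold(p, sizeof(p), pattern);
  ascii_fold(s, sizeof(s), str);
  return fnmatch(p, s, 0) == 0;
}
```

In the default "C" locale, match_exact("foo[A-z]bar", "foo_bar") succeeds because '_' (0x5F) lies between 'A' (0x41) and 'z' (0x7A). match_folded() turns the pattern into "foo[a-z]bar", whose range no longer contains '_', so the same log message stops matching. In exchange, match_folded("FOO*", "foobar") succeeds where match_exact() does not.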

-- Brane
Received on 2016-03-15 06:13:37 CET

This is an archived mail posted to the Subversion Dev mailing list.
