On 20.02.2016 12:09, Bert Huijben wrote:
>
>> -----Original Message-----
>> From: kotkov_at_apache.org [mailto:kotkov_at_apache.org]
>> Sent: vrijdag 19 februari 2016 23:11
>> To: commits_at_subversion.apache.org
>> Subject: svn commit: r1731300 - in /subversion/trunk/subversion:
>> include/private/svn_utf_private.h libsvn_repos/dump.c
>> libsvn_subr/utf8proc.c svn/cl-log.h svn/log-cmd.c svn/svn.c
>> tests/cmdline/log_tests.py tests/libsvn_subr/utf-test.c
>>
>> Author: kotkov
>> Date: Fri Feb 19 22:11:11 2016
>> New Revision: 1731300
>>
>> URL: http://svn.apache.org/viewvc?rev=1731300&view=rev
>> Log:
>> Make svn log --search case-insensitive.
>>
>> Use utf8proc to do the normalization and locale-independent case folding
>> (UTF8PROC_CASEFOLD) for both the search pattern and the input strings.
>>
>> Related discussion is in http://svn.haxx.se/dev/archive-2013-04/0374.shtml
>> (Subject: "log --search test failures on trunk and 1.8.x").
>>
>> * subversion/include/private/svn_utf_private.h
>> (svn_utf__normalize): Add new boolean argument to perform case folding.
> Usually it is far more efficient to perform the comparison on the unnormalized strings using the apis, than to normalize and perform the operation later. I'm not sure if utf8proc supports this feature though
>
> But I'm wondering why you added this feature to an existing function?
>
> I don't think it is recommended practice to perform the normalization this way and adding a boolean to an existing function makes it easier to do perform things in a not recommended way.
Adding flags that drastically change the semantics of a function is just
broken API design, period.
> Locale independent case folding is not that well defined... Things like the Turkish 'i' that doesn't fold, so any decision on that makes it locale dependent. (n this case probably by choosing not Turkish, but that doesn't make it 'locale independent'.
>
> Just folding the western European characters is much easier to explain/document.
Not really. For example, 'á' and 'A' are equivalent, but 'ß' and 'SS'
are not — whereas the latter should be equivalent in German, but I doubt
very much that utf8proc does that right. Case-insensitive comparison
must *always* be done in the context of a well-defined locale. Anything
that calls itself "locale-independent" is likely to be wrong in a really
huge number of cases.
Furthermore, this fix is not in any way related to the discussion that
the log message points to ... it's pretty clear that the bug is in
apr_fnmatch and should be fixed there.
In light of the above, I'd like to see some more discussion about a
realistic solution to the problem. So-called "locale-independent" case
folding isn't locale-independent, and I don't see how it can solve the
original problem without introducing a bunch of new bugs. It's scary
enough that the test cases don't try anything other than accented latin
characters.
-- Brane
Received on 2016-02-20 16:57:32 CET