
Re: log --search test failures on trunk and 1.8.x

From: Stefan Sperling <stsp_at_elego.de>
Date: Tue, 23 Apr 2013 14:51:50 +0200

On Tue, Apr 23, 2013 at 02:27:08PM +0200, Branko Čibej wrote:
> You're missing the point. tolower() works on individual characters, not
> whole strings; so it in general /cannot/ do correct locale-specific

Do you really mean characters, or bytes?
It sounds like you mean bytes. tolower() works on individual bytes.
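
To make the byte-versus-character distinction concrete, here is a minimal sketch of a naive lowercasing loop. The helper name lowercase_bytes is hypothetical and is not Subversion or APR code. It assumes the input is UTF-8 and that the program runs in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Lowercase a NUL-terminated buffer in place, one byte at a time.
   Hypothetical helper, for illustration only. tolower() sees each byte
   of a multibyte UTF-8 sequence separately, never the whole character. */
static void
lowercase_bytes(char *s)
{
  assert(s != NULL);
  for (; *s; s++)
    *s = (char)tolower((unsigned char)*s);
}
```

Given the UTF-8 bytes for "ÄB" (0xC3 0x84 'B'), this loop lowercases only the ASCII 'B' in the "C" locale; the two bytes of "Ä" are left alone, so the result never matches "äb" (0xC3 0xA4 'b'). In a single-byte locale such as ISO-8859-1 the loop could even alter 0xC3 on its own and corrupt the sequence.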

> lowercasing. What you're looking for is something like strcoll(), except
> that it should be case-insensitive.

There are many existing functions that could be used to build
something fairly useful (wcscasecmp for instance).
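
As a rough illustration of the kind of building block meant here, the sketch below wraps wcscasecmp() (POSIX.1-2008) in a hypothetical helper, wide_equal_ignore_case. It assumes both strings are already wide strings; what counts as "the same letter" beyond ASCII depends on the LC_CTYPE locale in effect.

```c
#define _POSIX_C_SOURCE 200809L  /* expose wcscasecmp() */
#include <assert.h>
#include <wchar.h>

/* Return nonzero if the two wide strings are equal ignoring case,
   as defined by the current LC_CTYPE locale. Hypothetical helper. */
static int
wide_equal_ignore_case(const wchar_t *a, const wchar_t *b)
{
  assert(a != NULL && b != NULL);
  return wcscasecmp(a, b) == 0;
}
```

This compares whole strings only; a pattern matcher would still have to be written on top of it.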
 
> The long and short of it is that you need a complete locale-aware
> implementation of the Unicode standard to do proper case-insensitive
> comparisons; ICU is one such example.

Yes.

> Trying to retrofit anything less
> smart onto apr_fnmatch will not work correctly.

That depends on whether an fnmatch implementation is willing to live
with the limitations of the locale mechanism (one opaque charset
supported, any charset not in the current locale can give errors).
It seems that some people do think fnmatch() should do it this way:
http://opensource.apple.com/source/Libc/Libc-583/gen/FreeBSD/fnmatch.c
(Caution: This implementation has the out-of-bounds recursion bug
which made Bill rewrite fnmatch for APR...)

Subversion already assumes it can convert strings from UTF-8 to the
locale's character set for output. We could also assume that we can
convert log messages from UTF-8 to the current locale charset, and
write something that performs case-insensitive matching with wchar_t.
However, that's clearly out of scope for 1.8 as well :)
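
A minimal sketch of what such wchar_t-based matching might look like, assuming the log message has already been converted from UTF-8 to the locale's charset and that the caller has called setlocale(LC_ALL, ""). The helpers to_lower_wide and contains_ignore_case are hypothetical names; this does a plain substring search, not glob matching, and it relies on the POSIX behaviour of mbstowcs() with a NULL destination.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Convert a multibyte string in the current locale's charset to a newly
   allocated, lowercased wide string. Returns NULL if the string cannot
   be converted or memory runs out. Hypothetical helper. */
static wchar_t *
to_lower_wide(const char *s)
{
  size_t n, i;
  wchar_t *w;

  assert(s != NULL);
  n = mbstowcs(NULL, s, 0);          /* count wide characters needed */
  if (n == (size_t)-1)
    return NULL;                     /* bytes invalid in this locale */
  w = malloc((n + 1) * sizeof(*w));
  if (w == NULL)
    return NULL;
  mbstowcs(w, s, n + 1);
  for (i = 0; i < n; i++)
    w[i] = (wchar_t)towlower((wint_t)w[i]);
  return w;
}

/* Return 1 if needle occurs in haystack ignoring case, 0 if it does not,
   and -1 if either string could not be converted. */
static int
contains_ignore_case(const char *haystack, const char *needle)
{
  wchar_t *h = to_lower_wide(haystack);
  wchar_t *n = to_lower_wide(needle);
  int result = -1;

  if (h != NULL && n != NULL)
    result = wcsstr(h, n) != NULL;
  free(h);
  free(n);
  return result;
}
```

The -1 return is where the locale limitation shows up: text outside the current locale's charset cannot be searched at all. Lowercasing one wide character at a time also cannot handle one-to-many case mappings.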

> (N.B., the svn_utf__glob function on the wc-collate-path branch is
> explicitly case-sensitive, it only deals with UTF normalization
> differences.)
>
>
> We can of course document the restriction that 'log --search' will give
> unexpected results when log messages are anything but ASCII. I'd even
> accept that as a marginal band-aid provided we promise to (at least
> partially) fix it in 1.9. Now, we can't really make that promise, can we. :)

I don't know. Perhaps just removing case-insensitive search is the
best option. We can always add it back later and do it right (under
some reasonable definition of "right" which, as far as I'm concerned,
remains to be determined).
Received on 2013-04-23 14:52:28 CEST

This is an archived mail posted to the Subversion Dev mailing list.
