On 23.04.2013 08:53, Stefan Sperling wrote:
> On Mon, Apr 22, 2013 at 01:13:43PM +0200, Branko Čibej wrote:
>> On 22.04.2013 12:59, Bert Huijben wrote:
>>> The assertion shows a design problem which we should handle for future compatibility and you suggest just adding some bandages to patch/hide the test failure?
>>>
>>> The current code is broken, and your suggestion resembles the solution that most responders in this thread vetoed: assuming there is only US English, by using a function that has platform-specific behavior.
>>>
>>> (tolower() is locale and platform character encoding dependent. You should never just pass individual UTF-8 bytes to it)
> OK Bert, I can see how, for example, a tolower() implementation which runs
> in a latin1 locale could convert parts of a UTF-8 string which contains
> bytes that are part of a multibyte character, if such bytes happen to have
> the same value as some upper case letter from the latin1 symbol range [128-255].
You're missing the point. tolower() works on individual characters, not
whole strings; so it in general /cannot/ do correct locale-specific
lowercasing. What you're looking for is something like strcoll(), except
that it should be case-insensitive.
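The byte-level hazard raised in the quoted text can be made concrete with a minimal sketch. It assumes a hand-written stand-in for what tolower() does in an ISO-8859-1 locale; latin1_tolower() and fold_bytes() are hypothetical names for illustration, not libc or Subversion functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for tolower() in an ISO-8859-1 locale.
   Folds A-Z and the Latin-1 capitals 0xC0-0xDE, skipping 0xD7
   (the multiplication sign). Hypothetical helper, not a libc call. */
static unsigned char latin1_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c + 0x20);
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        return (unsigned char)(c + 0x20);
    return c;
}

/* Lowercase a buffer in place, one byte at a time, with no
   knowledge of the encoding the bytes are actually in. */
static void fold_bytes(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = latin1_tolower(buf[i]);
}
```

Fed the UTF-8 encoding of "é" (bytes 0xC3 0xA9), fold_bytes() rewrites the lead byte 0xC3 to 0xE3, which announces a three-byte sequence that never arrives, so the result is no longer valid UTF-8. Avoiding that corruption is necessary, but by itself it still does not give correct case-insensitive matching.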
The long and short of it is that you need a complete locale-aware
implementation of the Unicode standard to do proper case-insensitive
comparisons; ICU is one such example. Trying to retrofit anything less
smart onto apr_fnmatch will not work correctly.
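As a rough illustration of what "anything less smart" looks like, here is a sketch of an ASCII-only case-insensitive comparison. It is an assumption for illustration, not code from Subversion or APR; ascii_fold() and ascii_casecmp() are hypothetical names. It never touches bytes >= 0x80, so it cannot corrupt UTF-8, but it also cannot match non-ASCII letters that differ only in case.

```c
#include <assert.h>
#include <stddef.h>

/* Fold only ASCII A-Z. Bytes >= 0x80, which is where UTF-8 lead
   and continuation bytes live, pass through unchanged. */
static unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Compare two NUL-terminated strings after ASCII-only folding.
   Returns 0 if they are equal, otherwise the difference of the
   first pair of folded bytes that differ. */
static int ascii_casecmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        unsigned char ca = ascii_fold((unsigned char)*a);
        unsigned char cb = ascii_fold((unsigned char)*b);
        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;
    }
}
```

With this helper, "Fix Issue" and "fix ISSUE" compare equal, but the UTF-8 strings "Été" and "été" do not, and neither do "straße" and "STRASSE", where the case mapping is not even one character to one character.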
(N.B., the svn_utf__glob function on the wc-collate-path branch is
explicitly case-sensitive, it only deals with UTF normalization
differences.)
We can of course document the restriction that 'log --search' will give
unexpected results when log messages are anything but ASCII. I'd even
accept that as a marginal band-aid provided we promise to (at least
partially) fix it in 1.9. Now, we can't really make that promise, can we? :)
-- Brane
--
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com
Received on 2013-04-23 14:27:45 CEST