locale-sensitive features (was: [PATCH] Script for warning about error leaks)

From: Vincent Lefevre <vincent+svn_at_vinc17.org>
Date: 2006-02-09 16:10:15 CET

On 2006-02-08 03:30:41 +0000, Julian Foad wrote:
> Rather, we're talking about a range expression like "[a-z]" being
> interpreted by a particular program not as the range of characters having
> code('a') to code('z'), but as the set of characters found between 'a' and
> 'z' in the locale's collation table, which is primarily intended for
> sorting things into "alphabetical" order and typically differs from the
> character set encoding order, e.g. (fictional example)
>
> LANG=fr_FR
> Encoding order: ...ABCDE...XYZ...abcde...xyz...ÁÉ...áé...
> Collation table: ...aáâàAÁÂÀbBcçCÇdDeéêèEÉÊÈ...xXyYzZ...
>
> Thus, the important setting here is LC_COLLATE.

Someone reported here

  http://lists.asyd.net/pipermail/shell/2006-January/000094.html

(in French) that on his system, the range behavior was changed by
modifying LC_CTYPE, but not by modifying LC_COLLATE. I don't know
if this is a bug...

Also, I've seen that the tolower function is used on an MD5 hash
in "subversion/libsvn_fs_fs/fs_fs.c" to obtain the corresponding
ASCII lowercase character. If this code is run in a non-C/POSIX
locale, this is not safe. Though I don't know of any locale where
this particular case will fail, a similar example shows that it is
better to avoid tolower for this kind of thing: in the Turkish
locales (e.g. tr_TR.iso8859-9), tolower('I') is not 'i', but a
dotless 'i' (0xfd). It is probably better to base the code on
'a' - 'A'.
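
To illustrate the 'a' - 'A' idea, here is a minimal sketch of a
locale-independent lowercase conversion. The names ascii_tolower and
ascii_lower_hex are hypothetical (they are not taken from the
Subversion sources), and the code assumes an ASCII-compatible
execution character set, where 'A'..'Z' are contiguous:

```c
/* Hypothetical helper, not from the Subversion tree: lowercase one
 * character without consulting the current locale.
 * Assumes an ASCII-compatible execution character set, so that
 * 'A'..'Z' are contiguous and 'a' - 'A' is a constant offset. */
static char ascii_tolower(char c)
{
  return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Hypothetical helper: lowercase a NUL-terminated hex digest in place.
 * Digits and characters outside 'A'..'Z' are left unchanged. */
static void ascii_lower_hex(char *s)
{
  for (; *s != '\0'; ++s)
    *s = ascii_tolower(*s);
}
```

Since this never calls tolower, the result for 'I' is 'i' whatever
LC_CTYPE is set to, including in the Turkish locales.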

Possible similar problem in "subversion/libsvn_ra_dav/session.c".

As written in Mutt's BEWARE file:

  A word of warning about string comparisons: Since mutt may run in a
  huge variety of locales, case-insensitive string comparisons and
  case conversions may be dangerous. For instance, in iso-8859-9,
  tolower('I') is DIFFERENT from 'i' - it's indeed the Turkish dotless
  lowercase i.

-- 
Vincent Lefèvre <vincent_at_vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Received on Thu Feb 9 17:53:33 2006

This is an archived mail posted to the Subversion Dev mailing list.
