Re: utf-8 sanity check.

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-07-11 00:01:59 CEST

Ben Collins-Sussman <sussman@collab.net> writes:

> I'm getting ready to write some python tests that verify that we can
> deal with paths that have international characters in them.
>
> But before I do that, I want to make sure I understand what's going
> on in our code:
>
> * our application's main() calls setlocale(LC_ALL, locale) if
> --locale is passed by the user. This officially sets the locale for
> our process.
>
> * our utf.c routines set up a xlation table by calling
> apr_xlate_open() with two arguments: "UTF-8" and APR_LOCALE_CHARSET.
>
> * The latter argument causes apr_xlate_open to call nl_langinfo(CODESET).
>
> * nl_langinfo(), part of libc, then returns the charset defined by
> the program's locale. (according to my man page, at least.)
>
> So by this trace, it seems to me that we're all ready to go, then.
> There's no need to cache the --locale argument and somehow pass it
> down into our svn_utf_* routines.
>
> Am I correct?

Yup. (Unless Karl has done anything strange; I haven't reviewed the
actual checkins yet.) However, there is a slight problem with
--locale. The handle returned by apr_xlate_open is cached globally,
with no way to expire it. This means that if any UTF conversion
takes place _before_ the --locale argument is processed, the new
locale will not be used, since a converter has already been created
using the locale set in the environment variables. As long as you
make sure to always put the --locale argument _first_ on the command
line, everything should be hunky-dory, though.
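
To make the caching problem concrete, here is a minimal sketch using
only libc. It is not the actual utf.c code: cached_charset,
cache_valid and get_cached_charset() are hypothetical names, and the
cache holds just the charset name rather than a real apr_xlate_open
handle. The point is only that whatever is computed on first use
sticks, no matter what setlocale() is told later.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical stand-in for the globally cached converter handle:
   only the charset name the converter would be built for is kept. */
char cached_charset[64];
int cache_valid = 0;

/* Return the charset name, asking the current locale on first use
   only. Later calls return the cached value, even if setlocale()
   has been called in between. */
const char *get_cached_charset(void)
{
    if (!cache_valid) {
        const char *cs = nl_langinfo(CODESET);
        strncpy(cached_charset, cs, sizeof cached_charset - 1);
        cached_charset[sizeof cached_charset - 1] = '\0';
        cache_valid = 1;
    }
    return cached_charset;
}
```

If get_cached_charset() is called once before setlocale(LC_ALL, ...)
runs for --locale, every later call still reports the earlier
charset, which is the situation described above.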

For a proper fix of the situation, two possibilities exist: either
provide a mechanism for invalidating the cached converter (simple), or
make sure that --locale is parsed first (more work). Which solution
is "correct" depends on what semantics we want for --locale. In

  env LC_CTYPE=en_GB.ISO8859-1 svn --option1=¤ \
         --locale=en_GB.ISO8859-15 --option2=¤ blah blah

should the value of --option1 be interpreted according to Latin-1
(ISO 8859-1) or Latin-9 (ISO 8859-15)? What about --option2?
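
As a rough sketch of the "parse --locale first" option, something
like the following could scan argv before any other option is looked
at. find_locale_arg() is a hypothetical helper, not an existing
function, and it assumes only the --locale=VALUE spelling is used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the VALUE part of the first
   "--locale=VALUE" argument, or NULL if there is none.
   argv[0] (the program name) is skipped. */
const char *find_locale_arg(int argc, char *const argv[])
{
    static const char prefix[] = "--locale=";
    const size_t prefix_len = sizeof prefix - 1;
    int i;

    for (i = 1; i < argc; i++) {
        if (strncmp(argv[i], prefix, prefix_len) == 0)
            return argv[i] + prefix_len;
    }
    return NULL;
}
```

main() would then pass the result to setlocale(LC_ALL, ...) before
doing any UTF conversion. With that ordering, both --option1 and
--option2 in the example above would presumably be read according to
the --locale charset, wherever --locale appears on the command line.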

  // Marcus

Received on Thu Jul 11 00:08:01 2002

This is an archived mail posted to the Subversion Dev mailing list.