
Re: UTF-8

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-05-23 00:59:22 CEST

On Wed, May 22, 2002 at 11:59:51PM +0200, Marcus Comstedt wrote:
> Ulrich Drepper <drepper@redhat.com> writes:
>...
> > First, Unix requires a function named nl_langinfo() which returns just
> > the wanted information. So you should have something like
> >
> > static const char *
> > find_native_charset (void)
> > {
> > #ifdef HAVE_NL_LANGINFO
> > return nl_langinfo (CODESET);
> > #else
> > ...
> > #endif
> > }
>
> Cool. I sat for a whole hour trying to find a function that did
> that, without succeeding. :-)

Well, you shouldn't have to worry about it at all, actually. Take a look at:

  apr/include/apr_xlate.h
  apr/i18n/unix/*

You'll note that xlate.c already has a call to nl_langinfo() in it.
Otherwise, it defaults to some other code to derive the current charset.
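
For anyone reading along without the APR source at hand, the nl_langinfo()
approach Ulrich describes can be sketched standalone roughly as below. This
is only an illustration, not the code in xlate.c: the name native_charset()
is made up, and it assumes a POSIX system that provides <langinfo.h>. Note
that a program starts in the "C" locale, so the sketch calls setlocale()
first to pick up the user's settings.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not from APR or SVN): return the name of the
   character set of the user's locale, as reported by the C library. */
static const char *
native_charset (void)
{
  /* Adopt the LC_CTYPE category from the environment.  If this fails,
     the process simply stays in whatever locale it was already in
     (normally "C"), and we report the codeset of that locale. */
  (void) setlocale (LC_CTYPE, "");

  /* CODESET names the encoding of the current LC_CTYPE locale. */
  return nl_langinfo (CODESET);
}
```

The string returned is whatever the platform calls the charset, which is
exactly why the naming problem mentioned further down still has to be dealt
with somewhere.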

APR has a number of functions for transcoding strings, so SVN might not
even need any of its own. Further, the apr-iconv project is available for
platforms that don't have iconv() built in.

> > To support systems without nl_langinfo() you cannot simply look at the
> > LC_CTYPE environment variable. Its value need not have anything to do
> > with the selected locale. The order in which the setlocale() function
> > looks at environment variables for the LC_CTYPE category is
> >
> > LC_ALL -> LC_CTYPE -> LANG
> >
> > I.e., if LC_ALL is set use it. Otherwise if LC_CTYPE is set, use this.
> > Else use LANG if set.
>
> I know. It was a quick hack, as I suspected there had to be a better
> way to do it...

Ulrich's suggestions should be turned into patches that fix up APR, rather
than encoding this knowledge into SVN only.
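
To make the lookup order concrete, here is a rough sketch of the
LC_ALL -> LC_CTYPE -> LANG fallback Ulrich lists. The function name
ctype_locale_from_env() is hypothetical and this is not code from APR or
glibc. It also assumes the POSIX rule that a variable set to the empty
string is treated as if it were not set.

```c
#include <stdlib.h>

/* Hypothetical helper: return the locale name that applies to the
   LC_CTYPE category according to the environment, or NULL if none of
   the relevant variables is set to a non-empty value. */
static const char *
ctype_locale_from_env (void)
{
  /* Highest priority first. */
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
  size_t i;

  for (i = 0; i < sizeof (vars) / sizeof (vars[0]); i++)
    {
      const char *value = getenv (vars[i]);

      /* Assumption: an empty value counts as unset. */
      if (value != NULL && value[0] != '\0')
        return value;
    }

  return NULL;
}
```

Even with the right variable in hand, the value is a locale name and not
necessarily a charset name, which is where the localcharset.c tables
mentioned below come in.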

> > But your problems won't stop there. Charsets can have many different
> > names. Other systems provide different mechanisms to determine the
> > current charset etc etc.
> >
> > Look at the file localcharset.c which is used in some GNU packages (and
> > other GPL'ed ones). I attach a copy. This is what you'll have to do.
>
> Nice. I'll take a look at it.

I would encourage you to look at this. Ulrich is very well qualified to talk
about this stuff (since he is the primary glibc maintainer :-).

>...
> > > + if (cd == (iconv_t)-1)
> > > + return svn_error_create (0, errno, NULL, pool, "recoding string");
> > > +
> > > + do {
> > > +
> > > + char *destbuf = apr_palloc (pool, buflen);
> > > +
> > > + /* Set up state variables for iconv */
> > > + const char *srcptr = src_data;
> > > + char *destptr = destbuf;
> > > + size_t srclen = src_length;
> > > + size_t destlen = buflen;
> > > +
> > > + /* Attempt the conversion */
> > > + if (iconv(cd, &srcptr, &srclen, &destptr, &destlen) != (size_t)-1) {

Again, note that iconv() is unportable, so the apr_xlate functions should be
used instead.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Thu May 23 00:57:05 2002

This is an archived mail posted to the Subversion Dev mailing list.