Re: i18n hell.

From: Nuutti Kotivuori <naked_at_iki.fi>
Date: 2002-07-11 21:17:45 CEST

Ben Collins-Sussman wrote:
>
> So I compiled subversion with --enable-utf8, and suddenly started
> getting errors on *every* invocation of 'svn':
>
> apr_error: #22, src_err 0 : <Invalid argument>
> (charset translator procurement failed)
>
> So I traced into our call to apr/i18n/unix/xlate.c:apr_xlate_open().
>
> This function had "UTF-8" passed in already, and the value
> APR_LOCALE_CHARSET caused the code to run nl_langinfo(CODESET). The
> return value from nl_langinfo was "ISO8859-1".
>
> Then we call iconv_open() on these two strings: bam, I get an EINVAL
> error. What's invalid, you ask?
>
> It turns out that my iconv only accepts "ISO-8859-1", not
> "ISO8859-1":
>
> $ man iconv_open
> ...
> European languages
> ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16},
> ...
>
> $ iconv -f UTF-8 -t ISO8859-1 foo
> iconv: conversion to ISO8859-1 unsupported
> $ iconv -f UTF-8 -t ISO-8859-1 foo
> iconv: foo: No such file or directory
>
> This seems way screwed up to me. The unhyphenated codepage name
> came from nl_langinfo(), which is part of my FreeBSD 4.5 libc! And
> this isn't accepted by iconv??

Oof, you've truly walked into a nest of trouble.

For a very long time, every GTK program refused to work with
LC_CTYPE=fi_FI. This was because glibc stubbornly decided that the
charset name is ISO-8859-1, while X11R6 decided that it is ISO8859-1.
Each side had a long list of aliases and other workarounds that
affected the outcome both ways. X and glibc each have their _own_
locales, which are used for slightly different things - and these do
conflict at times.

Just today, because of another mess, I tried again to see how glibc
handles these. The result was that ISO-8859-1 works, but so do
ISO8859-1, ISO88591 and ISO_8859-1. And I was unable to find the magic
piece of code that makes all of these work.

In BSD land I've heard that ISO_8859-1 and ISO8859-1 generally work,
but that ISO-8859-1 does not work anywhere. I don't know how true this
is.
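
For what it's worth, one way an application could paper over this is
to retry iconv_open() with the other common spellings when the first
name is refused. What follows is only a rough sketch under my own
assumptions, not what glibc or APR actually do: charset_key() and
open_with_aliases() are hypothetical helpers, only the ISO 8859 family
is handled, and only the target name is retried.

```c
#include <assert.h>
#include <ctype.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reduce a charset name to upper-case letters and
 * digits, so that "ISO-8859-1", "ISO8859-1", "ISO_8859-1" and "ISO88591"
 * all become "ISO88591". The result is truncated to fit outlen. */
void charset_key(const char *name, char *out, size_t outlen)
{
    size_t j = 0;

    assert(name != NULL && out != NULL && outlen > 0);
    for (; *name != '\0' && j + 1 < outlen; name++) {
        if (isalnum((unsigned char)*name))
            out[j++] = (char)toupper((unsigned char)*name);
    }
    out[j] = '\0';
}

/* Hypothetical helper: try iconv_open() with the names as given. If that
 * fails and the target looks like an ISO 8859 part, retry with the three
 * spellings mentioned in this thread. Returns (iconv_t)-1 if every
 * attempt is refused. */
iconv_t open_with_aliases(const char *to, const char *from)
{
    static const char *const prefixes[] = {
        "ISO-8859-", "ISO8859-", "ISO_8859-"
    };
    char key[64], candidate[64];
    size_t i;
    iconv_t cd = iconv_open(to, from);

    if (cd != (iconv_t)-1)
        return cd;

    charset_key(to, key, sizeof key);
    if (strncmp(key, "ISO8859", 7) != 0 || key[7] == '\0')
        return (iconv_t)-1;           /* not an ISO 8859 name, give up */

    for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        /* key + 7 is the part number, e.g. "1" or "15" */
        snprintf(candidate, sizeof candidate, "%s%s", prefixes[i], key + 7);
        cd = iconv_open(candidate, from);
        if (cd != (iconv_t)-1)
            return cd;
    }
    return (iconv_t)-1;
}
```

With this, open_with_aliases("ISO8859-1", "UTF-8") would go on to try
ISO-8859-1 and ISO_8859-1 before giving up; the same retry could be
applied to the source name as well.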

Way back, when OS X was just released, I heard that locales did not
really work at all. I have no idea what the status is these days.

Locales and charsets have had troubles on several distributions for a
very long time - and some of those problems are not too easily
solved. And standardization between these seems to be a long way
off. I don't really know what would be a good way to handle all this.

-- Naked

Received on Thu Jul 11 21:19:06 2002

This is an archived mail posted to the Subversion Dev mailing list.
