Description of all (known to me) issues with gettext & svn

From: Erik Huelsmann <e.huelsmann_at_gmx.net>
Date: 2005-03-03 19:57:56 CET

Due to the complexity of the subject and the need to document some of
these issues, I wrote down what I know about l10n with gettext below.

I'll be committing it, but I'm sending it here first for review and
completion.

bye,

Erik.

Issues (and their resolutions) when using gettext for message translation

Contents
========

 * Windows issues
 * Automatic characterset conversion
 * Translations on the client
 * No translations on the server
 * Translating plural forms (ngettext() support)


Windows issues
==============

On Windows, Subversion is linked against a modified version of GNU gettext.
This resolves several issues:

 - Eliminated need to link against libiconv (which would be the second
   iconv library, since we already link against apr-iconv)
 - No automatic charset conversion (guaranteed UTF-8 strings returned by
   gettext() calls without performance penalties)

More details follow in the sections below.


Automatic characterset conversion
=================================

Some gettext implementations automatically convert the strings in the
message catalogue to the active system character set. The source encoding
is stored in the entry for the "" (empty) message id. Its message string
looks somewhat like a MIME header and contains a "Content-Type" line that
names the charset. It's typically GNU's gettext which does this.

Subversion uses UTF-8 to encode strings internally, which may not be the
system's default character encoding. To prevent internal corruption,
libsvn_subr:svn_cmdline_init2() explicitly tells gettext to return UTF-8
encoded strings if it has bind_textdomain_codeset().
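
As a rough illustration of the call described above (not the actual
Subversion code), the sketch below requests UTF-8 output for one text
domain. The domain name, the locale directory and the configure-style
macro HAVE_BIND_TEXTDOMAIN_CODESET are assumptions made for this example.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical values for this sketch; not Subversion's real settings. */
#define EXAMPLE_DOMAIN "example_l10n_sketch"
#define EXAMPLE_LOCALE_DIR "/usr/share/locale"

/* Bind the domain and, where the function is available, ask gettext to
   return UTF-8 strings regardless of the system character set.
   Returns the codeset now in effect for the domain, or NULL if the
   request could not be made. */
const char *
example_init_utf8_messages(void)
{
  setlocale(LC_ALL, "");
  bindtextdomain(EXAMPLE_DOMAIN, EXAMPLE_LOCALE_DIR);
#ifdef HAVE_BIND_TEXTDOMAIN_CODESET
  return bind_textdomain_codeset(EXAMPLE_DOMAIN, "UTF-8");
#else
  /* Non-recoding implementation: the .po files must already be UTF-8. */
  return NULL;
#endif
}
```

Either branch ends up with UTF-8 strings only because the catalogues
themselves are UTF-8, which is why the next paragraph's requirement
matters.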

Some gettext implementations don't contain automatic string recoding. In
order to work with both recoding and non-recoding implementations, the
source strings must be UTF-8 encoded. This is achieved by requiring .po
files to be UTF-8 encoded. [Note: a pre-commit hook has been installed to
ensure this.]

On Windows, Subversion links against a version of GNU gettext that has
been modified not to do character conversions. This eliminates the
requirement to link against libiconv, which would mean linking Subversion
against two iconv libraries (apr-iconv as well as libiconv).


Translations on the client
==========================

The translation effort aims to translate all error messages generated on
the system on which the user has invoked a Subversion command (svnadmin,
svnlook, svndumpfilter, svnversion or svn).

This means that strings in all layers of the libraries have been marked
for translation, either with _() or N_().

Parameters are sprintf-ed straight into error strings at the time they are
added to the error structure, so most strings are marked with _() and
translated directly into the language for which the client was set up.
[Note: The N_() macro marks strings for delayed translation.]
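
To illustrate the difference between the two markers, here is a minimal
sketch using the conventional gettext macro definitions. The domain name
and the example_* helpers are made up for this example, and the exact
definitions in the Subversion tree may differ.

```c
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical domain name for this sketch. */
#define EXAMPLE_DOMAIN "example_l10n_sketch"

/* _() translates immediately; N_() only marks the string so that
   xgettext extracts it, leaving translation for later. */
#define _(x) dgettext(EXAMPLE_DOMAIN, x)
#define N_(x) x

/* Static initializers cannot call functions, so these are marked only. */
static const char *const example_messages[] = {
  N_("File not found"),
  N_("Permission denied")
};

/* Immediate translation: the template is translated first, then the
   parameter is formatted into it. */
int
example_format_error(char *buf, size_t size, const char *path)
{
  return snprintf(buf, size, _("Can't open file '%s'"), path);
}

/* Delayed translation: the marked string is looked up when used. */
const char *
example_message(size_t idx)
{
  return _(example_messages[idx]);
}
```

Without an installed catalogue both helpers simply return the English
text, since gettext falls back to the message id.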


No translations on the server
=============================

On systems which define the LC_MESSAGES constant, setlocale() can be used
to set string translation for all (error) strings, even those outside
the Subversion domain.

Windows doesn't define LC_MESSAGES. Instead, GNU gettext uses the
environment variables LANGUAGE, LC_ALL, LC_MESSAGES and LANG (in that
order) to find out what language to translate to. If none of these are
defined, the system and user default locales are queried.
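
The lookup order can be sketched as follows. This is a simplification
for illustration, not GNU gettext's implementation: the function name is
hypothetical, and details such as LANGUAGE holding a colon-separated
list of languages are ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return the value of the first non-empty variable in the order
   LANGUAGE, LC_ALL, LC_MESSAGES, LANG, or NULL if none is set.
   A NULL result is where the system and user default locales
   would be queried instead. */
const char *
example_pick_language(void)
{
  static const char *const vars[] = {
    "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"
  };
  size_t i;

  for (i = 0; i < sizeof(vars) / sizeof(vars[0]); i++)
    {
      const char *value = getenv(vars[i]);
      if (value != NULL && value[0] != '\0')
        return value;
    }
  return NULL;
}
```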

Systems which have the LC_MESSAGES flag (or setenv()) allow languages to
be switched at run time. Windows has neither, so this cannot be done
portably.

Any attempt to use setlocale() in an Apache environment will conflict with
settings that other modules expect to be set up. On the svnserve side,
having no portable way to change languages dynamically means that the
environment has to be set up correctly from the start.

In other words, there is no programmatic way to ensure that messages
are served in any specific language.

Note: The original consensus was that messages generated on the server
side should stay untranslated for transmission to the client. Translating
them on the client side is not an option, because by then the parameter
values have been inserted into the string, meaning that it can't be looked
up in the message catalogue anymore.
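
The lookup problem can be shown with a toy catalogue. Everything in this
sketch is hypothetical (the table, the Dutch entry and the example_*
names); it only mimics the exact-match lookup a message catalogue
performs.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct example_entry
{
  const char *msgid;   /* untranslated template */
  const char *msgstr;  /* translated template */
};

/* A one-entry stand-in for a client-side message catalogue. */
static const struct example_entry example_catalog[] = {
  { "Can't open file '%s'", "Kan bestand '%s' niet openen" }
};

/* Exact-match lookup; returns NULL when the string is not a known id. */
const char *
example_lookup(const char *msgid)
{
  size_t i;

  for (i = 0; i < sizeof(example_catalog) / sizeof(example_catalog[0]); i++)
    if (strcmp(example_catalog[i].msgid, msgid) == 0)
      return example_catalog[i].msgstr;
  return NULL;
}

/* What the server would send: the template with the parameter filled in. */
int
example_server_message(char *buf, size_t size, const char *path)
{
  return snprintf(buf, size, "Can't open file '%s'", path);
}
```

Looking up the bare template succeeds, but looking up the string produced
by example_server_message() does not, because the file name is now part
of the key.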


Translating plural forms (ngettext() support)
=============================================

The code below works in English and can be translated to a number of
languages. However, some languages require more than two forms for a
correct translation. The ngettext() function takes care of
grabbing the right translation for those languages. Unfortunately,
the function is a GNU extension and thus non-portable.

  message = (n == 1) ? _("1 File found")
                     : apr_psprintf (pool, _("%d Files found"), n);

Because of this limitation, some strings in the client have not been
marked for translation.
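
For comparison, this is roughly what a call site could look like where
ngettext() is available, with the two-form selection as the fallback.
The macro EXAMPLE_HAVE_NGETTEXT and the function name are assumptions
made for this sketch, and the fallback still has the limitation
described above.

```c
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* Format a "files found" message for N files into BUF. */
int
example_files_found(char *buf, size_t size, unsigned long n)
{
#ifdef EXAMPLE_HAVE_NGETTEXT
  /* The catalogue's plural rule picks among however many forms the
     target language defines. */
  const char *fmt = ngettext("%lu file found", "%lu files found", n);
#else
  /* Portable fallback: only two forms, chosen by the English rule. */
  const char *fmt = (n == 1) ? gettext("%lu file found")
                             : gettext("%lu files found");
#endif
  return snprintf(buf, size, fmt, n);
}
```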

*** We're looking for good suggestions to work around this.

Received on Thu Mar 3 19:59:09 2005

This is an archived mail posted to the Subversion Dev mailing list.