
Internationalizing applications is hard

From: Bill Tutt <rassilon_at_lyra.org>
Date: 2002-06-05 06:52:29 CEST

Indeed. Just to show how complicated i18n can become: Win32
applications have to pay attention to lots of i18n details. Without
further ado, here's that list:

* System Locale:
Determines which bitmap fonts, and OEM, ANSI, and MAC code pages are
defaults for the system. This only affects applications that are not
fully Unicode.
API Name: GetSystemDefaultLangID
* User Locale:
Determines which settings are used for formatting dates, times,
currency, and numbers as a default for each user. Also determines the
sort order for sorting text.
API Name: GetUserDefaultLangID
* Thread Locale:
Determines which settings are used for formatting dates, times,
currency, and large numbers for a thread. Also determines the sort order
for sorting text. Defaults to User Locale.
API Name: GetThreadLocale
This isn't as applicable to UI applications, but it's more or less
required for applications that run as services.
* Input Locale:
A pair consisting of language and a method of input.
API Name: GetKeyboardLayout
* System UI Language:
Determines the default language of menus and dialogs, messages, INF
files, and help files.
API Name: GetSystemDefaultUILanguage
* User UI Language:
Determines the language of menus and dialogs, messages, and help files.
API Name: GetUserDefaultUILanguage

Of course, just to be annoying, you can't infer how to format
date/time/currency in UI applications from any of the locales returned
above, since the user can override the default values of all of them in
the Control Panel.
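
For reference, here is a rough sketch of how the LANGID-returning calls
in the list could be queried from a script. It assumes Python with
ctypes, and it returns an empty result when not run on Windows. The
helper names split_langid and query_locale_ids are made up for this
example. GetKeyboardLayout is left out because it lives in user32 and
returns a handle rather than a plain LANGID. The caveat above still
applies: these IDs do not tell you the user's actual formatting
settings.

```python
import ctypes
import sys


def split_langid(langid):
    """Split a 16-bit LANGID into (primary language, sublanguage).

    Mirrors the Win32 PRIMARYLANGID / SUBLANGID macros: the low 10 bits
    are the primary language, the high 6 bits the sublanguage.
    """
    langid &= 0xFFFF
    return langid & 0x3FF, langid >> 10


def query_locale_ids():
    """Return the LANGIDs discussed above, or {} when not on Windows."""
    if sys.platform != "win32":
        return {}
    k32 = ctypes.windll.kernel32
    # Mask to 16 bits: these calls return a WORD, and the upper bits of
    # the default int return value are not meaningful.
    return {
        "system locale": k32.GetSystemDefaultLangID() & 0xFFFF,
        "user locale": k32.GetUserDefaultLangID() & 0xFFFF,
        # GetThreadLocale returns an LCID; its low word is the LANGID.
        "thread locale": k32.GetThreadLocale() & 0xFFFF,
        "system UI language": k32.GetSystemDefaultUILanguage() & 0xFFFF,
        "user UI language": k32.GetUserDefaultUILanguage() & 0xFFFF,
    }


if __name__ == "__main__":
    for name, langid in query_locale_ids().items():
        primary, sub = split_langid(langid)
        print(f"{name}: 0x{langid:04x} (primary 0x{primary:02x}, sub 0x{sub:02x})")
```

For example, split_langid(0x0409) gives (0x09, 0x01), i.e. English
(United States).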

Indeed, life is still annoying in UI i18n land for just the few
following reasons, and believe me, this is only a small subset of the
issues that can come up:
* Bi-directional text (lots of fun stuff here, UI issues with mirroring
coordinate spaces, and other odd things)
* Fonts:
Do not hard code font face names
Do not assume a given font is installed
Do not assume selected font supports the desired script
* Local Calendar Systems: Hebrew, Buddhist, Hijri, etc.
* Win32 Console Applications:
The 8-bit console I/O functions use the OEM code page whereas all other
8-bit functions use the ANSI code page by default. To avoid conflict in
code page conversions and to allow multilingual computing, your console
output should be encoded as Unicode whenever possible.
 
Tips and considerations:
* Use WriteConsole to output Unicode strings. Note that this API works
only on console handles and cannot be used when output is redirected to
a disk file.
* If the output is being redirected to a disk file, use WriteFile with
the current console code page that can be retrieved by
GetConsoleOutputCP (the console code page might be different from the
currently selected OEM code page!).
* Complex scripts (Arabic, Hebrew, Thai, ...) are not supported in
console.
* Always create your log files with UTF-8 encoding.
* When doing text alignment (e.g. %1!-14s! %2!-14s! %3!-16s!), either
allocate the width of columns dynamically or truncate text wider than
the columns' width.
* To make sure that your multilingual resources are displayed properly
in the console window, always set your console thread locale according
to console output code page by using SetThreadUILanguage.
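
The text alignment tip can be sketched roughly as follows. This is plain
Python rather than FormatMessage-style inserts, and the helper names
fit, format_row and dynamic_widths are invented for the example. It also
assumes one character occupies one console cell, which does not hold for
East Asian wide characters or combining marks, so treat it as an
illustration of the two strategies only.

```python
def fit(text, width):
    """Left-justify text in a column, truncating anything wider than it."""
    return text[:width].ljust(width)


def format_row(cells, widths):
    """Strategy 1: fixed column widths, over-long text is truncated."""
    return " ".join(fit(cell, width) for cell, width in zip(cells, widths))


def dynamic_widths(rows):
    """Strategy 2: size each column to its widest (translated) cell."""
    return [max(len(cell) for cell in column) for column in zip(*rows)]


if __name__ == "__main__":
    rows = [
        ["Name", "Status", "Last changed"],
        ["Dateiname", "Geändert", "Letzte Änderung"],
    ]
    # Fixed widths taken from the format string in the tip above.
    for row in rows:
        print(format_row(row, [14, 14, 16]))
    # Widths computed from the data, so nothing is cut off.
    widths = dynamic_widths(rows)
    for row in rows:
        print(format_row(row, widths))
```

Dynamic widths avoid cutting off translated strings, which are often
longer than the English originals. Truncation keeps the layout stable
when the width cannot change.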

* String Comparisons:
Make sure your string comparisons are locale-proof. E.g., a
locale-dependent case-insensitive string compare on "GIF" doesn't work
for Turkish, where lowercase "i" uppercases to dotted "İ", not "I".
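
To make the Turkish case concrete, here is a small sketch. Python's own
str.upper() is locale-independent, so the Turkish casing rule is
simulated by a made-up helper, turkish_upper. It only stands in for what
a locale-dependent uppercase does under a Turkish locale and is not a
real library call. The second helper shows the usual remedy for
identifiers like file extensions: compare with locale-invariant case
folding.

```python
def turkish_upper(text):
    """Simulate uppercasing under Turkish rules (illustration only).

    In Turkish, dotted "i" uppercases to dotted "İ" (U+0130) and
    dotless "ı" (U+0131) uppercases to plain "I".
    """
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def equal_ignore_case_invariant(a, b):
    """Locale-independent case-insensitive comparison."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    extension = "gif"
    # Locale-dependent comparison under Turkish rules: the match fails,
    # because "gif" uppercases to "GİF" rather than "GIF".
    print(turkish_upper(extension) == "GIF")
    # Locale-invariant comparison: the match succeeds.
    print(equal_ignore_case_invariant(extension, "GIF"))
```

The locale-aware form is still the right choice for text shown to or
sorted for the user. The invariant form is for protocol keywords, file
extensions and similar fixed identifiers.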

Etc.... You get the idea.

Bill

----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your flat.
 
> -----Original Message-----
> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> 
> Changing the locale is not even what we want to do here.
> 
> Just because my log message text is in Big5, doesn't mean I don't
> still want Subversion's error and other messages printed in the
> dominant system locale (non-Big5).
> 
> What I was thinking of was not a --locale option, but simply an option
> (long opt only, no short equiv) that says "my log message is in
> encoding FOO", like this:
> 
>   --log-message-encoding=FOO
> 
> The main purpose of this is for other programs to pass the option,
> though a human is perfectly free to do so.
> 
> I don't have a problem "cluttering" the long-option space.  It's the
> short opt space where we need to be ultra conservative.
> 
> -K
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jun 5 06:52:58 2002

This is an archived mail posted to the Subversion Dev mailing list.