Re: Remaining l10n issues

From: <lundblad_at_softhome.net>
Date: 2004-05-07 12:18:22 CEST

Erik Huelsmann writes:

>> Peter N. Lundblad wrote:
>> > ...
>> > Yes, that's what I'm going to do. Problem is this:
>> > - svn_cmdline_printf() takes UTF-8 *strings* and other argument types
>> > - apr_psprintf will not touch the encoding of the strings, but it will
>> > convert %d etc. to the locale's encoding (no, it won't, it converts it
>> > using numbers, '.' etc. from the C execution character set...)
>> > - When we are going to convert from UTF-8, we may have a mixed-encoding
>> > string:-(
>> >
>> > Since the digits and the other characters that apr_psprintf will produce
>> > are in the ASCII (7-bit) range, they are the same in UTF-8 as in
>> > ASCII-based encodings. So this shouldn't be a problem in most cases. But
>> > what happens on, e.g., EBCDIC systems? Anyone who knows?
>>
>> If you use stdio s(n)printf, the local encoding system is used.
>> Current EBCDIC-based systems (e.g. AS/400), to my knowledge, provide
>> a compile-time option for using EBCDIC or ASCII as the base
>> character set. If you compile in EBCDIC mode, then the numbers
>> will be in EBCDIC. However, so will all other characters, so the
>> whole "UTF-8 is a subset of the standard charset" assumption is
>> out the window. Compile in ASCII mode and those problems go away,
>> so *if* you decide to support those systems, that is probably the
>> only feasible way.
>
> These problems are dealt with starting with Peter's newest patch submission,
> right? (At least that is my understanding.) If so, does that mean we have a
> direction to work with now, or does the matter need more discussion?
>
No, they aren't, since this discussion is about a possible
svn_cmdline_printf/svn_cmdline_fprintf. But if we can say that we "only"
support systems where the execution character set for narrow characters (char)
is based on ASCII, this problem should go away, because apr_psprintf only
outputs ASCII characters when formatting numbers. Is there anyone who thinks
this is a limitation? If not, we could just drop the discussion and go on with
more interesting things:-) Otherwise, I'm afraid we have other problems in the
svn code. For example, when looking at internal paths, we check for the slash
character by the character constant '/' (in the execution charset). If this
doesn't match UTF-8/ASCII, then we're in trouble and have to use constants
for the code point instead.
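
To make the two points concrete, here is a rough sketch in plain C. None of
this is existing svn or APR code: the helper names (exec_charset_is_ascii,
count_utf8_slashes, is_7bit) and the UTF8_SLASH macro are made up for
illustration. It shows a check that the execution character set is
ASCII-based, a slash test that compares against the code point 0x2F instead
of the constant '/', and a check that formatted numbers stay in the 7-bit
range.

```c
#include <stddef.h>
#include <stdio.h>

/* Code point of '/' in ASCII and UTF-8. Unlike the character constant '/',
   this value does not depend on the compiler's execution character set.
   (Hypothetical name, not an svn constant.) */
#define UTF8_SLASH 0x2F

/* Hypothetical helper: returns 1 if a few representative characters of the
   execution character set have their ASCII values, 0 otherwise. On an
   EBCDIC build, '/' is 0x61 and '0' is 0xF0, so this returns 0. */
static int exec_charset_is_ascii(void)
{
  return '/' == 0x2F && '0' == 0x30 && 'A' == 0x41 && 'a' == 0x61;
}

/* Hypothetical helper: counts path separators in a UTF-8 encoded path by
   comparing each byte with the code point rather than with '/'. */
static size_t count_utf8_slashes(const char *utf8_path)
{
  size_t count = 0;
  for (; *utf8_path != '\0'; ++utf8_path)
    if ((unsigned char) *utf8_path == UTF8_SLASH)
      ++count;
  return count;
}

/* Hypothetical helper: returns 1 if every byte of s is in the 7-bit range,
   which is what we expect from "%d" output on an ASCII-based system. */
static int is_7bit(const char *s)
{
  for (; *s != '\0'; ++s)
    if ((unsigned char) *s > 0x7F)
      return 0;
  return 1;
}

/* Hypothetical helper: formats a number with snprintf and reports whether
   the result could be spliced into a UTF-8 string without conversion. */
static int formatted_number_is_7bit(int value)
{
  char buf[32];
  snprintf(buf, sizeof buf, "%d", value);
  return is_7bit(buf);
}
```

On an ASCII-based build, exec_charset_is_ascii() and
formatted_number_is_7bit() should both return 1, so mixing the formatted
digits into a UTF-8 string is harmless. On an EBCDIC build both would return
0, and only the code point comparison in count_utf8_slashes() would still do
the right thing on UTF-8 data.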

//Peter

Received on Fri May 7 12:18:38 2004

This is an archived mail posted to the Subversion Dev mailing list.