> Peter N. Lundblad wrote:
> > ...
> > Yes, that's what I'm going to do. Problem is this:
> > - svn_cmdline_printf() takes UTF-8 *strings* and other argument types
> > - apr_psprintf will not touch the encoding of the strings, but it will
> > convert %d etc. to the locale's encoding (no, it won't, it converts it
> > using numbers, . etc. from the C execution character set...)
> > - When we are going to convert from UTF-8, we may have a mixed encoding
> > string:-(
> >
> > Since the digits and the other characters that apr_psprintf will produce
> > are in the ASCII (7-bit) range, they are the same in UTF-8 and in
> > ASCII-based encodings. So this shouldn't be a problem in most cases. But
> > what happens on, e.g., EBCDIC systems? Anyone who knows?
>
> If you use stdio s(n)printf, the local encoding system is used.
> Current EBCDIC-based systems (e.g. AS/400), to my knowledge, provide
> a compile-time option for using EBCDIC or ASCII as the base
> character set. If you compile in EBCDIC mode, then the numbers
> will be in EBCDIC. However, so will all other characters, so the
> whole "UTF-8 is a subset of the standard charset" assumption is
> out the window. Compile in ASCII mode and those problems go away,
> so *if* you decide to support those systems, that is probably the
> only feasible way.
These problems are dealt with starting with Peter's newest patch submission,
right? (At least that is my understanding.) If so, does that mean we have a
direction to work with now, or does the matter need more discussion?
> Frankly, I doubt if Subversion compiles on AS/400 or any other
> EBCDIC platform, so this is likely a non-issue.
>
> > This is getting really messy, but since we have to replace stdio printf
> > and friends to support other things, I think we will have to solve this
> > problem.
>
> At some point you may want to define a minimum platform to
> support which includes being based on ASCII and not EBCDIC.
But we seem to be tied to UCS-2 for Windows support. I know little
more about it than the fact that the ASCII '1' character does not have the
same representation in it (which it does in UTF-8). So I'm afraid that trick
won't help much...
bye,
Erik.
Received on Fri May 7 00:34:52 2004