
Re: Porting Subversion to EBCDIC

From: Branko Čibej <brane_at_xbc.nu>
Date: 2005-02-11 01:07:03 CET

Mark Phippard wrote:

>In writing a reply to Julian about our use of escape characters in our
>EBCDIC port, it seemed like a good time to offer up a more detailed
>explanation of what we are doing and where we are.
>
>You can download our current patch against the 1.1.x branch here:
>
>http://support.softlanding.com/ebcdic.diff
>
>The patch is 362K
>
>
It sure is...

Well, as expected, string literals are the biggest problem and the
biggest change. I think that, if we ever want to merge EBCDIC support
into the mainline -- and I think we do in the long run, because
supporting two branches would be too much work -- then something has to
be done about that. Using character escapes like that is simply not
maintainable.

But I think this can be solved by inventing some kind of string-literal
conversion policy similar to what we're doing on Windows with the
console charset.

Today, the SVN libraries juggle with four different encodings (and
character sets):

    * Internal: the encoding expected by most public APIs. This is (and
      will most probably remain) UTF-8.
    * Native: the encoding of string literals, program arguments, etc.
      99% of the code today assumes this to be a strict (7-bit) subset
      of UTF-8.
    * APR: the encoding that APR (and Apache) functions expect. On Unix
      and Win9x, this is the same as Native; on WinNT, it's the same as
      Internal, i.e., UTF-8.
    * Console: the encoding used for writing to the console and reading
      from the console. On Unix, this is the same as Native. On Windows,
      it's something else (usually some kind of OEM crap).

So for example, converting a string from internal to APR encoding and
back is a no-op on WinNT.
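
To make the relationships in that list concrete, here is a minimal sketch
that records which of the four encodings coincide on each platform and
answers whether a conversion between two of them is a no-op. All the ex_*
names are hypothetical; nothing like this table exists in Subversion or APR.
"Different class" here only means "not guaranteed identical": on a Unix box
with a UTF-8 locale, Native and Internal may happen to match as well.

```c
#include <assert.h>

/* Hypothetical names, for illustration only. */
enum ex_encoding { EX_INTERNAL, EX_NATIVE, EX_APR, EX_CONSOLE, EX_NUM_ENCODINGS };
enum ex_platform { EX_UNIX, EX_WIN9X, EX_WINNT, EX_NUM_PLATFORMS };

/* Equal numbers within a row mean "same encoding on this platform". */
static const int ex_class[EX_NUM_PLATFORMS][EX_NUM_ENCODINGS] = {
  /*            Internal  Native  APR  Console */
  /* Unix  */ { 0,        1,      1,   1 },
  /* Win9x */ { 0,        1,      1,   2 },
  /* WinNT */ { 0,        1,      0,   2 }
};

/* Return non-zero if converting FROM -> TO is guaranteed to be a no-op. */
static int ex_conversion_is_noop(enum ex_platform p,
                                 enum ex_encoding from,
                                 enum ex_encoding to)
{
  assert(p < EX_NUM_PLATFORMS);
  assert(from < EX_NUM_ENCODINGS && to < EX_NUM_ENCODINGS);
  return ex_class[p][from] == ex_class[p][to];
}
```

An EBCDIC port would add a row where Native no longer shares a 7-bit
subset with Internal, which is exactly the case most of the code does not
handle today.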

In order to support EBCDIC, we have to remove the second assumption
(Native is a subset of Internal). Where character literals are involved,
defining char escapes is viable since there aren't that many of them.
String literals are a bigger problem, though, because as I said, seeing
"\x64\x61\x76" instead of "dav" in the code is an instant turn-off.
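
For illustration, this is roughly what the escape approach looks like in
code. The macro and function names are made up for this sketch and are not
taken from the patch. The character defines are tolerable; the string is
what nobody wants to read.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names: character constants pinned to their ASCII values,
   so they mean the same thing whatever the compiler's native charset is. */
#define EX_ASCII_SLASH '\x2f'   /* '/' */
#define EX_ASCII_COLON '\x3a'   /* ':' */

/* "dav" spelled as ASCII bytes: correct everywhere, readable nowhere. */
static const char ex_dav_str[] = "\x64\x61\x76";

/* Count path separators in a path held in the internal (UTF-8) encoding. */
static int ex_count_slashes(const char *utf8_path)
{
  int n = 0;

  assert(utf8_path != NULL);
  for (; *utf8_path; ++utf8_path)
    if (*utf8_path == EX_ASCII_SLASH)
      ++n;
  return n;
}
```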

It seems to me that if we strictly follow the string conversion rules we
already have in place (something we don't do, IIRC, at least in
mod_dav_svn and probably a few other places), everything _except_
handling of string literals would be solved in a satisfactory way (read:
mergeable-to-trunk).

For string literals, what we want is a solution that

    * leaves readable string literals in the code;
    * allows static initialisation of struct members with string literals;
    * magics the literals to be in a UTF-8 subset at runtime.

By this time, the words "source pre-processor" should be ringing between
your ears. I propose a filter that converts string literals in source
files to ASCII-based char escapes before sending them to the compiler,
of course inserting appropriate #line directives so that debuggers still
show the original source. This filter could be inserted into the build
on all platforms where the "native" encoding isn't ASCII.
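
A rough sketch of the core of such a filter, one source line at a time.
Everything here is an assumption made for illustration: the function name
is hypothetical, the input bytes are taken to be ASCII already (a real
filter on an EBCDIC host would first map each native character to its ASCII
code), comments and literals spanning several lines are ignored, and
existing escapes such as \n are copied through untouched rather than
translated. It emits three-digit octal escapes instead of \x escapes,
because a hex escape swallows any hex digits that follow it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy IN to OUT, rewriting the characters inside
   "..." literals as 3-digit octal escapes.  Character literals and
   escapes already present are copied unchanged.  Returns 0 on success,
   -1 if OUT cannot hold the worst case of 4 output bytes per input byte. */
static int ex_escape_literals(const char *in, char *out, size_t outsize)
{
  size_t o = 0;
  int in_str = 0;   /* inside a "..." literal */
  int in_chr = 0;   /* inside a '...' literal */

  assert(in != NULL && out != NULL);
  if (outsize < 4 * strlen(in) + 1)
    return -1;

  while (*in)
    {
      unsigned char c = (unsigned char)*in++;

      if ((in_str || in_chr) && c == '\\' && *in)
        {
          /* Existing escape: copy it whole, including hex/octal digits. */
          char kind = *in++;
          int digits = 0;

          out[o++] = '\\';
          out[o++] = kind;
          if (kind == 'x')
            while (isxdigit((unsigned char)*in))
              out[o++] = *in++;
          else if (kind >= '0' && kind <= '7')
            while (digits++ < 2 && *in >= '0' && *in <= '7')
              out[o++] = *in++;
        }
      else if (in_str && c != '"')
        {
          /* Ordinary character inside a string literal. */
          o += (size_t)sprintf(out + o, "\\%03o", (unsigned int)c);
        }
      else
        {
          if (c == '"' && !in_chr)
            in_str = !in_str;
          else if (c == '\'' && !in_str)
            in_chr = !in_chr;
          out[o++] = (char)c;
        }
    }
  out[o] = '\0';
  return 0;
}
```

Since each input line yields exactly one output line, line numbers survive
the rewrite, so in this sketch a single #line directive naming the original
file at the top of the output would be enough.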

That would make the EBCDIC patch much smaller, and correct on all
platforms. As a bonus, if the filter were to recognise character
escapes, too, we could rely on those being ASCII at runtime as well, and
eliminate the character constant defines. The only remaining problem is
literals that don't come from Subversion, e.g., APR_EOL_STR; but we can
always define and use SVN_EOL_STR instead.
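
Something along these lines, perhaps. The EX_ prefix marks this as a
made-up stand-in, and the value assumes a Unix-style line ending; a real
SVN_EOL_STR would have to follow the platform the way APR_EOL_STR does,
just with ASCII byte values. The helper function is hypothetical too.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed SVN_EOL_STR: an ASCII line feed
   given by value, so it does not depend on the native charset. */
#define EX_SVN_EOL_STR "\012"

/* Write UTF8_TEXT plus the line ending into OUT.
   Returns 0 on success, -1 if OUT is too small. */
static int ex_format_line(char *out, size_t outsize, const char *utf8_text)
{
  int n;

  assert(out != NULL && utf8_text != NULL);
  n = snprintf(out, outsize, "%s%s", utf8_text, EX_SVN_EOL_STR);
  return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}
```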

-- Brane

Received on Fri Feb 11 01:17:12 2005

This is an archived mail posted to the Subversion Dev mailing list.