EBCDIC - was: (svn commit: r12029 - trunk/subversion/libsvn_client)

From: Mark Phippard <MarkP_at_softlanding.com>
Date: 2004-11-25 17:57:36 CET

> We have to decide, once and for all, if we support systems with an
> execution character set that isn't ASCII-based (i.e. ASCII is a subset).
> As brane points out, we have other places that use character constants.
> Unless you are going to fix all these, it makes no sense to say 65 instead
> of 'A'. If we want to support EBCDIC and such, we should define symbolic
> constants instead of spreading these numbers all over the code.

We have been working on a port to OS/400, an EBCDIC system. I am not sure
whether any other commonly used EBCDIC systems are left, so when you talk
about EBCDIC, I think OS/400 might be the only one that is still relevant.

Anyway, our approach has basically been to make Subversion "tolerant" of
the EBCDIC system. We leave Subversion itself as largely an ASCII/UTF-8
system; after all, that is what the network traffic has to be encoded in.
The issues when dealing with an EBCDIC system, and how we solved each one,
are as follows:

1) Command line console input is EBCDIC. - We immediately convert it to
UTF-8.

2) In our case, APR is provided with OS/400, and is EBCDIC based. So all
OS calls and all APR calls require EBCDIC. For example, to open a file,
the path and filename have to be passed in EBCDIC. To solve this, we added
a "wrapper" layer where we convert the necessary strings to/from EBCDIC. We
link Subversion against our wrapper, and the wrapper then calls APR or
the system.

3) Literals in source code are compiled as EBCDIC. If comparing against a
UTF-8 value, it will fail. This is solved, at least on OS/400, by adding a
#pragma directive in the source to tell the compiler that the literals
should be treated as UTF-8. For the most part, we just globally declare
these pragmas, and in the rare instance we need the literal in EBCDIC, we
add another #pragma to set it back.

4) We are using fsfs for the repository. SQL will eventually be of interest
since OS/400 includes DB2. Our repository is all UTF-8 and fully
compatible with repositories on other OSes. We regularly move files to
Windows and back to test stuff.
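
To make the wrapper idea in (2) more concrete, here is a minimal sketch of
what such a layer could look like. It is not our actual code: the names
ascii_to_ebcdic, utf8_path_to_ebcdic and wrap_fopen are made up for
illustration, the table covers only letters, digits and a few path
characters of EBCDIC code page 037, and any other byte (including
multi-byte UTF-8 sequences) is simply replaced with '?'. A real port would
use a complete conversion routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Translate one ASCII byte to EBCDIC code page 037.  ASCII values are
 * written in hex so the code does not depend on the compiler's own
 * execution character set.  Anything not covered becomes EBCDIC '?'. */
unsigned char ascii_to_ebcdic(unsigned char c)
{
    if (c >= 0x61 && c <= 0x69) return (unsigned char)(0x81 + (c - 0x61)); /* a-i */
    if (c >= 0x6A && c <= 0x72) return (unsigned char)(0x91 + (c - 0x6A)); /* j-r */
    if (c >= 0x73 && c <= 0x7A) return (unsigned char)(0xA2 + (c - 0x73)); /* s-z */
    if (c >= 0x41 && c <= 0x49) return (unsigned char)(0xC1 + (c - 0x41)); /* A-I */
    if (c >= 0x4A && c <= 0x52) return (unsigned char)(0xD1 + (c - 0x4A)); /* J-R */
    if (c >= 0x53 && c <= 0x5A) return (unsigned char)(0xE2 + (c - 0x53)); /* S-Z */
    if (c >= 0x30 && c <= 0x39) return (unsigned char)(0xF0 + (c - 0x30)); /* 0-9 */
    switch (c) {
    case 0x20: return 0x40; /* space */
    case 0x2D: return 0x60; /* '-' */
    case 0x2E: return 0x4B; /* '.' */
    case 0x2F: return 0x61; /* '/' */
    case 0x5F: return 0x6D; /* '_' */
    default:   return 0x6F; /* '?' for everything this sketch does not map */
    }
}

/* Convert a NUL-terminated UTF-8 (ASCII subset) string into EBCDIC.
 * Returns 0 on success, -1 if the output buffer is too small. */
int utf8_path_to_ebcdic(const char *utf8, char *out, size_t out_size)
{
    size_t i;

    assert(utf8 != NULL && out != NULL);
    for (i = 0; utf8[i] != '\0'; i++) {
        if (i + 1 >= out_size)
            return -1;
        out[i] = (char)ascii_to_ebcdic((unsigned char)utf8[i]);
    }
    if (i >= out_size)
        return -1;
    out[i] = '\0';
    return 0;
}

/* Hypothetical wrapper: the caller passes UTF-8, the native call gets
 * EBCDIC.  Both the path and the mode string are converted, since with
 * UTF-8 literals even "r" would not be in the native encoding. */
FILE *wrap_fopen(const char *utf8_path, const char *utf8_mode)
{
    char native_path[1024];
    char native_mode[8];

    if (utf8_path_to_ebcdic(utf8_path, native_path, sizeof native_path) != 0)
        return NULL;
    if (utf8_path_to_ebcdic(utf8_mode, native_mode, sizeof native_mode) != 0)
        return NULL;
    return fopen(native_path, native_mode);
}
```

Code that is linked against the wrapper keeps passing UTF-8 strings around
and never sees the native encoding. Calling wrap_fopen only makes sense on
the EBCDIC host itself; on an ASCII machine the converted path would be
garbage.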

Using these basic techniques, we have svnserve fully ported and working.
It passes all tests. The command line client also works, although we have
not ported neon yet, as we really do not have any interest in running a
client on OS/400. We will do it eventually, though; it just means we will
also have to port OpenSSL etc...

We are currently working on mod_dav_svn and that is proving more difficult
because the Apache server provided by IBM converts everything to EBCDIC
before it hands it to the modules. Also, there is a lot more back and
forth between the "layers" in Apache than when dealing with the command
line or svnserve. It is proving more challenging to solve this using a
"wrapper", but we will eventually get it all working. IBM also provides
mod_dav with Apache, but we are finding it easier to do our own port of
mod_dav so we can tackle a lot of the issues in that layer and keep the
Subversion layer relatively clean.

I have been meaning to ask, do you think we could get committer access to
an "OS400" or "EBCDIC" branch at some point? We intend to contribute back
and support this port and it would be easier to do it in the main
repository. Currently, I am using svk to mirror the 1.1.x branch to a
local repository where we can do our work and stay in synch. We would
probably continue as we are doing currently until we get the mod_dav_svn
piece working. But then we would like to clean things up a bit and get in
a branch where it can be reviewed. 95% of the patch is just in the new
"wrapper" layer that would not be built on other platforms. In the
Subversion source code, there is generally just a small bit added at the
#includes. Something like:

#ifdef APR_CHARSET_EBCDIC
#pragma ccsid(1208) /* UTF-8 */
#include "apr_wrap.h"
#endif

Other than that, there are just a handful of places where we had to insert
stuff in the code itself. In that case it is bracketed off with either
#ifdef APR_CHARSET_EBCDIC, or #ifdef AS400 if we thought it was specific to
OS/400. For example, there are a couple of APR functions that IBM changed
the signature on for some reason.

The mod_dav_svn layer will definitely have a bit more code in it as the
wrappering is more complicated.
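
For a compiler that has no pragma like the one above, the alternative
raised in the quoted message is symbolic constants. A rough sketch of that
idea follows; the names ASCII_SLASH and utf8_basename are hypothetical and
not taken from the Subversion source. The point is only that the comparison
uses the ASCII code value, so it behaves the same no matter how the
compiler encodes a '/' literal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbolic constant: the ASCII/UTF-8 code for '/'.
 * On an EBCDIC compiler the literal '/' would be 0x61 instead. */
#define ASCII_SLASH 0x2F

/* Return a pointer to the last component of a UTF-8 path.  A trailing
 * slash is not treated as starting a new (empty) component. */
const char *utf8_basename(const char *path)
{
    const char *base = path;
    const char *p;

    assert(path != NULL);
    for (p = path; *p != '\0'; p++) {
        if ((unsigned char)*p == ASCII_SLASH && p[1] != '\0')
            base = p + 1;
    }
    return base;
}
```

The drawback is the one already mentioned in the quoted message: every
character constant in the code would have to be replaced this way, which is
why we went with the pragma instead.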

Anyway, hopefully that sheds some light on how one group decided to deal
with the EBCDIC issue. I really cannot see any other way to deal with all
of the issues involved.

Thanks

Mark

Received on Thu Nov 25 17:58:46 2004

This is an archived mail posted to the Subversion Dev mailing list.