Re: EBCDIC - was: (svn commit: r12029 - trunk/subversion/libsvn_client)

From: Mark Phippard <MarkP_at_softlanding.com>
Date: 2004-11-27 22:25:03 CET

Branko Čibej <brane@xbc.nu> wrote on 11/26/2004 06:00:23 AM:

> Mark Phippard wrote:
>
> >1) Command line console input is EBCDIC. - We immediately convert it to
> >UTF-8.
> >
> >
> Subversion already does this itself if APR knows it's on an EBCDIC
> system.

We tried this route first. We had some success, although there were a
number of places where we had to work around ASCII "assumptions". We
started with svnadmin, and then the command line client. We then decided
that we really only cared about having a server, so we switched to
svnserve. We thought that would be "easier" because in our case Apache and
mod_dav come from IBM and we do not have the source; we didn't want to bang
our heads against a wall when the problem was really in their port.
Anyway, we hit a wall with svnserve instead: the socket listener is, of
course, receiving UTF-8, and it was hard to know how to handle that for
things like parsing the stream. That is when we decided to toss it all out:
it would be easier to let Subversion think it was in a UTF-8 world and just
wrap the system calls. Using that approach, we got svnadmin, svnserve and
svn all working relatively easily.

All that being said, your message has inspired us to take a second look.
We know so much more about the svn code now, as well as the general EBCDIC
issues, that it is worth another look. We are going to start with
mod_dav_svn this time, since that is what we most want working. Also, in
some ways it is "cleaner", because IBM converts everything to EBCDIC before
we see it, so the environment is more consistent with the console. If we
get it working, hopefully we can then figure out a way to deal with
svnserve. I assume that when we run into ASCII "assumptions" we can treat
them as opportunities to improve the code for all platforms? And where an
assumption is safe on an ASCII platform, I assume we should make any
UTF-8-to-locale conversions conditional, so as not to add extra
performance overhead on ASCII platforms?
>
> >2) In our case, APR is provided with OS/400, and is EBCDIC based. So all
> >OS calls and all APR calls require EBCDIC. For example, to open a file,
> >the path and filename have to be passed in EBCDIC. To solve this, we added
> >a "wrapper" layer where we convert the necessary strings to/from EBCDIC. We
> >link Subversion against our wrapper, and the wrapper then calls APR or
> >system.
> >
> >
> >
> Subversion does this, too. Are you actually writing an _additional_
> wrapper? Why doesn't Subversion's translation code already work for you?
> I know there are some omissions in it, but in general, it should work.

We turned your conversion functions into no-ops and then just do all of
the conversions around the system calls in our wrapper. That way we do
not have problems when there is an "omission".
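A minimal sketch of that wrapper shape, under stated assumptions: the target is EBCDIC code page 037, only a small invariant subset (letters, digits, space, `.`, `/`, `-`, `_`) is handled, and `native_call` stands in for whatever OS or APR function is being wrapped. All `example_` names are hypothetical; a real wrapper would more likely use the platform's conversion services (for instance `iconv`) than a hand-written table. The table compares numeric byte values rather than character literals, so it does not itself depend on the compiler's execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* Map one UTF-8 byte to its code page 037 (EBCDIC) value. Only a small
 * invariant subset is supported; anything else returns -1. Hex values
 * are used instead of 'a'-style literals so the mapping does not depend
 * on the compiler's execution character set. */
static int example_cp037_byte(unsigned char c)
{
    if (c >= 0x61 && c <= 0x69) return c + 0x20;  /* a-i -> 0x81-0x89 */
    if (c >= 0x6A && c <= 0x72) return c + 0x27;  /* j-r -> 0x91-0x99 */
    if (c >= 0x73 && c <= 0x7A) return c + 0x2F;  /* s-z -> 0xA2-0xA9 */
    if (c >= 0x41 && c <= 0x49) return c + 0x80;  /* A-I -> 0xC1-0xC9 */
    if (c >= 0x4A && c <= 0x52) return c + 0x87;  /* J-R -> 0xD1-0xD9 */
    if (c >= 0x53 && c <= 0x5A) return c + 0x8F;  /* S-Z -> 0xE2-0xE9 */
    if (c >= 0x30 && c <= 0x39) return c + 0xC0;  /* 0-9 -> 0xF0-0xF9 */
    switch (c) {
    case 0x20: return 0x40;  /* space */
    case 0x2D: return 0x60;  /* '-' */
    case 0x2E: return 0x4B;  /* '.' */
    case 0x2F: return 0x61;  /* '/' */
    case 0x5F: return 0x6D;  /* '_' */
    default:   return -1;    /* outside the supported subset */
    }
}

/* Convert NUL-terminated UTF-8 `in` into `out` (capacity `outsize`,
 * including the terminator). Returns 0 on success, -1 if a byte is
 * unsupported or the buffer is too small. */
int example_utf8_to_cp037(const char *in, char *out, size_t outsize)
{
    size_t i;

    if (outsize == 0)
        return -1;
    for (i = 0; in[i] != '\0'; i++) {
        int e;

        if (i + 1 >= outsize)   /* no room for this byte plus the NUL */
            return -1;
        e = example_cp037_byte((unsigned char)in[i]);
        if (e < 0)
            return -1;
        out[i] = (char)e;
    }
    out[i] = '\0';
    return 0;
}

/* Hypothetical wrapper: Subversion passes UTF-8 everywhere, and the
 * conversion to EBCDIC happens once, right before the native call. */
int example_wrapped_call(const char *utf8_path,
                         int (*native_call)(const char *ebcdic_path))
{
    char native[256];

    assert(native_call != NULL);
    if (example_utf8_to_cp037(utf8_path, native, sizeof native) != 0)
        return -1;
    return native_call(native);
}
```

With Subversion's own conversion functions reduced to no-ops, every string reaching a wrapper like this is UTF-8, so the conversion happens exactly once, at the system-call boundary.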

> >3) Literals in source code are compiled as EBCDIC. If comparing against a
> >UTF-8 value, it will fail. This is solved, at least on OS/400, by adding a
> >#pragma directive in the source to tell the compiler that the literals
> >should be treated as UTF-8. For the most part, we just globally declare
> >these pragmas, and in the rare instance we need the literal in EBCDIC, we
> >add another #pragma to set it back.
> >
> >
> Yes, you noticed that SVN sometimes converts string literals to UTF-8
> itself, and sometimes it doesn't. This is the real problem, and I don't
> know how to solve it (portably) without converting _every_ literal,
> which would be a bit of a pain. Of course it would be nice if SVN could
> expect every literal to be in UTF-8 (we really only care about the ASCII
> subset anyway).

The #pragma approach on OS/400 at least makes it manageable.
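Where no such pragma is available, one portable (if less readable) alternative is to spell protocol keywords as explicit byte values. In C, a hex escape such as `\x73` denotes that numeric value directly, so the bytes below are UTF-8 on both ASCII and EBCDIC builds, unlike the plain literal `"success"`. The keyword is only an example of a token an svnserve-style protocol might compare against; `EXAMPLE_KW_SUCCESS` and `example_is_success()` are hypothetical names, not Subversion identifiers.

```c
#include <assert.h>
#include <string.h>

/* "success" spelled as UTF-8 bytes. Hex escapes give the byte values
 * directly, so this does not change with the execution character set. */
#define EXAMPLE_KW_SUCCESS "\x73\x75\x63\x63\x65\x73\x73"

/* Returns nonzero if the len-byte UTF-8 token (e.g. one read from the
 * network) equals the keyword. Hypothetical helper for illustration. */
int example_is_success(const char *token, size_t len)
{
    assert(token != NULL);
    return len == sizeof(EXAMPLE_KW_SUCCESS) - 1
           && memcmp(token, EXAMPLE_KW_SUCCESS, len) == 0;
}
```

The cost is readability, which is why converting _every_ literal this way would be the pain mentioned above; a compiler pragma keeps the source readable where it is supported.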

Mark

Received on Sat Nov 27 22:26:22 2004

This is an archived mail posted to the Subversion Dev mailing list.