
Re: [PATCH] #6 OS400/EBCDIC Port: Prevent OS conversion of file contents

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2006-02-22 19:50:32 CET

Paul Burba wrote:
>
> This obviously is improperly indented code, but is something like this
> ever acceptable considering the alternative would result in redundant code
> and generally be more intrusive? [...]

I'm not much worried about that indentation.

Having read through this patch a few times, I feel I'm not getting the whole
story about why this patch seems to be such an ugly work-around, creating the
file in one mode, closing it and then re-opening it in another mode. I'll
describe how I see it. Correct me when I go wrong.

> ----------------------------------------------------------------------
> [[[
> Every file in OS400 is tagged with a CCSID which represents its
> character encoding. On OS400 V5R4 when apr_file_open() creates a
> file, its CCSID varies depending if the APR_BINARY flag is passed or
> not. If APR_BINARY is passed, the file is created with CCSID 37
> (EBCDIC), if not, it has CCSID 1209 (UTF-8).

1208 not 1209?

So this "UTF-8 on EBCDIC system" version of APR has decided that "text" files
have UTF-8 content within the application and thus should be stored as UTF-8,
tagged as UTF-8.

The tag for a text file must be correct otherwise other applications - text
editors, etc. - would read the file as garbage, even though the originating
application might be able to read it by ignoring the tag.

Applications on OS400 generally need a way to read and write EBCDIC files
(correctly tagged). APR decides that the "APR_BINARY" flag is going to select
EBCDIC mode rather than the more logical "binary" or "unknown" content
encoding. Oops. Now what is an application supposed to do that wants to write
files that are neither UTF-8 nor EBCDIC?
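
To pin down the create-time behaviour as I read it from the log message, here
is a minimal model in C. It is only a sketch: the MODEL_* flags and the function
name are hypothetical stand-ins, not APR's real definitions, and it assumes the
UTF-8 CCSID is 1208 rather than the 1209 written in the log message.

```c
/* Hypothetical stand-ins for illustration; not the real APR flag values. */
#define MODEL_CREATE 0x1
#define MODEL_BINARY 0x2

#define CCSID_EBCDIC 37
#define CCSID_UTF8   1208   /* assumption: 1208, not 1209, is meant */

/* Model of the tagging described in the log message: return the CCSID
   that a newly created file would be tagged with for the given flags. */
static int
model_ccsid_for_new_file(int flags)
{
  return (flags & MODEL_BINARY) ? CCSID_EBCDIC : CCSID_UTF8;
}
```

In this model there is no flag combination that yields an "untagged" or
"unknown encoding" file, which is the gap the question above is about.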

In APR_BINARY mode, this APR translates between EBCDIC on disk and UTF-8 on the
application side when reading and writing, so we can't use this mode for binary
files. No, I must have gone wrong already; that would be too silly. APR must
behave differently on file create from how it behaves on other file operations.

Please tell me more.

> Since subversion creates files with either binary or UTF-8 content and
> all calls to apr_file_open() in subversion use APR_BINARY, these files
> are incorrectly tagged.

So the solution is to tag all new files as UTF-8 (achieved by creating them
without APR_BINARY), and then to read/write them with APR_BINARY. That will
work for both UTF-8 and binary/unknown files because the encoding tag is
correct for text files and irrelevant for binary/unknown files, and no
translation will be done, yes? No, that doesn't make sense. APR_BINARY means
EBCDIC on disk, and therefore translation during read/write, doesn't it?

Aargh!

Is this all just a bug in APR?

> Simply not using APR_BINARY on OS400 when opening a file isn't an
> option, because in this case the OS attempts to convert the file's
> contents from its CCSID to UTF-8 when reading the file and vice-versa
> when writing to it. This has obvious problems if the file contains
> binary data.
>
> This patch ensures files *created* via svn_io_file_open() and
> svn_io_open_unique_file2() are tagged with a CCSID of 1208.
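
The create, close, re-open sequence that this describes can be sketched with
plain POSIX calls. This is not the patch's code and does not use APR: the
function name is hypothetical, open() stands in for apr_file_open(), and the
CCSID tagging itself has no POSIX equivalent, so only the control flow is
shown.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create PATH if requested, close it, then re-open it for the real I/O.
   Returns an open descriptor, or -1 on error. */
static int
create_then_reopen(const char *path, int flags)
{
  if (flags & O_CREAT)
    {
      /* Step 1: create only. On OS400 this is where the open would be
         done in the mode that gives the new file the wanted tag. */
      int fd = open(path, flags, 0644);
      if (fd < 0)
        return -1;
      if (close(fd) != 0)
        return -1;

      /* Step 2: the file now exists, so drop the create-only flags.
         Leaving O_CREAT | O_EXCL set would make the second open fail
         with EEXIST. */
      flags &= ~(O_CREAT | O_EXCL);
    }

  /* Re-open (or plain open) with the remaining flags. */
  return open(path, flags);
}
```

The exclusive-create guarantee still comes from the first open, so a second
caller racing on the same path fails there rather than at the re-open.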

> +/* Helper function for apr_file_open() on OS400.
> + *
> + * When calling apr_file_open() with APR_BINARY and APR_CREATE on OS400
> + * the new file has an ebcdic CCSID (e.g. 37). But the files created by

Did you mean "37" or "i.e. 37" instead of "e.g. 37"?

> + /* Whether or not APR_EXCL is set or not, we want to unset it before the

Too many "or not"s.

- Julian

Received on Wed Feb 22 19:51:48 2006

This is an archived mail posted to the Subversion Dev mailing list.
