Jesper Steen Møller <jesper@selskabet.org> wrote on 04/09/2006 03:40:22 PM:
> L mice wrote:
>
> > svn reads the auth config file in "rt" mode and can't deal
> > with Unicode files correctly. Unicode files have the 2 bytes "FF FE"
> > before the content, and svn just reads the bytes one by one. I think svn
> > only deals with ANSI charset files.
>
> Well, yes - SVN reads the file as a byte-oriented stream, without paying
> attention to what the non-ASCII characters represent. Some of the
> strings go into in-memory structures that are later parsed as
> UTF-8.
>
> > // Jesper Steen Møller
> > > In summary - yes, there is a bug, but IMHO the bug is that config
> > > files are assumed to be UTF-8.
> >
> > I think svn assumes everything is bytes.
> >
> Yes, they are bytes but they are stored (without any checking or
> conversion) in internal structures where they are later assumed to be
> UTF-8.
>
> How is this handled for AS/400, I wonder. Mark?
Jesper,
The OS400 port assumes that all config files are encoded in UTF-8. The
port has always read these files in binary mode (the "rt" mode passed to
svn_config__open_file() is "rb" in the port).
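
To make the failure mode concrete, here is a minimal sketch in Python (not
Subversion code, and not something svn does today). It assumes the raw bytes
of a config file are already in hand, as a binary-mode read would give them,
and classifies them by looking for the "FF FE" marker mentioned above. The
function name sniff_config_bytes is hypothetical.

```python
import codecs


def sniff_config_bytes(raw: bytes) -> str:
    """Guess how a config file's raw bytes are encoded.

    Hypothetical helper for illustration only: it checks for a UTF-16
    byte order mark first, then falls back to a strict UTF-8 decode.
    """
    if raw.startswith(codecs.BOM_UTF16_LE):  # the bytes FF FE
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):  # the bytes FE FF
        return "utf-16-be"
    try:
        raw.decode("utf-8")  # strict decode; raises on invalid sequences
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


# A file saved as "Unicode" on Windows: BOM followed by UTF-16 LE text.
utf16_file = codecs.BOM_UTF16_LE + "[auth]\n".encode("utf-16-le")
# The same content saved as plain UTF-8.
utf8_file = "[auth]\n".encode("utf-8")

print(sniff_config_bytes(utf16_file))
print(sniff_config_bytes(utf8_file))
```

A byte-by-byte reader that assumes UTF-8 would hit FF as the very first byte
of the first file, and FF is never valid in UTF-8, which is consistent with
the behaviour reported at the top of this thread.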
Paul B.
> -Jesper
>
> P.S.: For charset nerds like myself:
> http://www-950.ibm.com/software/globalization/icu/demo/converters?s=ALL
Received on Mon Apr 10 16:08:40 2006