
Re: Call For Votes: converting log messages to UTF-8

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-06-01 02:25:29 CEST

Greg Hudson <ghudson@MIT.EDU> writes:

> On Fri, 2002-05-31 at 13:38, Greg Stein wrote:
> > Converting from charset FOO to UTF-8 is a specific translation. No data
> > loss. Converting from UTF-8 back to FOO is a perfect restoration.
>
> Hm, is this always true?
>
> For instance, a Shift-JIS document could have redundant shift octets.

You're thinking of ISO-2022. Shift-JIS is stateless; the name
"shift" comes from the fact that the character codes are shifted around
a bit compared to their codepoints in the original JIS standards, not
from any use of shift modes.
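Here is a minimal Python sketch of that "shifting around". It is the arithmetic that maps a JIS X 0208 code (two bytes, each in 0x21..0x7E) onto its Shift-JIS code, checked against Python's built-in `shift_jis` codec. The names `jis_to_sjis` and `jis_bytes` are hypothetical and exist only for this illustration. The sketch covers only the JIS X 0208 range, not single-byte katakana or vendor extensions. Each character maps to its own byte pair, so no mode carries over from one character to the next.

```python
def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Hypothetical helper: map a JIS X 0208 code (j1, j2 in 0x21..0x7E) to Shift-JIS."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X 0208 code")
    # Two JIS rows fold into one Shift-JIS lead byte; rows past 0x5E jump
    # over the 0xA0..0xDF range used for single-byte katakana.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2:  # odd row: trail byte in 0x40..0x9E, skipping 0x7F
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:       # even row: trail byte in 0x9F..0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def jis_bytes(ch: str) -> tuple:
    """Hypothetical helper: pull the raw JIS X 0208 code of one character
    out of Python's iso2022_jp encoding (ESC $ B, j1, j2, ESC ( B)."""
    raw = ch.encode("iso2022_jp")
    assert len(raw) == 8 and raw.startswith(b"\x1b$B") and raw.endswith(b"\x1b(B")
    return raw[3], raw[4]


# Each character's Shift-JIS bytes depend only on that character's JIS code.
for ch in "日本語あ":
    assert jis_to_sjis(*jis_bytes(ch)) == ch.encode("shift_jis")
```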

For ISO-2022, though, you might end up with a different octet sequence
from the one you originally had. Unless the original was canonicalized
in some manner, it is in fact quite likely that you will. ISO-2022 is
rather messy in that the same sequence of characters can be represented
in numerous ways, for example with redundant escape sequences.
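The Python sketch below shows that effect. It assumes Python's `iso2022_jp` codec as the converter; other converters may choose escape sequences differently. `round_trip` is a hypothetical helper, not anything in Subversion. Both byte strings are valid ISO-2022-JP for the same two characters, but only one of them comes back unchanged after a trip through UTF-8.

```python
def round_trip(raw: bytes, charset: str) -> bytes:
    """Hypothetical helper: convert `raw` from `charset` to UTF-8, then back."""
    utf8 = raw.decode(charset).encode("utf-8")
    return utf8.decode("utf-8").encode(charset)


# The same text, "ああ", written two ways in ISO-2022-JP:
compact = b'\x1b$B$"$"\x1b(B'                 # one switch into JIS X 0208
roundabout = b'\x1b$B$"\x1b(B\x1b$B$"\x1b(B'  # back to ASCII and in again

assert compact.decode("iso2022_jp") == roundabout.decode("iso2022_jp") == "ああ"
assert round_trip(compact, "iso2022_jp") == compact        # octets survive
assert round_trip(roundabout, "iso2022_jp") != roundabout  # octets change
```

The characters are preserved in both cases. Only the octet sequence differs, because the encoder picks one way of placing the escape sequences.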

  // Marcus

Received on Sat Jun 1 14:09:28 2002

This is an archived mail posted to the Subversion Dev mailing list.
