
Re: [RFC/PATCH] commit messages not 8-bit compatible

From: Marcus Comstedt <marcus_at_mc.pp.se>
Date: 2002-05-29 18:39:40 CEST

Karl Fogel <kfogel@newton.ch.collab.net> writes:

> So the idea is:
>
> - First get the log message into UTF-8.
>
> - Then, our usual encoding step will convert `<', `>' and other
> special characters in the UTF-8 to entity representations, so
> that what goes across the wire is 7-bit and safe. (By "special",
> you meant "8-bit", right?) And of course it gets decoded back
> into UTF-8 on the other end.

HTTP is 8-bit safe. No need for 7-bit oddities. The special
characters in XML are `<', `>', `&', and in the case of attribute
values `'' and `"'. No other octets need special treatment.
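A minimal sketch of that escaping in Python, for illustration only. The function name `xml_escape` is hypothetical and is not Subversion code. Only the XML metacharacters are replaced (`&` first, so the entities added afterwards are not escaped twice). Non-ASCII characters pass through unchanged, so after UTF-8 encoding their 8-bit octets go over HTTP as they are.

```python
def xml_escape(text: str, attribute: bool = False) -> str:
    """Escape only the characters XML treats specially (hypothetical helper).

    Non-ASCII characters are left alone; they become 8-bit UTF-8
    octets when the result is encoded, which HTTP carries as-is.
    """
    # '&' must be handled first, or the '&' in '&lt;' etc. would be re-escaped.
    out = text.replace("&", "&amp;")
    out = out.replace("<", "&lt;").replace(">", "&gt;")
    if attribute:
        # Quotes only need escaping inside attribute values.
        out = out.replace("'", "&apos;").replace('"', "&quot;")
    return out
```

For example, `xml_escape("Fixed <b> & ö").encode("utf-8")` gives `b'Fixed &lt;b&gt; &amp; \xc3\xb6'`: the markup characters become entities, while the `ö` stays as two raw 8-bit octets.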

> Is that right?
>
> It's step 1 that seems difficult to me. If the person didn't write
> the log message in UTF-8 in the first place, how are we going to guess
> what charset they _did_ write it in? It seems to me we have to add
> new run-time config code, or heuristics, to determine what encoding it
> uses, so that we can losslessly convert it to UTF-8 if it's not UTF-8
> already.

And that's precisely what I've been working on for a couple of days
now. See the "UTF-8" thread. I intend to post a new update of the
patch tomorrow.
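For illustration, here is a minimal Python sketch of step 1 under one stated assumption: when the user does not name the log message's charset, fall back to the locale's preferred encoding. This is only one possible heuristic. It is not the patch mentioned above, and `log_message_to_utf8` is a hypothetical name. Decoding is strict, so undecodable input raises an error rather than being converted lossily.

```python
import locale
from typing import Optional


def log_message_to_utf8(raw: bytes, charset: Optional[str] = None) -> bytes:
    """Convert a log message from its native charset to UTF-8 (hypothetical helper).

    Assumption: if no charset is given, use the locale's preferred
    encoding as a guess. This is not Subversion's actual API.
    """
    if charset is None:
        charset = locale.getpreferredencoding(False)
    # errors="strict": raise UnicodeDecodeError instead of silently
    # replacing characters, so the conversion is either lossless or fails.
    text = raw.decode(charset, errors="strict")
    return text.encode("utf-8")
```

For example, the Latin-1 bytes `b"\xe5\xe4\xf6"` ("åäö") come back as `b"\xc3\xa5\xc3\xa4\xc3\xb6"`. Input that is already UTF-8 and labelled as such passes through unchanged.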

  // Marcus

Received on Sat Jun 1 14:24:56 2002

This is an archived mail posted to the Subversion Dev mailing list.
