Re: [RFC/PATCH] commit messages not 8-bit compatible

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-05-29 18:34:38 CEST

cmpilato@collab.net writes:
> The message already is being XML-encoded to some extent, in that '<'
> and '>' and other such special chars are being converted to entity
> representations, IIRC. I think all we need to do is to make sure that
> all this stuff is first converted to UTF-8, and then just add the
> "charset" XML attribute thingy that states that this particular XML
> document is in UTF-8.

So the idea is:

   - First get the log message into UTF-8.

   - Then, our usual encoding step will convert `<', `>' and other
     special characters in the UTF-8 to entity representations, so
     that what goes across the wire is 7-bit and safe. (By "special",
     you meant "8-bit", right?) And of course it gets decoded back
     into UTF-8 on the other end.

Is that right?
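
The two steps above can be sketched roughly as follows. This is only an illustration in Python, not Subversion's actual code: the function names `encode_for_wire` and `decode_from_wire` are hypothetical, and it assumes the log message has already been converted to UTF-8. XML special characters become named entities, and every non-ASCII character becomes a decimal character reference, so the bytes on the wire are all 7-bit.

```python
import re
from xml.sax.saxutils import escape, unescape

# Matches the decimal character references produced by "xmlcharrefreplace".
_NUMERIC_REF = re.compile(r"&#([0-9]+);")


def encode_for_wire(utf8_message: bytes) -> bytes:
    """Turn a UTF-8 log message into 7-bit, XML-safe bytes (hypothetical helper)."""
    text = utf8_message.decode("utf-8")
    # Escape &, < and > first, so any literal "&#...;" typed by the user
    # becomes "&amp;#...;" and cannot be confused with a real reference.
    escaped = escape(text)
    # Replace every non-ASCII character with a reference such as "&#233;".
    return escaped.encode("ascii", errors="xmlcharrefreplace")


def decode_from_wire(wire: bytes) -> bytes:
    """Reverse encode_for_wire, returning the original UTF-8 bytes."""
    text = wire.decode("ascii")
    # Expand numeric references before touching "&amp;", for the reason above.
    text = _NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)
    return unescape(text).encode("utf-8")


if __name__ == "__main__":
    original = "na\u00efve <b> & caf\u00e9".encode("utf-8")
    wire = encode_for_wire(original)
    print(wire)
    print(decode_from_wire(wire) == original)
```

The order of operations matters in both directions: escaping `&` before adding numeric references, and expanding numeric references before unescaping `&amp;`, is what keeps a message that literally contains text like `&#233;` intact.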

It's the first step that seems difficult to me. If the person didn't
write the log message in UTF-8 in the first place, how are we going to
guess what charset they _did_ write it in? It seems to me we have to
add new run-time config code, or heuristics, to determine what encoding
the message uses, so that we can losslessly convert it to UTF-8 if it's
not UTF-8 already.
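
To make the config-versus-heuristics question concrete, here is one possible shape for such a conversion step, sketched in Python. Everything in it is an assumption for illustration: the function name `to_utf8`, the `configured_encoding` parameter standing in for a run-time config option, and the fallback to the locale's preferred encoding. Decoding is strict, so a message that does not fit the chosen charset raises an error rather than being converted lossily.

```python
import locale
from typing import Optional


def to_utf8(raw: bytes, configured_encoding: Optional[str] = None) -> bytes:
    """Return the log message as UTF-8 bytes (hypothetical helper).

    Raises UnicodeDecodeError if the bytes are not valid in the chosen charset.
    """
    if configured_encoding is not None:
        # An explicit setting from the user wins over any guessing.
        return raw.decode(configured_encoding, errors="strict").encode("utf-8")

    # Heuristic: bytes that decode cleanly as UTF-8 are assumed to be UTF-8.
    try:
        raw.decode("utf-8", errors="strict")
        return raw
    except UnicodeDecodeError:
        pass

    # Last resort: assume the message was written in the locale's charset.
    fallback = locale.getpreferredencoding(False)
    return raw.decode(fallback, errors="strict").encode("utf-8")


if __name__ == "__main__":
    latin1_message = "caf\u00e9".encode("latin-1")
    print(to_utf8(latin1_message, configured_encoding="latin-1"))
```

The UTF-8 check is only a heuristic: text in another 8-bit charset can occasionally happen to be valid UTF-8, which is one argument for preferring an explicit setting where one is available.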

-Karl

Received on Sat Jun 1 14:25:03 2002
