
Re: Call For Votes: converting log messages to UTF-8

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-05-31 19:38:25 CEST

On Fri, May 31, 2002 at 10:55:24AM -0500, Ben Collins-Sussman wrote:
> The two behaviors, in my mind, boil down to a matter of choosing a
> risk:
> 1. do we risk munging userdata at *input* time, by attempting to
> guess at a charset to convert to UTF-8?

You're wrong, Ben. And this shows part of the problem of this whole
conversation. "oh no... wah wah... it is going to corrupt my data."


Converting from charset FOO to UTF-8 is a specific translation. No data
loss. Converting from UTF-8 back to FOO is a perfect restoration. Two other
cases:

1) if you convert back to BAR, then yes: it won't appear properly.

2) if you convert FOO characters, thinking they were BAR, then it will
   certainly be "funky", but you still won't have data loss -- convert back
   as if you had BAR.

So. Option 1 is riskless in terms of data loss.

[ per (2) you could end up with incorrect Unicode characters, but you can
  get the original back and re-encode properly ]
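
To make the round-trip argument concrete, here is a minimal Python sketch.
The helper names to_utf8 and from_utf8 and the sample bytes are illustrative
assumptions, not Subversion code, and ISO-8859-1 and ISO-8859-2 stand in for
FOO and BAR. It relies on the wrongly assumed charset assigning a character
to every input byte, which is true for these two codecs; a charset that
rejects some byte sequences would fail at conversion time instead.

```python
def to_utf8(raw: bytes, assumed_charset: str) -> bytes:
    """Convert raw bytes to UTF-8, interpreting them as assumed_charset."""
    return raw.decode(assumed_charset).encode("utf-8")


def from_utf8(stored: bytes, target_charset: str) -> bytes:
    """Convert stored UTF-8 bytes back to target_charset."""
    return stored.decode("utf-8").encode(target_charset)


# "señor" as the user typed it, in ISO-8859-1 (charset FOO).
original = b"se\xf1or"

# Correct guess: FOO -> UTF-8 -> FOO restores the bytes exactly.
stored_ok = to_utf8(original, "iso8859_1")
restored_ok = from_utf8(stored_ok, "iso8859_1")

# Case (2): the bytes are wrongly treated as ISO-8859-2 (charset BAR).
# The stored text is "funky" ('seńor' instead of 'señor') ...
stored_wrong = to_utf8(original, "iso8859_2")
funky_text = stored_wrong.decode("utf-8")

# ... but converting back as if it were BAR yields the original bytes,
# which can then be re-encoded with the right charset.
recovered = from_utf8(stored_wrong, "iso8859_2")
repaired_text = recovered.decode("iso8859_1")
```

Here restored_ok and recovered both equal the original bytes, so the wrong
guess costs a funky display but no data.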

> OR
> 2. do we risk munging userdata at *output* time, i.e. not knowing
> how to display the logmsg properly, because we don't know its
> charset?

And here is your risk.

Jon Trowbridge, who is doing GNOME work and is familiar with the situation,
has said it several times: you'll have an unknown and unknowable charset.
Not a great situation.
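
A small Python sketch of why an unrecorded charset is a problem at output
time: the same stored bytes decode cleanly under several charsets, and
nothing in the bytes says which reading the author meant. The function name
candidate_readings and the choice of charsets are assumptions made for
illustration only.

```python
def candidate_readings(raw, charsets):
    """Decode the same bytes under each charset; every result is 'valid'."""
    return {name: raw.decode(name) for name in charsets}


# A log message stored as raw bytes with no charset recorded.
stored = b"se\xf1or"

readings = candidate_readings(stored, ["iso8859_1", "iso8859_2", "koi8_r"])

# Each decode succeeds without error, yet the texts differ, so a client
# displaying this message can only guess.
distinct_texts = set(readings.values())
```

With these three charsets the set distinct_texts holds three different
strings for one and the same message.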

> risky. Given that we support both behaviors via some kind of
> ~/.subversion/ config option, I think the sensible default is not to
> munge data at input time. If users want to flip a switch and force
> all log messages into UTF-8, that's totally fine. But I think that's a
> decision that the *user* must make, not one for our client-app to make
> right out of the box.

As Garrett points out, if tools cannot *know* the log message is UTF-8, then
the whole option and the encoding and everything is bogus. Either you
enforce UTF-8 or you give up the whole ball of wax.

And if you give it up, you fall into the #2 risk category.


Greg Stein, http://www.lyra.org/
Received on Sat Jun 1 14:11:49 2002

This is an archived mail posted to the Subversion Dev mailing list.