Re: Call For Votes: converting log messages to UTF-8

From: Greg Stein <gstein_at_lyra.org>
Date: 2002-05-31 19:38:25 CEST

On Fri, May 31, 2002 at 10:55:24AM -0500, Ben Collins-Sussman wrote:
>...
> The two behaviors, in my mind, boil down to a matter of choosing a
> risk:
>
> 1. do we risk munging userdata at *input* time, by attempting to
> guess at a charset to convert to UTF-8?

You're wrong, Ben. And this shows part of the problem with this whole
conversation. "oh no... wah wah... it is going to corrupt my data."

Bunk.

Converting from charset FOO to UTF-8 is a specific translation. No data
loss. Converting from UTF-8 back to FOO is a perfect restoration. Two other
situations:

1) if you convert back to BAR, then yes: it won't appear properly.

2) if you convert FOO characters, thinking they were BAR, then it will
   certainly be "funky", but you still won't have data loss -- convert back
   as if you had BAR.

So. Option 1 is riskless in terms of data loss.

[ per (2) you could end up with incorrect unicode characters, but you can
  get the original back and reencode properly ]
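
To make the round trip in (2) concrete, here is a minimal Python sketch. It
assumes FOO is KOI8-R and BAR is ISO-8859-1 (Latin-1); those charsets and the
helper names to_utf8 and from_utf8 are illustrative choices, not anything in
Subversion. The recovery relies on the BAR decoder accepting every byte value,
which holds for Latin-1 but would not hold for a charset with unassigned bytes.

```python
def to_utf8(raw: bytes, assumed_charset: str) -> bytes:
    """Interpret raw bytes as assumed_charset and re-encode them as UTF-8."""
    return raw.decode(assumed_charset).encode("utf-8")


def from_utf8(stored: bytes, target_charset: str) -> bytes:
    """Reverse of to_utf8: decode UTF-8 and encode into target_charset."""
    return stored.decode("utf-8").encode(target_charset)


# Hypothetical log message, originally typed in charset FOO (KOI8-R here).
original_text = "Привет"
foo_bytes = original_text.encode("koi8_r")

# Correct guess: FOO -> UTF-8 -> FOO restores the bytes exactly.
stored_ok = to_utf8(foo_bytes, "koi8_r")
restored_ok = from_utf8(stored_ok, "koi8_r")

# Wrong guess, case (2): the bytes are FOO but we convert as if they were BAR.
stored_wrong = to_utf8(foo_bytes, "latin-1")
funky_text = stored_wrong.decode("utf-8")  # valid Unicode, wrong characters

# Undo the mistake: convert back as if we had BAR, then read the bytes as FOO.
recovered_bytes = from_utf8(stored_wrong, "latin-1")
recovered_text = recovered_bytes.decode("koi8_r")
```

The stored UTF-8 in the wrong-guess case is "funky" but still carries every
original byte, so the message can be re-encoded properly once the real
charset is known.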

> OR
>
> 2. do we risk munging userdata at *output* time, i.e. not knowing
> how to display the logmsg properly, because we don't know its
> charset?

And here is your risk.

Jon Trowbridge, who is doing GNOME work and is familiar with the situation,
has said it several times: you'll have an unknown and unknowable charset.
Not a great situation.
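
As a rough illustration of why the charset is unknowable from the bytes alone,
the Python sketch below decodes one byte string under three single-byte
charsets. The sample text and the three candidate charsets are assumptions
picked for the example; any set of 8-bit charsets that assign these byte
values would behave the same way.

```python
# Bytes of a hypothetical log message stored without any charset label.
unlabeled = "Привет".encode("koi8_r")

# Candidate charsets a display tool might guess (illustrative choices).
candidates = ["koi8_r", "cp1251", "latin-1"]

# Every candidate decodes without error, so nothing flags a wrong guess.
readings = {name: unlabeled.decode(name) for name in candidates}

for name, text in readings.items():
    print(f"{name:8} -> {text}")
```

All three decodes succeed and each yields different text, so a tool reading
the log message has no way to tell from the data which one the author meant.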

>...
> risky. Given that we support both behaviors via some kind of
> ~/.subversion/ config option, I think the sensible default is not to
> munge data at input time. If users want to flip a switch and force
> all log messages into UTF-8, that's totally fine. But I think that's a
> decision that the *user* must make, not one for our client-app to make
> right out of the box.

As Garrett points out, if tools cannot *know* the log message is UTF-8, then
the whole option and the encoding and everything is bogus. Either you
enforce UTF-8 or you give up the whole ball of wax.

And if you give it up, you fall into the #2 risk category.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Sat Jun 1 14:11:49 2002

This is an archived mail posted to the Subversion Dev mailing list.
