
Re: Call For Votes: converting log messages to UTF-8

From: Garrett Rooney <rooneg_at_electricjellyfish.net>
Date: 2002-05-31 18:53:21 CEST

On Fri, May 31, 2002 at 10:06:52AM -0500, Karl Fogel wrote:
> It seems to me that everyone's pretty much stated their reasons for
> and against now. We're no longer adding new material to the
> discussion, we're just reiterating points already made.
>
> So, I'd like to propose a vote.
>
> I hope we all agree that we're just choosing a default behavior for
> the client here -- users can get the alternate behavior by setting or
> unsetting a config option in ~/.subversion/options. I.e., we should
> offer conversion to UTF-8 for those who want it, and should not
> unconditionally *force* conversion to UTF-8 for those who know they
> don't want it. The only question is how we behave out-of-the-box.
>
> (If this is controversial, I guess we're not ready to vote yet.)
>
> The two choices are
>
> [ ] By default, recode log messages from user input to UTF-8, using
> the locale to get a best guess for the original encoding of the
> user input.
>
> [ ] By default, do no re-encoding of log messages. Store exactly
> the byte sequence the user enters. When printing log messages,
> the svn client would simply assume that the byte '\n' is a line
> end (it prints out the number of lines in each message as part
> of the msg header). When printing out the log message as xml,
> we'd do our best to escape bytes that are incompatible with
> being xml content; this probably implies treating the message
> as Latin-1 or something, but I haven't thought carefully about
> that.
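
(for concreteness, here's roughly how i read the two options above. this is
a minimal python sketch, not actual svn code: the function names are made
up for illustration, and replacing xml-illegal control bytes with '?' is
just one assumption about what "do our best to escape" could mean.)

```python
import locale
from typing import Optional
from xml.sax.saxutils import escape


def recode_to_utf8(raw: bytes, source_encoding: Optional[str] = None) -> bytes:
    """Option 1: guess the input encoding from the locale, store UTF-8."""
    # The locale is only a best guess at what the user's editor produced.
    enc = source_encoding or locale.getpreferredencoding(False)
    return raw.decode(enc).encode("utf-8")


def count_lines(raw: bytes) -> int:
    """Option 2: store bytes untouched; only the byte '\\n' ends a line."""
    if not raw:
        return 0
    # A trailing newline terminates the last line rather than starting a new one.
    return raw.count(b"\n") + (0 if raw.endswith(b"\n") else 1)


def xml_escape_as_latin1(raw: bytes) -> str:
    """Option 2, XML output: treat every byte as a Latin-1 character."""
    text = raw.decode("latin-1")  # never fails: each byte maps to one code point
    # Control characters other than tab/newline/CR are not allowed in XML 1.0,
    # so they are replaced here (an assumption, not a decided behavior).
    cleaned = "".join(
        "?" if (ch < " " and ch not in "\t\n\r") else ch for ch in text
    )
    # Escape markup characters, then emit non-ASCII as numeric references.
    return escape(cleaned).encode("ascii", "xmlcharrefreplace").decode("ascii")
```

note that under option 2 a message typed in, say, koi8-r would come out of
the xml path as the wrong characters, since nothing records what the bytes
were supposed to mean.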

if we're going to make this changeable with a client-side config option,
then i say we might as well not do any re-encoding at all. having
utf-8 as the charset of the log messages is only helpful (in my
opinion) if you can ALWAYS count on every message being in that charset.

so i think i'm leaning towards not re-encoding, with the provision
that we include some means of indicating the character set (and
possibly the mime type, if we're talking about allowing binary data as a
log entry). without this information, i can't see how a client can
robustly display log entries. the fact that cvs allows
whatever the hell you want in your log and doesn't have a means to
figure out what it is just proves that cvs sucks in this particular
area, which makes it tough to write a good client for it.
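
to make that concrete, here's a rough python sketch of what i mean by
carrying the charset (and maybe mime type) along with the message. the
LogMessage type, its field names, and render() are all hypothetical,
nothing like this exists in svn today, and the fallback behavior is just
one possible choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogMessage:
    data: bytes                     # exactly the bytes the user entered
    charset: Optional[str] = None   # e.g. "utf-8"; None means "unknown"
    mime_type: str = "text/plain"   # hypothetical field for non-text entries


def render(msg: LogMessage) -> str:
    """Turn a stored log message into text a client can display."""
    if not msg.mime_type.startswith("text/"):
        # Binary entry: describe it instead of dumping bytes on the terminal.
        return f"[{len(msg.data)} bytes of {msg.mime_type}]"
    if msg.charset is None:
        # The cvs situation: no way to know, so show ASCII and mark the rest.
        return msg.data.decode("ascii", errors="replace")
    try:
        return msg.data.decode(msg.charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that don't match the declared charset.
        return msg.data.decode("ascii", errors="replace")
```

with the charset recorded, render(LogMessage(b"caf\xe9", "iso-8859-1"))
can give back the text the author typed; with no charset, the client can
only mark the byte it doesn't understand.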

-garrett

-- 
garrett rooney                    Remember, any design flaw you're 
rooneg@electricjellyfish.net      sufficiently snide about becomes  
http://electricjellyfish.net/     a feature.       -- Dan Sugalski
Received on Sat Jun 1 14:11:56 2002

This is an archived mail posted to the Subversion Dev mailing list.
