
Re: [RFC/PATCH] commit messages not 8-bit compatible

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-05-30 23:53:47 CEST

Greg Stein <gstein@lyra.org> writes:
> Nope. We said that text strings passed around within the libraries (log
> message is a good one, paths, property names, etc) would be considered to be
> in UTF-8. We chose that following the same reasoning as using UTF-8 for the
> pathnames: consistency and that it can represent any other character set.

Oh, okay -- what we have here is different memory about what was
agreed on in the past. So, let's never mind what we *thought* was
agreed on, since it's clear what various people think right now :-).

I remember (& agree with) paths and property names. I never thought
the decision covered anything more than that. If I had realized, I
would have said something sooner.

Right now I mildly prefer this solution:

   - Don't munge (or convert, to use a less pejorative term) the log
     message at all, but simply reject log messages that contain any
     zero bytes. Log message charsets would be determined by each
     individual repository's policy, with a recommendation (but not an
     enforcement) from us to use UTF-8.
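
A minimal sketch of the check proposed above, for illustration only. The
function name check_log_message is hypothetical and is not part of any
Subversion API; the point is just that the bytes pass through unchanged
unless they contain a zero byte.

```python
def check_log_message(raw: bytes) -> bytes:
    """Return the log message untouched, or raise if it contains a zero byte.

    Hypothetical helper: no charset is assumed and no conversion is done.
    """
    if b"\x00" in raw:
        raise ValueError("log message contains a zero byte")
    return raw


# A Big5-encoded message passes through byte-for-byte.
big5_message = "中文".encode("big5")
assert check_log_message(big5_message) == big5_message
```

Under this policy the repository stores whatever bytes the client sent, so
the choice of charset stays with each project.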

If a lot of people feel strongly that enforcing conversion to UTF-8 is
the Right Thing, I certainly won't veto. I mean, I could be wrong :-).

How reliable is it to use locale to determine the source format of the
conversion (or whatever method we're going to use), though? For
example, my locale indicates nothing about Chinese editing, but
sometimes I write text in one of the various char encodings that
supports Chinese characters. If I were to do that in a log message on
some project, my log message would get all messed up. In such a case,
leaving it alone would be better, because some tools can
heuristically determine the charset -- *if* they have the original
data to work with. If the data is there, one can guess at the charset
if necessary. If the data is destroyed by a misconversion, then it's
gone. That's why I feel it's better to leave it alone.
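
To make the concern concrete, here is a rough sketch of one way a
misconversion could lose data. It assumes a converter that trusts the
locale's charset and substitutes a replacement character for bytes it
cannot decode; convert_assuming is a hypothetical helper, not how any
actual client behaves.

```python
def convert_assuming(raw: bytes, assumed_charset: str) -> bytes:
    """Convert raw to UTF-8, trusting assumed_charset as the source encoding.

    Hypothetical converter: bytes that are invalid in assumed_charset are
    replaced with U+FFFD, so the original byte values are not kept.
    """
    return raw.decode(assumed_charset, errors="replace").encode("utf-8")


# The author writes Chinese in Big5, but the locale only claims ASCII.
original = "中文".encode("big5")
converted = convert_assuming(original, "ascii")

# Left alone, the bytes can still be decoded once the charset is guessed.
assert original.decode("big5") == "中文"

# After the misconversion, only replacement characters remain.
assert set(converted.decode("utf-8")) == {"\ufffd"}
```

A plain ASCII message survives the same conversion intact, which is why the
problem would only show up when the locale and the actual encoding disagree.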

Received on Sat Jun 1 14:15:53 2002

This is an archived mail posted to the Subversion Dev mailing list.