
Re: use of UTF-8

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-05-31 02:55:19 CEST

On Thu, 2002-05-30 at 18:45, Greg Stein wrote:
> 3) untenable for the clients.

I'd like to keep a little perspective here.

If we don't solve the log message character set problem, then projects
are happy as long as:

  * They are willing to stick with ASCII log messages, or
  * All their developers use the same character set, or
  * All their developers use a UTF-8 native locale

(That third statement is a little forward-looking, but there has been
some progress in that direction.)

I believe this covers quite a lot of users--everyone who is happy with
CVS, for instance. Subversion is not going to fail on account of not
doing character set conversion.

This is why I would be happy being charset-neutral and 8-bit clean (not
necessarily binary-clean) for all text fields. Possibly happier, since
we would never be responsible for misconverting text when LC_CTYPE isn't
set properly, or anything like that. Plus our code would be simpler.
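
To make the trade-off concrete, here is a minimal sketch in Python. It is
not Subversion code: store_neutral and store_converted are hypothetical
names, and the sample message and charsets are assumptions picked for
illustration. It shows that 8-bit clean storage hands back exactly the
bytes the user typed, while converting storage quietly records different
text when the client's idea of its charset is wrong.

```python
# Hypothetical illustration only; none of these names exist in Subversion.

def store_neutral(raw: bytes) -> bytes:
    """Charset-neutral, 8-bit clean: keep the bytes exactly as given."""
    return raw

def store_converted(raw: bytes, client_charset: str) -> bytes:
    """Converting approach: decode with the client's charset, store UTF-8."""
    return raw.decode(client_charset).encode("utf-8")

message = "Größe geändert"
typed = message.encode("iso-8859-1")  # bytes as typed on a Latin-1 terminal

# Neutral storage round-trips for any reader who uses the same charset.
assert store_neutral(typed).decode("iso-8859-1") == message

# Conversion with the correct charset stores the intended text as UTF-8.
good = store_converted(typed, "iso-8859-1")
assert good.decode("utf-8") == message

# Conversion with the wrong charset (think: LC_CTYPE set incorrectly)
# raises no error, but the stored text is no longer what was typed.
bad = store_converted(typed, "iso-8859-5")
assert bad.decode("utf-8") != message
```

The neutral version cannot misconvert anything, but it only helps readers
who share the writer's charset, which is the second bullet above.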

On the other hand, there seems to be a fairly broad consensus for doing
UTF-8/$LC_CTYPE character set conversion for filenames. I am...
confused as to why anyone advocates converting filenames and not log
messages, since they are both text. File contents are binary data.
Property values... might be binary data; that seems to be the consensus
for now, anyway, although that leads to questions about how svn:ignore
should be interpreted and such. But log messages are definitely text.

Received on Sat Jun 1 14:14:59 2002

This is an archived mail posted to the Subversion Dev mailing list.