Re: Character sets for log messages

From: Colin Putney <cputney_at_whistler.net>
Date: 2002-06-03 04:34:13 CEST

On Sunday, June 2, 2002, at 04:11 PM, Jon Trowbridge wrote:

> On Sat, 2002-06-01 at 12:39, Colin Putney wrote:
>>
>> I realize that it's not impossible, or even difficult, to do the
>> conversion. But it will take time and effort to do the research,
>> coding, testing and maintenance. It's another hurdle that a client
>> developer will have to clear, for no particular benefit.
>
> Keep in mind that good[1] free[2] code for dealing with UTF-8 and
> converting between charsets already exists, and it is being extensively
> used by other projects. So the barriers for free software/open
> source/whatever clients are really pretty low.

I'd written a long and involved defense of my statement above before I
realized the whole thing is a red herring. Suffice it to say that (1) such
libraries may not always be available because of licensing or
technical issues (not everyone codes in C), and (2) even if you do have
a library to do the conversion for you, it's still additional,
unnecessary complexity in the client.

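The real issue is that UTF-8 is only ideal for English-speaking users.
Speakers of other languages are really going to want to use a text
encoding that supports their character set more efficiently. For
example, UTF-8 needs three bytes for most Chinese, Japanese and Korean
characters, where legacy encodings such as Shift_JIS or EUC-KR use two.
This becomes even more true when you get into the multilingual scenario.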

I think Stephen was correct when he proposed email as the model for
Subversion to follow: MIME labels each piece of text with its charset
instead of forcing everything into one encoding. That allows for the
richest and most efficient handling of text, while degrading nicely for
clients and external tools that don't handle multiple encodings.

At the same time, it means Subversion will never corrupt users' data,
because it never converts the text; it just stores the bytes it is given.
If the wrong encoding is associated with a particular swath of text, it
may not get displayed properly, but since the original bytes are
preserved, you can simply re-decode them with the right encoding.
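A minimal Python sketch of that last point, assuming a log message is kept as untouched bytes alongside a separately recorded charset label, the way email does it. The variable names and sample message are illustrative only, not Subversion's API: decoding with the wrong label garbles the display, but because the bytes were never transcoded, decoding again with the right label recovers the text exactly.

```python
# Hypothetical log message stored as raw bytes plus a separate charset
# label, in the spirit of email's Content-Type charset parameter.
original_text = "Fix für Größenberechnung"
stored_bytes = original_text.encode("iso-8859-1")

# Suppose the wrong label ("utf-8") gets attached to the message.
wrong_label = "utf-8"
shown = stored_bytes.decode(wrong_label, errors="replace")
print(shown)  # mis-displayed: replacement characters instead of ü, ö, ß

# The bytes themselves were never converted, so decoding with the
# correct label gives back the original text unchanged.
recovered = stored_bytes.decode("iso-8859-1")
assert recovered == original_text
```

The only thing lost with the wrong label is the display; nothing has to be repaired in the stored data.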

Cheers,

Colin

Received on Mon Jun 3 04:34:45 2002

This is an archived mail posted to the Subversion Dev mailing list.