Re: Character sets for log messages

From: Colin Putney <cputney_at_whistler.net>
Date: 2002-06-01 19:39:46 CEST

On Saturday, June 1, 2002, at 08:08 AM, Henrik Svensson wrote:

> It is not very difficult to convert from unicode to any other charset.
> Code and recommendations how to do it (for most standard charsets) are
> available. Some systems even have functions that will do it for you. In
> the case of a simple client that for example only can display 7 bit
> ASCII it is even trivial, only remove the most significant byte from
> every character in the array. The only thing the client has to consider
> is that the text can contain characters that it can't convert and
> print. How to handle this case has to be decided by the client
> developer, but an easy solution is to replace the unprintable
> characters with a simple placeholder.

Well, just stripping off the high bit would leave garbage characters
wherever there are multibyte sequences, so you'd have to be able to
recognize those sequences and deal with them appropriately.
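
As a rough illustration of the difference, here is a minimal Python sketch. The function names and the sample message are made up for this example and are not taken from any Subversion or CVS client. It assumes the log message arrives as UTF-8 bytes and that the client can only show 7-bit ASCII.

```python
def strip_high_bit(data: bytes) -> str:
    """Naive approach: clear bit 7 of every byte and treat the result as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")


def to_ascii_with_placeholder(data: bytes, placeholder: str = "?") -> str:
    """Decode the UTF-8 sequences first, then swap each non-ASCII character
    for a placeholder."""
    text = data.decode("utf-8")
    return "".join(ch if ord(ch) < 128 else placeholder for ch in text)


# Hypothetical log message; "Å" is encoded in UTF-8 as the two bytes C3 85.
message = "Fixed by Åke".encode("utf-8")

# Stripping the high bit turns C3 85 into 43 05: a "C" plus a control character.
print(repr(strip_high_bit(message)))

# Recognizing the multibyte sequence yields a single placeholder instead.
print(to_ascii_with_placeholder(message))
```

The second function is the "recognize those sequences" step: it costs very little here only because the decoding is delegated to the language runtime, which a simple client may not have.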

I realize that it's not impossible, or even difficult to do the
conversion. But it will take time and effort to do the research, coding,
testing and maintenance. It's another hurdle that a client developer
will have to clear, for no particular benefit.

What will likely happen is that simple clients will just ignore the
problem, like CVS does. Then everything will work fine as long as the
user never encounters anything but 7-bit ASCII, which maps to UTF-8
without modification.
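
A small Python sketch of why the problem stays hidden, using made-up sample messages and a hypothetical helper name: pure ASCII text produces identical bytes in ASCII and UTF-8, while a message written in another 8-bit charset (Latin-1 is assumed here) is generally not valid UTF-8 at all.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A 7-bit ASCII message is byte-for-byte the same in ASCII and UTF-8,
# so a client that ignores charsets never notices a difference.
ascii_log = "Fix typo in README"
same_bytes = ascii_log.encode("ascii") == ascii_log.encode("utf-8")

# The same client breaks once a message contains a non-ASCII character
# stored in a different charset: 0xC4 followed by "n" is not valid UTF-8.
latin1_log = "Ändrat av Henrik".encode("latin-1")

print(same_bytes)
print(is_valid_utf8(ascii_log.encode("ascii")))
print(is_valid_utf8(latin1_log))
```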

By explicitly specifying the charset we give clients the option to
gracefully decline to display charsets they don't know about.
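
One way a client might do that, sketched in Python under the assumption that each log message is delivered as raw bytes together with a charset name. The function name and the notice text are hypothetical and do not describe an actual Subversion interface.

```python
import codecs


def render_log_message(raw: bytes, charset: str) -> str:
    """Decode a log message if the charset is known, otherwise decline politely."""
    try:
        codecs.lookup(charset)  # raises LookupError for unknown charset names
    except LookupError:
        return f"[log message in unsupported charset {charset!r}]"
    # Known charset: decode, substituting U+FFFD for any malformed bytes.
    return raw.decode(charset, errors="replace")


print(render_log_message("héllo".encode("latin-1"), "iso-8859-1"))
print(render_log_message(b"abc", "x-unknown-charset"))
```

Without the charset label the client has no basis for the first branch and can only guess.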

Colin Putney
Whistler.com

Received on Sat Jun 1 19:40:06 2002

This is an archived mail posted to the Subversion Dev mailing list.
