
Re: Character sets for log messages

From: Henrik Svensson <innotron_at_telia.com>
Date: 2002-06-02 00:25:16 CEST

Quoting Nuutti Kotivuori <naked@iki.fi>:

> Colin Putney wrote:
> > On Saturday, June 1, 2002, at 08:08 AM, Henrik Svensson wrote:
> >> It is not very difficult to convert from unicode to any other
> >> charset.
> [...]
> >> In the case of a simple client that for example only can display 7
> >> bit ASCII it is even trivial, only remove the most significant byte
> >> from every character in the array.
> [...]
> > Well, just stripping off the high bit would leave garbage characters
> > wherever there are multibyte sequences, so you'd have to be able to
> > recognize those sequences and deal with them appropriately.
> Well, assuming that the 'unicode' above would mean an UTF-8 encoded
> string - and assuming Henrik meant that remove characters which have
> the most significant _bit_ set from the array.
> All multibyte sequences in UTF-8 consist of only characters with the
> most significant bit set. So there would be no garbage, just stripping
> of everything non-ASCII.
> -- Naked
That is one way to do it. I described a faulty algorithm in my posting,
but that's not important at the moment. The important thing is that it
would be simple to make the current default Subversion client and all
future clients able to handle the scripts defined in the Unicode standard
gracefully. At least I think that is a good thing.
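
As a rough sketch of the approach Nuutti describes above (this is not code
from the Subversion client, and the function name strip_non_ascii is made up
for illustration), dropping every byte that has its most significant bit set
from a UTF-8 encoded string could look like this in Python:

```python
def strip_non_ascii(utf8_bytes: bytes) -> bytes:
    """Return the input with every byte >= 0x80 removed.

    In UTF-8, both the lead byte and the continuation bytes of a
    multibyte sequence have the high bit set, so whole non-ASCII
    characters vanish and no partial sequence is left behind.
    """
    return bytes(b for b in utf8_bytes if b < 0x80)


if __name__ == "__main__":
    # Hypothetical log message containing Swedish characters.
    message = "Räksmörgås fixed".encode("utf-8")
    print(strip_non_ascii(message).decode("ascii"))
```

The result is lossy: the non-ASCII characters are simply gone, but what
remains is valid 7-bit ASCII that a simple client can display.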


Received on Sun Jun 2 04:25:30 2002

This is an archived mail posted to the Subversion Dev mailing list.