Re: Character sets for log messages

From: Nuutti Kotivuori <naked_at_iki.fi>
Date: 2002-06-01 23:55:12 CEST

Colin Putney wrote:
> On Saturday, June 1, 2002, at 08:08 AM, Henrik Svensson wrote:
>> It is not very difficult to convert from unicode to any other
>> charset.

[...]

>> In the case of a simple client that for example only can display 7
>> bit ASCII it is even trivial, only remove the most significant byte
>> from every character in the array.

[...]

> Well, just stripping off the high bit would leave garbage characters
> wherever there are multibyte sequences, so you'd have to be able to
> recognize those sequences and deal with them appropriately.

Well, that assumes the 'unicode' above means a UTF-8 encoded string,
and that Henrik meant removing from the array every byte that has the
most significant _bit_ set.

All multibyte sequences in UTF-8 consist only of bytes with the most
significant bit set. So there would be no garbage: dropping those
bytes simply strips everything that is not ASCII.
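
To make the difference between the two readings concrete, here is a
minimal sketch in Python. The function names strip_non_ascii and
mask_high_bit are hypothetical, made up for illustration; nothing
like them is claimed to exist in Subversion. The first drops every
byte with the high bit set, the second only clears that bit and keeps
the byte.

```python
def strip_non_ascii(data: bytes) -> bytes:
    """Drop every byte whose most significant bit is set."""
    return bytes(b for b in data if b < 0x80)


def mask_high_bit(data: bytes) -> bytes:
    """Clear the most significant bit of every byte, keeping the byte."""
    return bytes(b & 0x7F for b in data)


# Illustrative input: two characters here need two bytes each in UTF-8.
message = "naïve café log".encode("utf-8")

print(strip_non_ascii(message).decode("ascii"))  # nave caf log
print(mask_high_bit(message).decode("ascii"))    # naC/ve cafC) log
```

Dropping the bytes loses the non-ASCII characters but leaves clean
ASCII, while merely clearing the bit turns each multibyte sequence
into unrelated ASCII characters.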

-- Naked

Received on Sun Jun 2 03:56:54 2002

This is an archived mail posted to the Subversion Dev mailing list.