Re: UTF-8 (was: Re: property names)

From: Mo DeJong <mdejong_at_cygnus.com>
Date: 2000-12-22 00:37:06 CET

On Thu, 21 Dec 2000, Greg Hudson wrote:

> > I was thinking about internal representation, but yeah, they're
> > equivalent.
>
> I don't want to think about writing C code to handle two-byte-wide or
> four-byte-wide characters. C and Unix are very much geared towards
> one-byte characters, making UTF-8 a much more natural internal
> representation.

Forget about the C code, what about the memory? A 1000-byte ASCII file
requires 2000 bytes of memory in a two-byte Unicode representation. If
each character required 32 bits of memory, a 1 MB file would
require 4 MB of system memory. That is just crazy!

Don't forget about the network transfer time either. Why would
anyone want to transfer 'a' as a 32-bit number over a network?

Using UTF-8 is a great solution, since text in the existing
ASCII character set still requires only 8 bits of system
memory per character.
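
To put rough numbers on this, here is a minimal sketch in Python (not
part of any Subversion code; the helper name encoded_sizes is made up
for illustration). It counts the bytes needed to store the same text in
UTF-8, in a two-byte encoding, and in a four-byte encoding. Note that
only ASCII characters stay at one byte in UTF-8; an accented Latin-1
character such as 'é' takes two.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in each encoding."""
    # The "-le" codecs are used so that no byte-order mark is counted.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# A 1000-character ASCII file, as in the example above.
print(encoded_sizes("a" * 1000))
# {'utf-8': 1000, 'utf-16': 2000, 'utf-32': 4000}

# A single non-ASCII character needs more than one byte in UTF-8.
print(encoded_sizes("é"))
# {'utf-8': 2, 'utf-16': 2, 'utf-32': 4}
```

So for files that are mostly ASCII, which covers most source code,
UTF-8 costs nothing extra in memory or on the wire.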

Mo DeJong
Red Hat Inc
Received on Sat Oct 21 14:36:18 2006

This is an archived mail posted to the Subversion Dev mailing list.