Re: UTF-8 (was: Re: property names)

From: Mo DeJong <mdejong_at_cygnus.com>
Date: 2000-12-22 00:37:06 CET

On Thu, 21 Dec 2000, Greg Hudson wrote:

> > I was thinking about internal representation, but yeah, they're
> > equivalent.
>
> I don't want to think about writing C code to handle two-byte-wide or
> four-byte-wide characters. C and Unix are very much geared towards
> one-byte characters, making UTF-8 a much more natural internal
> representation.

Forget about the C code; what about the memory? A 1000 byte file
of one-byte characters requires 2000 bytes of memory in a 16-bit
Unicode representation. If each character required 32 bits of
memory, a 1 meg file would require 4 megs of system memory. That
is just crazy!
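
The arithmetic above can be checked with a rough sketch. It assumes the
1000-byte file holds plain ASCII text, and it uses Python's "utf-16-le" and
"utf-32-le" codecs as stand-ins for two-byte-wide and four-byte-wide
internal representations (the little-endian variants are chosen only so
that no byte-order mark is added to the count).

```python
# Assumption: the file is 1000 characters of plain ASCII text.
text = "a" * 1000


def encoded_size(s, encoding):
    """Return the number of bytes s occupies in the given encoding."""
    return len(s.encode(encoding))


# "utf-16-le" / "utf-32-le" stand in for 2-byte and 4-byte wide characters.
sizes = {
    enc: encoded_size(text, enc)
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}

for enc, n in sizes.items():
    print(f"{enc}: {n} bytes")
```

Under those assumptions the UTF-8 form is the same size as the original
file, while the wider forms double and quadruple it.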

Don't forget about the network transfer time either. Why would
anyone want to transfer 'a' as a 32 bit number over a network?

Using UTF-8 is a great solution since existing 7-bit ASCII
text still requires only 8 bits of system memory per
character to store it.
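
A small sketch of the per-character cost in UTF-8, for anyone who wants
to check it. The sample characters are my own choice, not from the thread.
Note that the one-byte guarantee holds for the ASCII range; a character
from the upper half of an 8-bit set such as Latin-1 takes more than one
byte once it is converted to UTF-8.

```python
def utf8_len(ch):
    """Return how many bytes one character needs in UTF-8."""
    return len(ch.encode("utf-8"))


# Hypothetical samples: ASCII 'a', Latin-1 e-acute, and the euro sign.
samples = ["a", "\u00e9", "\u20ac"]
lengths = {ch: utf8_len(ch) for ch in samples}

# The same e-acute stored in its original 8-bit character set.
latin1_len = len("\u00e9".encode("latin-1"))

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X}: {n} byte(s) in UTF-8")
print(f"U+00E9: {latin1_len} byte(s) in Latin-1")
```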

Mo DeJong
Red Hat Inc
Received on Sat Oct 21 14:36:18 2006
