Mo DeJong <email@example.com> writes:
> Forget about the C code, what about the memory? A 1000 byte file
> requires 2000 bytes of memory in a unicode representation. If
> each character required 32 bits of memory, a 1 meg file would
> require 4 megs of system memory. That is just crazy!
> Don't forget about the network transfer time either. Why would
> anyone want to transfer 'a' as a 32 bit number over a network?
> Using UTF-8 is a great solution since existing 8-bit
> character sets require only 8 bits of system memory
> per character to store them.
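The byte counts in the quote are easy to verify directly. Here's a quick
sketch (Python chosen purely for illustration, not anything from the
original thread) showing how the same 1000-character ASCII string grows
under each encoding:

```python
# The same ASCII text stored under different Unicode encoding forms.
# UTF-8 uses 1 byte per ASCII character, UTF-16 uses 2, UTF-32 uses 4.
text = "a" * 1000  # a 1000-character ASCII string

print(len(text.encode("utf-8")))     # 1000 bytes
print(len(text.encode("utf-16-le"))) # 2000 bytes ("-le" avoids a BOM)
print(len(text.encode("utf-32-le"))) # 4000 bytes
```

So a "1000 byte file" really does cost 2000 bytes as UTF-16 and 4000
bytes as UTF-32 in memory, exactly as the quote says.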
Just a terminology clarification, here (this is how it was explained
to me, but I Am Not An Expert):
"Unicode" is a system that maps characters<-->numbers, independently
of how those numbers are represented.
"UTF-*" are systems for mapping numbers<-->bit representations, of
which UTF-8 is probably the most efficient for our purposes.
In other words, there is no limit on the size of the Unicode character
set, but every time they add characters past a certain boundary, the
UTF encodings need to be updated so people know how to encode the new
characters.
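The character/number vs. number/bytes split is concrete if you look at
one non-ASCII character. A small sketch (again Python, just for
illustration), using the euro sign, whose Unicode code point is U+20AC:

```python
# "Unicode" assigns the number; the "UTF-*" encodings turn that one
# number into different byte sequences.
ch = "\u20ac"  # the euro sign, code point U+20AC

print(hex(ord(ch)))                  # 0x20ac -- the Unicode code point
print(ch.encode("utf-8").hex())      # e282ac   -- 3 bytes in UTF-8
print(ch.encode("utf-16-le").hex())  # ac20     -- 2 bytes in UTF-16
print(ch.encode("utf-32-le").hex())  # ac200000 -- 4 bytes in UTF-32
```

One character, one code point, three different byte lengths -- which is
exactly the distinction being drawn above.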
So Mo, you're complaining not about "unicode" per se, but about its
UTF-16 and UTF-32 encodings, and I quite agree with you. :-)
Received on Sat Oct 21 14:36:18 2006