Re: UTF-8 (was: Re: property names)

From: Mo DeJong <mdejong_at_cygnus.com>
Date: 2000-12-22 00:56:55 CET

> Just a terminology clarification, here (this is how it was explained
> to me, but I Am Not An Expert):
>
> "Unicode" is a system that maps characters<-->numbers, independently
> of how those numbers are represented.
>
> "UTF-*" are systems for mapping numbers<-->bit representations, of
> which UTF-8 is probably the most efficient for our purposes.
>
> In other words, there is no limit on the size of the Unicode character
> set, but every time they add characters past a certain boundary, the
> UTF encodings need to be updated so people know how to encode the new
> ranges.
>
> So Mo, you're complaining not about "unicode" per se, but about its
> UTF-16 and UTF-32 encodings, and I quite agree with you. :-)

Sorry I was not clearer about that. I think of "unicode" as
the UTF-16 encoding (Java on the brain). You can of course
represent the same "numbers" using the UTF-8 encoding. I still
say using UTF-16 internally will turn the svn lib into a memory
hog.
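
A rough way to see the size difference is to encode the same strings
both ways and count bytes. This is only a sketch in Python, not
anything from the svn code; the sample strings and the encoded_sizes
helper are made up for illustration, and the little-endian codecs are
used so that no byte order mark is counted.

```python
# Compare how many bytes the same text needs in UTF-8, UTF-16 and UTF-32.
# The sample strings below are hypothetical, chosen to look like property
# names and paths; they are not taken from Subversion itself.

def encoded_sizes(text):
    """Return the encoded byte length of text per encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

samples = ["svn:executable", "trunk/README.txt", "日本語.txt"]

for s in samples:
    # The code points (the Unicode "numbers") are the same in every case;
    # only the byte representation differs between encodings.
    code_points = [hex(ord(c)) for c in s]
    print(repr(s), code_points[:3], encoded_sizes(s))
```

For pure ASCII text UTF-16 needs twice the bytes of UTF-8. For text
such as the CJK sample, UTF-8 spends three bytes per character, so
the gap narrows or reverses depending on the mix of characters.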

Mo DeJong
Red Hat Inc
Received on Sat Oct 21 14:36:18 2006

This is an archived mail posted to the Subversion Dev mailing list.