
Re: [PATCH] First cut at 1954 solution

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2004-12-08 03:14:21 CET

kfogel@collab.net writes:

> UTF-8 is a multibyte encoding. I'm not sure how familiar you are with
> it, but http://svn.red-bean.com/repos/main/3bits/utf8_xml.txt
> summarizes how it works. I'm posting it here because I've been hoping
> Brane or someone would proofread it anyway :-).

The document is out of date. It claims UTF-16 is the simplest
encoding and that it's fixed-width, but these days Unicode code points
go up to 0x10FFFF, so UTF-16 is now strictly a variable-width encoding:
code points above 0xFFFF take two 16-bit units (a surrogate pair).
UTF-32 is probably the simplest fixed-width encoding. However, since
the Unicode standard itself claims that nearly all characters in all
modern languages are at or below 0xFFFF, treating UTF-16 as fixed-width
is likely to work in most cases.
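
A minimal Python sketch of the width difference described above,
assuming Python 3's built-in "utf-16-le" and "utf-32-le" codecs. The
helper names utf16_units and utf32_units are made up for illustration
and do not come from the referenced document or from Subversion.

```python
def utf16_units(ch: str) -> int:
    """Count the 16-bit code units UTF-16 needs for one code point."""
    # The "-le" variant is used so that no byte-order mark is prepended.
    return len(ch.encode("utf-16-le")) // 2


def utf32_units(ch: str) -> int:
    """Count the 32-bit code units UTF-32 needs for one code point."""
    return len(ch.encode("utf-32-le")) // 4


if __name__ == "__main__":
    # Two code points at or below 0xFFFF, then two above it.
    for ch in ["A", "\u20ac", "\U0001D11E", "\U0010FFFF"]:
        print(f"U+{ord(ch):06X}: "
              f"UTF-16 units={utf16_units(ch)}, "
              f"UTF-32 units={utf32_units(ch)}")
```

Code that treats UTF-16 as fixed-width is in effect assuming that
utf16_units always returns 1, which holds only for code points at or
below 0xFFFF.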

Philip Martin
Received on Wed Dec 8 03:15:44 2004

This is an archived mail posted to the Subversion Dev mailing list.
