
Re: UTF-8 problem: non-UTF-8 in a UTF-8 locale

From: Florian Weimer <fw_at_deneb.enyo.de>
Date: 2004-02-06 07:15:20 CET

Philip Martin wrote:

> UTF-8 is defined by RFC2279, but it appears the GNU iconv uses the
> more restrictive rules defined by Unicode, such as found in section
> 3.9 of http://www.unicode.org/versions/Unicode4.0.0/bookmarks.html

RFC 2279 has been superseded by RFC 3629, which contains essentially the
same rules as the Unicode standard.
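To make the difference concrete, here is a small Python 3 sketch (not from the original thread) that runs a few byte strings through Python's built-in UTF-8 codec, which follows the strict rules. The sample sequences are illustrative choices: overlong forms, an encoded surrogate, a code point above U+10FFFF, and a 5-byte form that only RFC 2279 defined. All of them are ill-formed under RFC 3629.

```python
# Byte strings that are ill-formed under RFC 3629 and the Unicode rules.
# The labels and the helper name are illustrative, not from Subversion.
CASES = {
    "overlong '/' (2 bytes)": b"\xc0\xaf",
    "overlong NUL (3 bytes)": b"\xe0\x80\x80",
    "encoded surrogate U+D800": b"\xed\xa0\x80",
    "code point U+110000": b"\xf4\x90\x80\x80",
    "5-byte form (RFC 2279 only)": b"\xf8\x88\x80\x80\x80",
}


def strictly_valid(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 codec accepts the bytes."""
    try:
        data.decode("utf-8")  # default errors="strict" raises on bad input
    except UnicodeDecodeError:
        return False
    return True


for label, data in CASES.items():
    print(f"{label:30} valid={strictly_valid(data)}")
```

Each of these cases is rejected with `UnicodeDecodeError`, while ordinary text such as `"€".encode("utf-8")` decodes normally.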

I agree that such checks are necessary to prevent repository corruption.
I'm not sure your checks are sufficient, though. Apart from overlong
sequences, do you also reject encoded surrogates (U+D800 to U+DFFF) and
other invalid UTF-8 sequences?
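A hand-written validator covering all of these cases might look like the following minimal Python sketch. It assumes the well-formed byte sequence table in section 3.9 of the Unicode standard (mirrored by the ABNF in RFC 3629); the function name `is_well_formed_utf8` is hypothetical and not part of Subversion. Narrowing the allowed range of the second byte after the lead bytes E0, ED, F0 and F4 rejects overlong forms, surrogates, and code points above U+10FFFF in a single pass.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical strict check following the Unicode 3.9 / RFC 3629 table."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:  # ASCII
            i += 1
            continue
        # Pick the sequence length and the allowed range for the second byte.
        if 0xC2 <= b0 <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # excludes surrogates U+D800..U+DFFF
        elif b0 == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # excludes overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # excludes code points > U+10FFFF
        else:
            # 0x80..0xC1: stray continuation byte or overlong 2-byte lead;
            # 0xF5..0xFF: lead bytes RFC 3629 no longer allows.
            return False
        if i + length > n:
            return False  # truncated sequence
        if not lo <= data[i + 1] <= hi:
            return False
        # Remaining bytes must be plain continuation bytes 10xxxxxx.
        for j in range(i + 2, i + length):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += length
    return True
```

For example, `is_well_formed_utf8("héllo €".encode("utf-8"))` is `True`, while `is_well_formed_utf8(b"\xed\xa0\x80")` (an encoded surrogate) is `False`.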

Received on Fri Feb 6 07:15:52 2004

This is an archived mail posted to the Subversion Dev mailing list.
