
Re: [PATCH] First cut at 1954 solution

From: Greg Stein <gstein_at_lyra.org>
Date: 2004-12-08 07:41:47 CET

On Wed, Dec 08, 2004 at 12:27:17AM -0600, kfogel@collab.net wrote:
> VK Sameer <sameer@collab.net> writes:
> > On Wed, 2004-12-08 at 03:26, Branko Čibej wrote:
> > > Hmmm. "Invalid byte in UTF-8 sequence" would be closer to the mark;
> > > there's no such thing as an UTF-8 character.
> >
> > RFC3629 (http://www.ietf.org/rfc/rfc3629.txt) uses that phrase:
> >
> > "Decoding a UTF-8 character proceeds as follows:"
> Yeah, I also wondered what Brane meant by the assertion that there is
> no such thing as a UTF-8 character :-). I assumed the definition was
> the obvious one:
> A series of 1 to 4 bytes that represents a Unicode code point,
> using the encoding described by the UTF-8 specification.

UTF-8 is an encoding rather than a character set. Thus, it does not
define any characters. *Unicode* is a character set, and UTF-8 is a
particular encoding of that charset. Therefore, Brane is right:
UTF-8 does not define any characters -- Unicode does.
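
To make the distinction concrete, here is a small Python sketch (the
choice of U+20AC is only an illustration, not something from the patch
under discussion). It shows one Unicode character mapped to two different
byte sequences by two different encodings:

```python
# U+20AC (EURO SIGN) is a character defined by Unicode.
# UTF-8 and UTF-16 are two different byte encodings of that one character.
euro = "\u20ac"

utf8_bytes = euro.encode("utf-8")       # three bytes: e2 82 ac
utf16_bytes = euro.encode("utf-16-be")  # two bytes:   20 ac

print(hex(ord(euro)), utf8_bytes.hex(), utf16_bytes.hex())

# Decoding either byte sequence with its own encoding gives back
# the same Unicode character.
same_character = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-be")
```

The character stays the same throughout; only its byte representation
depends on the encoding.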

Branko's suggested text is "closer to reality". There is an error in
the encoding, rather than an erroneous character.
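
As a rough illustration of "an error in the encoding", the following
Python sketch locates the point where a byte string stops being valid
UTF-8. The helper name first_invalid_utf8_offset is hypothetical and is
not part of Subversion or of the patch; it only relies on Python's
built-in decoder and the UnicodeDecodeError it raises.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset where UTF-8 decoding first fails, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte of the ill-formed sequence.
        return err.start
    return None

# 0xFF can never appear anywhere in well-formed UTF-8.
bad = b"ok \xff"
good = "\u20ac".encode("utf-8")

print(first_invalid_utf8_offset(bad))
print(first_invalid_utf8_offset(good))
```

Note that the failure is reported as a byte offset: the decoder never gets
as far as producing a character, which is why "invalid byte in UTF-8
sequence" describes the situation more precisely.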

All that said: given that UTF-8 only applies to the Unicode charset,
there is a very fine line there. But to be pedantic... :-)


Greg Stein, http://www.lyra.org/
Received on Wed Dec 8 07:41:30 2004

This is an archived mail posted to the Subversion Dev mailing list.
