> Julian Reschke <julian.reschke@gmx.de> writes
> > Vincent Lefevre wrote:
> > > ...
> > > Some languages can be encoded in UTF-16, which is somewhat
> > > compatible with UTF-8, so no problem here.
> >
> > UTF-16 and UTF-8 are both *encodings* of the same character set
> > (Unicode) and therefore can encode the exact same set of languages.
>
> I think Vincent knows this, and that what he means by
> "somewhat compatible" is that for some strings in some
> languages, the UTF-8 encoding and the UTF-16 encoding will be
> exactly the same sequence of bytes.
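This will almost never be the case. In fact I'd love to see an
example of a sentence where UTF-8 and UTF-16 produce the same
sequence of bytes. UTF-8 is a variable-width encoding built from
8-bit code units, whilst UTF-16 is built from 16-bit code units
(one for each character in the Basic Multilingual Plane, two for
anything outside it).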
For most western text (i.e. the ASCII set) the UTF-8 and UTF-16
encodings will never be the same: UTF-8 uses one byte per ASCII
character and contains a \x0 byte only for NUL itself, whereas
UTF-16 pairs every ASCII character with a \x0 byte.
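As a quick check of the ASCII case, here is a minimal Python sketch
that prints the bytes each encoding produces. The sample string "svn"
and the helper name encodings() are my own, chosen only for
illustration, and the explicit -be/-le codecs are used so that no
byte-order mark is added.

```python
def encodings(text):
    """Return the UTF-8, UTF-16BE and UTF-16LE byte strings for text.

    Hypothetical helper for illustration only.
    """
    return (
        text.encode("utf-8"),
        text.encode("utf-16-be"),  # big-endian, no byte-order mark
        text.encode("utf-16-le"),  # little-endian, no byte-order mark
    )


utf8, utf16be, utf16le = encodings("svn")

print(utf8)     # b'svn'
print(utf16be)  # b'\x00s\x00v\x00n'
print(utf16le)  # b's\x00v\x00n\x00'

# The UTF-8 form has no zero bytes; both UTF-16 forms have one per character.
print(utf8.count(0), utf16be.count(0), utf16le.count(0))  # 0 3 3
```

Whichever byte order is used, the UTF-16 output is twice as long as
the UTF-8 output for this string, so the two cannot match.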
- Dale.
Received on Mon Nov 24 17:00:36 2003