On Fri, May 31, 2002 at 10:59:24PM +0100, Stephen C. Tweedie wrote:
> On Fri, May 31, 2002 at 02:25:41PM -0700, Greg Stein wrote:
>...
> > Yup. And that UCS-2 was part of my example. And on the Windows platform,
> > UCS-2 is the standard encoding for characters, so it isn't really all that
> > theoretical (well, once you get past the apparent NUL values in there and
> > being okay with casting wchar_t* to char* :-)
>
> This is now getting into "knows enough to be dangerous" territory.
> :-)
hehe... fair enough. I was referring to the BSTR type which is used quite
widely in COM interfaces (and thus, a lot of code), and is UCS-2 (possibly
UTF-16?). But the term "standard encoding" was almost definitely a bit,
umm... "off"? :-)
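
To make the "apparent NUL values" point concrete, here is a minimal sketch. It assumes UTF-16 little-endian, which is what x86 Windows uses and which is byte-for-byte identical to UCS-2 for characters in the BMP. The helper name `ucs2_le_bytes` is made up for illustration. It shows what a BSTR-style buffer looks like when viewed as plain bytes, which is what casting wchar_t* to char* gives you:

```python
# Minimal sketch: UCS-2 (here UTF-16LE, identical to UCS-2 for BMP
# characters) viewed as a raw byte string, as a wchar_t* -> char* cast would.

def ucs2_le_bytes(text: str) -> bytes:
    """Encode BMP-only text as UCS-2 little-endian, with no byte-order mark."""
    if any(ord(ch) > 0xFFFF for ch in text):
        # UCS-2 has no surrogate pairs, so non-BMP characters can't be stored.
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return text.encode("utf-16-le")

raw = ucs2_le_bytes("svn")
print(raw)                 # b's\x00v\x00n\x00'

# A C routine such as strlen() stops at the first 0x00 byte, so it would
# see a one-character string here.
print(raw.index(b"\x00"))  # 1
```

Every ASCII character picks up a zero high byte, which is why byte-oriented C code chokes on it.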
>...
> However, on disk, in simple notepad text documents or in emails or
> whatever, Windows is not necessarily using UCS-2. It's often using a
Yah... although, I will note that Notepad gives you an option to store in
UCS-2 :-) (and looking at my W2K box, it now appears they've expanded the
simple checkbox into choices for ANSI, Unicode (little/big-endian), and
UTF-8)
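
For what it's worth, those Notepad options can usually be told apart after the fact by the byte-order mark Notepad writes at the start of the file. This is a minimal sketch that assumes a BOM is present. A file without one is simply reported as "ANSI" (the system code page), which is a guess, not a determination. The function name is hypothetical:

```python
import codecs

def guess_notepad_encoding(data: bytes) -> str:
    """Guess which Notepad save option produced `data` from its leading BOM.

    Assumes the file starts with a byte-order mark; anything without one
    is reported as "ANSI", which is only a guess.
    """
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "Unicode (little-endian)"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "Unicode (big-endian)"
    return "ANSI"

print(guess_notepad_encoding(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))
```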
>...
> Pretty much the only advantage you get if you force all strings
> internally to UTF-8 is that when a client comes to translate one
> charset to another, it doesn't have to know anything about the
> encoding used by the original user when submitting the string in the
> first place. But then, it still has to know about that charset to
> display it, so that's really not much of a win.
Um. By using UTF-8, aren't we saying the charset is Unicode? So the fact
that it is in UTF-8 already tells you enough information to display it? Or
did I parse your sentence wrong?
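
Here is a minimal sketch of what I mean. It assumes a hypothetical repository layer that converts each incoming string from the submitter's charset to UTF-8 once, at commit time. Both helper names are made up. The reading client only ever has to know "UTF-8" and never learns what charset the submitter used:

```python
# Minimal sketch, assuming a hypothetical layer that normalizes every
# incoming string to UTF-8 once, when it is submitted.

def to_internal(raw: bytes, submitter_charset: str) -> bytes:
    """Convert bytes in the submitter's charset to UTF-8 for storage."""
    return raw.decode(submitter_charset).encode("utf-8")

def from_internal(stored: bytes, client_charset: str) -> bytes:
    """Convert stored UTF-8 to whatever charset the reading client wants."""
    return stored.decode("utf-8").encode(client_charset)

# A Latin-1 user commits "café"; only the submitting side knows it was Latin-1.
stored = to_internal(b"caf\xe9", "iso-8859-1")
print(stored)                              # b'caf\xc3\xa9'

# A Windows client that wants UTF-16LE converts knowing only "UTF-8".
print(from_internal(stored, "utf-16-le"))  # b'c\x00a\x00f\x00\xe9\x00'
```

Getting the glyphs on screen (fonts etc.) is still the client's problem, of course, but identifying the charset is not.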
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
Received on Sat Jun 1 14:10:04 2002