Philip Martin <philip@codematters.co.uk> writes:
> OK, you're crazy ;)
>
> The UTF-8 encoding has the property that multibyte encodings consist
> solely of bytes with the high bit set, i.e. non-ASCII values of 0x80
> or greater. Provided the log message is valid UTF-8 (something we
> already check) then any ASCII control characters must be single byte
> characters.
>
> See the header in utf_validate.c for the gory details.

Oh, no, I understand how UTF8 works. If we're already checking that
log messages are valid UTF8, then that just means my condition is
already met. I'm not crazy, I'm just behind the times :-).
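
A minimal C sketch of why Philip's property makes the check cheap, assuming the string has already passed UTF-8 validation: a plain byte scan is enough, because no byte of a multibyte sequence can fall below 0x80. The name `has_ascii_control_chars` is hypothetical. Treating tab, LF and CR as acceptable in a log message is also an assumption; the thread does not decide it.

```c
/* Hypothetical helper: returns 1 if the NUL-terminated string S contains
   an ASCII control character other than tab, LF or CR, else 0.
   ASSUMPTION: S has already been validated as UTF-8.  Every byte of a
   UTF-8 multibyte sequence is >= 0x80, so a byte below 0x20 (or equal
   to 0x7F) can only be a single-byte ASCII character, and a plain byte
   scan cannot misfire in the middle of a multibyte character. */
int has_ascii_control_chars(const char *s)
{
  const unsigned char *p;

  for (p = (const unsigned char *)s; *p; p++)
    {
      /* ASSUMPTION: tab, LF and CR are legitimate in log messages. */
      if (*p == '\t' || *p == '\n' || *p == '\r')
        continue;
      if (*p < 0x20 || *p == 0x7F)
        return 1;
    }
  return 0;
}
```

For example, `has_ascii_control_chars("caf\xC3\xA9\n")` returns 0, because the bytes of the two-byte é are both above 0x80. `has_ascii_control_chars("bell\x07here")` returns 1.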

The problem is that we're not doing that check before we send a log
message from server to client. We should, and if the string is not
UTF8... then what? Escape any funny chars? I thought we had code for
doing this, too, already written for some other purpose...

/me rummages around

Aha, I'm thinking of svn_utf_cstring_from_utf8_fuzzy(), which is not
quite the same thing. That function converts a UTF8 string with
multibyte characters into another UTF8 string with only single-byte
characters (that is, an ASCII string, with the old multibyte chars
represented using special 7-bit escape codes).
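
To make the distinction concrete, here is a rough C sketch of fuzzy-style escaping in the sense described: every byte with the high bit set is rewritten as a 7-bit escape, so the result is plain ASCII. The function name `escape_non_ascii` and the `?\NNN` escape format (NNN = decimal byte value) are assumptions chosen for illustration. This is not the actual code behind svn_utf_cstring_from_utf8_fuzzy().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of fuzzy-style escaping: every byte >= 0x80 is
   replaced by the 7-bit sequence "?\NNN", where NNN is the decimal byte
   value.  ASSUMPTION: this escape format is illustrative only.
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *escape_non_ascii(const char *s)
{
  size_t len = strlen(s);
  /* Worst case: every input byte becomes 5 chars ("?\" plus 3 digits). */
  char *out = malloc(len * 5 + 1);
  char *q = out;
  const unsigned char *p;

  if (out == NULL)
    return NULL;

  for (p = (const unsigned char *)s; *p; p++)
    {
      if (*p < 0x80)
        *q++ = (char)*p;                            /* ASCII: copy as-is */
      else
        q += sprintf(q, "?\\%03u", (unsigned)*p);   /* high bit: escape */
    }
  *q = '\0';
  return out;
}
```

With this sketch, the UTF-8 input `"caf\xC3\xA9"` becomes the ASCII string `caf?\195?\169`, one escape per byte of the two-byte é.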

Whereas the function we're talking about would convert a mixed
UTF8/non-UTF8 string to a purely UTF8-string, converting the non-UTF8
characters to (presumably) some sort of escape codes, probably the
same kind as svn_utf_cstring_from_utf8_fuzzy() uses.
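
A minimal C sketch of that proposed function, under stated assumptions. The name `utf8_escape_invalid` is hypothetical, and the input is assumed to be NUL-terminated. Invalid bytes get an illustrative `?\NNN` escape (NNN = decimal byte value), which is an assumption rather than a decided format. Well-formed sequences are recognised using the byte ranges from the Unicode Standard's table of well-formed UTF-8 byte sequences, so overlong forms and encoded surrogates count as invalid.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length (1-4) of the well-formed UTF-8 sequence starting at P, or 0 if
   the byte at P does not start one.  Byte ranges follow the Unicode
   Standard's well-formed UTF-8 table, which excludes overlong forms and
   surrogates.  Stops at the first bad byte, so it never reads past a
   terminating NUL. */
static size_t utf8_seq_len(const unsigned char *p)
{
  unsigned char c = p[0];
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */
  size_t n, i;

  if (c < 0x80)
    return 1;
  else if (c >= 0xC2 && c <= 0xDF)
    n = 2;
  else if (c >= 0xE0 && c <= 0xEF)
    {
      n = 3;
      if (c == 0xE0) lo = 0xA0;         /* reject overlong forms */
      else if (c == 0xED) hi = 0x9F;    /* reject UTF-16 surrogates */
    }
  else if (c >= 0xF0 && c <= 0xF4)
    {
      n = 4;
      if (c == 0xF0) lo = 0x90;         /* reject overlong forms */
      else if (c == 0xF4) hi = 0x8F;    /* reject values above U+10FFFF */
    }
  else
    return 0;

  if (p[1] < lo || p[1] > hi)
    return 0;
  for (i = 2; i < n; i++)
    if (p[i] < 0x80 || p[i] > 0xBF)
      return 0;
  return n;
}

/* Hypothetical: copy well-formed UTF-8 through unchanged and replace each
   byte that is not part of a well-formed sequence with "?\NNN".
   ASSUMPTION: escape format is illustrative only.  The output is always
   valid UTF-8, since every escape is ASCII.  Caller frees the result. */
char *utf8_escape_invalid(const char *s)
{
  size_t len = strlen(s);
  char *out = malloc(len * 5 + 1);      /* worst case: all bytes escaped */
  char *q = out;
  const unsigned char *p = (const unsigned char *)s;

  if (out == NULL)
    return NULL;

  while (*p)
    {
      size_t n = utf8_seq_len(p);
      if (n > 0)
        {
          memcpy(q, p, n);              /* valid sequence: keep it */
          q += n;
          p += n;
        }
      else
        {
          q += sprintf(q, "?\\%03u", (unsigned)*p);
          p++;                          /* resync at the next byte */
        }
    }
  *q = '\0';
  return out;
}
```

Because each invalid byte is escaped on its own and scanning resumes at the next byte, a stray Latin-1 byte costs only itself. For example, `"caf\xC3\xA9 \xE9t\xE9"` becomes `"caf\xC3\xA9 ?\233t?\233"`: the UTF-8 é survives, while the two Latin-1 0xE9 bytes are escaped.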

*Now* am I sounding crazy? :-)
-Karl
Received on Mon Nov 29 23:25:49 2004