
Re: Control characters in log message cause failure

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2004-11-30 00:25:03 CET

kfogel@collab.net writes:

> Oh, no, I understand how UTF8 works. If we're already checking that
> log messages are valid UTF8, then that just means my condition is
> already met. I'm not crazy, I'm just behind the times :-).
>
> The problem is that we're not doing that check before we send a log
> message from server to client.

Which check?

> We should, and if the string is not
> UTF8... then what?

We already fail if the message is not valid UTF-8:

$ LANG=en_GB.UTF-8 svn commit -m `printf "\xe5"`
../svn/subversion/libsvn_client/commit.c:775: (apr_err=22)
svn: Commit failed (details follow):
../svn/subversion/libsvn_subr/utf.c:457: (apr_err=22)
svn: Valid UTF-8 data
(hex:)
followed by invalid UTF-8 sequence
(hex: e5)
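
For anyone who wants to see the same split between the valid prefix and the offending bytes outside of svn, here is a rough Python sketch. It is not what utf.c does; the helper name split_valid_utf8 is made up for illustration, and it relies only on the standard UnicodeDecodeError attributes.

```python
def split_valid_utf8(data: bytes):
    """Return (valid_prefix, offending_bytes) for a byte string.

    Hypothetical helper: it mimics the shape of the error report above
    (valid data, then the invalid sequence), not Subversion's own code.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start/err.end delimit the first undecodable sequence.
        return data[:err.start], data[err.start:err.end]
    return data, b""


# A lone 0xe5 starts a three-byte sequence that never completes.
prefix, bad = split_valid_utf8(b"\xe5")
print("valid (hex: %s) invalid (hex: %s)" % (prefix.hex(), bad.hex()))
```

A properly encoded message, for example b"caf\xc3\xa9", comes back whole with an empty second element.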

> Escape any funny chars? I thought we had code for
> doing this, too, already written for some other purpose...
>
> /me rummages around
>
> Aha, I'm thinking of svn_utf_cstring_from_utf8_fuzzy(), which is not
> quite the same thing. That function converts a UTF8 string with
> multibyte characters into another UTF8 string with only single-byte
> characters (that is, an ASCII string, with the old multibyte chars
> represented using special 7-bit escape codes).
>
> Whereas the function we're talking about would convert a mixed
> UTF8/non-UTF8 string to a purely UTF8-string, converting the non-UTF8
> characters to (presumably) some sort of escape codes, probably the
> same kind as svn_utf_cstring_from_utf8_fuzzy() uses.
>
> *Now* am I sounding crazy? :-)

One of us is confused, or perhaps it is just terminology.

There is no "mixed UTF8/non-UTF8 string" and there are no "non-UTF8"
characters that need to be "converted". There may be ASCII control
codes in the log message, and if these are not valid XML then they
need to be rejected or escaped, but the only place that UTF-8 comes in
is that ASCII control codes are encoded unchanged in UTF-8.
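
To make the distinction concrete, here is a small Python sketch, assuming the XML 1.0 definition of a legal character (tab, LF, CR, and the ranges from 0x20 upward, excluding surrogates and 0xFFFE/0xFFFF). The function names are invented for the example; nothing like this is claimed to exist in Subversion.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def xml_illegal_chars(text: str):
    """Return (index, character) pairs that XML 1.0 does not allow."""
    return [(i, ch) for i, ch in enumerate(text) if not is_xml_char(ord(ch))]


message = "fix parser\x18 bug\n"

# The control code is perfectly valid UTF-8: it encodes as itself.
assert "\x18".encode("utf-8") == b"\x18"
message.encode("utf-8").decode("utf-8")  # round-trips without error

# But it is not a legal XML character, unlike the trailing newline.
print(xml_illegal_chars(message))
```

So a UTF-8 validity check passes such a message untouched; a separate check along these lines would be needed to reject or escape it.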

It looks like we have the same problem with paths in the entries file:

$ svn mkdir wc/`printf "\x18"`
$ svn st wc
../svn/subversion/libsvn_wc/entries.c:671: (apr_err=130003)
svn: XML parser failed in 'wc'
../svn/subversion/libsvn_subr/xml.c:365: (apr_err=130003)
svn: Malformed XML: not well-formed (invalid token) at line 13
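
The same failure can be reproduced with any expat-based parser. The Python sketch below uses the standard xml.etree.ElementTree module; the XML fragment is a made-up stand-in, not the real entries file format.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(document: str) -> bool:
    """Return True if the document is well-formed according to the parser."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments: only the name attribute differs.
good = '<entries><entry name="dir"/></entries>'
bad = '<entries><entry name="\x18"/></entries>'

print(parses_as_xml(good))
print(parses_as_xml(bad))
```

The second document is rejected even though every byte in it is valid UTF-8, which is the same situation as the path created by svn mkdir above.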

-- 
Philip Martin
Received on Tue Nov 30 00:27:38 2004

This is an archived mail posted to the Subversion Dev mailing list.