[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] issue #2147 - v1

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: 2005-01-26 17:37:11 CET

VK Sameer wrote:
> Attached is a fix for issue #2147, with much help from Karl Fogel.

And here's my rather picky review. :-)

- Julian

> ------------------------------------------------------------------------
>
> Fixed issue #2147 - control characters in log message cause failure (over DAV)
>
> * subversion/mod_dav_svn/log.c: modified log_receiver to call
> svn_xml_fuzzy_escape()

But what is the intention or result of this change? In other words, in what
way does calling that new function affect what this function does? For example:

* subversion/mod_dav_svn/log.c (log_receiver): call svn_xml_fuzzy_escape
     to ensure no control characters remain in the log message.

> * subversion/include/svn_xml.h,
> subversion/libsvn_subr/xml.c: added svn_xml_fuzzy_escape()

You should name in parentheses each top-level program object (function,
variable, etc.) that is modified/added/removed:

   (svn_xml_fuzzy_escape): New function.

> * subversion/tests/clients/cmdline/log_tests.py:
> added escape_control_chars test

   (escape_control_chars): New test.
   (test_list): Added the new test.

> ------------------------------------------------------------------------

> Index: subversion/include/svn_xml.h
> ===================================================================
> --- subversion/include/svn_xml.h (revision 12853)
> +++ subversion/include/svn_xml.h (working copy)
> @@ -117,7 +117,14 @@
> const char *string,
> apr_pool_t *pool);
>
> +/** Return @c string on finding no ASCII control characters. Otherwise
> + * return a copy created with @c pool, with control characters

"@a string" and "@a pool" - they are Arguments rather than Cross-references
(that's how I think of the "@a" and "@c" mnemonics, even if that's not the
formal definition).

> + * replaced by "0x\d+" sequences.

Presumably hex digits, not "\d".

Marking the sequence with ordinary characters '0' and 'x' doesn't seem like the
most friendly quoting method for human-readable text (except obviously it's
friendly to C programmers). I would have thought a notation starting with an
unusual character like "\" or "&" and having a fixed number of digits (two,
since it's always escaping a single byte) would be better. But hey, I suppose
the exact method used is a minor detail in the context of this whole patch -
it's evidently not intended to be reversible or parseable, and is just a
work-around to avoid failure.

> + */
> +const char* svn_xml_fuzzy_escape (const char *string,

Put space before "*", not after it.

Why does this return "const char *" when all the other svn_xml_escape_...
functions return void and append their output to a stringbuf? (Not saying you
necessarily should change it.)

> + apr_pool_t *pool);
>
> +
> /*---------------------------------------------------------------*/
>
> /* Generalized Subversion XML Parsing */
> Index: subversion/libsvn_subr/xml.c
> ===================================================================
> --- subversion/libsvn_subr/xml.c (revision 12853)
> +++ subversion/libsvn_subr/xml.c (working copy)
> @@ -25,6 +25,7 @@
> #include "svn_pools.h"
> #include "svn_xml.h"
> #include "svn_error.h"
> +#include "svn_ctype.h"
>
> #ifdef SVN_HAVE_OLD_EXPAT
> #include "xmlparse.h"
> @@ -265,7 +266,47 @@
> xml_escape_attr (outstr, string, (apr_size_t) strlen (string), pool);
> }
>
> +/* Escape any ASCII control chars. */

Comment not needed at implementation, because the description is given at the
prototype.

> +const char*

Space before "*".

> +svn_xml_fuzzy_escape (const char *string, apr_pool_t *pool)
> +{
> + const char *end = string + strlen (string);
> + const char *p = string, *q;
> + svn_stringbuf_t *outstr;
>
> + for (q = p; q < end; q++)

"for (q = string; ...)" would be clearer.

> + if (svn_ctype_iscntrl(*q))

Space after function name.

> + break;
> +
> + /* return original string if no control characters found*/
> + if (q == end)
> + return string;
> +
> + outstr = svn_stringbuf_create ("", pool);
> + while (1)
> + {
> + q = p;
> + /* Traverse till either control character or eos */
> + while (q < end && !svn_ctype_iscntrl(*q))

Space.

> + q++;
> +
> + /* copy chunk before marker */
> + svn_stringbuf_appendbytes (outstr, p, q - p);
> +
> + if (q == end)
> + break;
> +
> + /* Append the control character in 0x%x format. */

(Format is described differently in the doc string.)

> + char escaped_char[7];

You can't put declarations after code in C. (Obviously your compiler let you;
perhaps you can increase its warnings level.) Move it to the top of the block.

Seven characters: '0', 'x', xdigit, [xdigit,] '\0', and what? (It's not that
having a couple of extra bytes is a waste of memory, it's that it confuses the
reader: I'm now looking around trying to find what might use the extra bytes.)

> + sprintf (escaped_char, "0x%x", *q);
> + svn_stringbuf_appendcstr (outstr, escaped_char);
> +
> + p = q + 1;
> + }
> +
> + return outstr->data;
> +}
> +
>
> /*** Map from the Expat callback types to the SVN XML types. ***/
>
> Index: subversion/mod_dav_svn/log.c
> ===================================================================
> --- subversion/mod_dav_svn/log.c (revision 12853)
> +++ subversion/mod_dav_svn/log.c (working copy)
> @@ -98,9 +98,10 @@
> if (msg)
> SVN_ERR( dav_svn__send_xml(lrb->bb, lrb->output,
> "<D:comment>%s</D:comment>" DEBUG_CR,
> - apr_xml_quote_string(pool, msg, 0)) );
> + svn_xml_fuzzy_escape (
> + apr_xml_quote_string (pool, msg, 0),
> + pool)) );

No. You should strip control characters before you XML-quote it, otherwise you
are relying on (1) your function producing validly quoted XML (which it may do
at present but is not documented to do), and (2) the XML-quoting function
accepting and passing through control characters (which such a function need
not be expected to do).

>
> -
> if (changed_paths)
> {
> apr_hash_index_t *hi;
> Index: subversion/tests/clients/cmdline/log_tests.py
> ===================================================================
> --- subversion/tests/clients/cmdline/log_tests.py (revision 12853)
> +++ subversion/tests/clients/cmdline/log_tests.py (working copy)
> @@ -516,6 +516,70 @@
> svntest.actions.run_and_verify_svn (None, [], SVNAnyOutput,
> 'log', '-r', '2', mu2_URL)
>
> +#----------------------------------------------------------------------
> +def escape_control_chars(sbox):
> + "make sure mod_dav_svn escapes control chars"
> +
> + dump_str = """SVN-fs-dump-format-version: 2
> +
> +UUID: ffcae364-69ee-0310-a980-ca5f10462af2
> +
> +Revision-number: 0
> +Prop-content-length: 56
> +Content-length: 56
> +
> +K 8
> +svn:date
> +V 27
> +2005-01-24T10:09:21.759592Z
> +PROPS-END
> +
> +Revision-number: 1
> +Prop-content-length: 128
> +Content-length: 128
> +
> +K 7
> +svn:log
> +V 37
> +this msg contains a Ctrl-T:  <- here

Eww. Is this likely to be preserved in people's editors? Perhaps you could
find a programmatic way of inserting it into the string without it having to be
a literal control character in the source file.

> +K 10
> +svn:author
> +V 7
> +jrandom
> +K 8
> +svn:date
> +V 27
> +2005-01-24T10:09:22.012524Z
> +PROPS-END
> +"""
> +
> + # Create virgin repos and working copy
> + svntest.main.safe_rmtree(sbox.repo_dir, 1)
> + svntest.main.create_repos(sbox.repo_dir)
> + svntest.main.set_repos_paths(sbox.repo_dir)
> +
> + URL = svntest.main.current_repo_url
> +
> + # load dumpfile with control character into repos to get
> + # a log with control char content
> + output, errput = \
> + svntest.main.run_command_stdin(
> + "%s load --quiet %s" % (svntest.main.svnadmin_binary, sbox.repo_dir),
> + None, 1, dump_str)
> +
> + # run log
> + output, errput = svntest.actions.run_and_verify_svn ("", None, [], 'log', URL)
> +
> + # verify output contains expected fuzzy escape sequence
> + match_re = "this msg contains a Ctrl-T.*"
> + for line in output:
> + if re.match (match_re, line):
> + break
> + else:
> + raise svntest.Failure ("Failed to find " + str(match_re) + " in " + str(output),
> + "error: " + str(errput))
> +
> +
> ########################################################################
> # Run the tests
>
> @@ -530,6 +594,7 @@
> log_with_path_args,
> url_missing_in_head,
> log_through_copyfrom_history,
> + escape_control_chars,
> ]
>
> if __name__ == '__main__':

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jan 26 17:38:28 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.