[PATCH] Re: [PATCH] Include offending XML in "Malformed XML" error message

From: Charles Bailey <bailey.charles_at_gmail.com>
Date: 2005-06-06 17:57:26 CEST

On Wed, 23 Feb 2005, Charles Bailey wrote:
>
> Attached is a patchlet that, when expat fails to parse a hunk of XML,
> appends at least part of the offending hunk to the error message. It

which led to an exchange regarding the need to make strings UTF-8-safe. After too long a hiatus, I posted for comment:
On 4/21/05, Charles Bailey <bailey.charles@gmail.com> wrote:
>
> Well, after umpteen interrupts from the rest of life, I finally got a
> few hours to look at this again. In checking what was already
> available, I found a handful of "string escaping" functions in various
> places which perform similar tasks (at least one with the comment
> "this should share code with other_string_escaping_routine()"). Since
> I'd have to add yet another such function, I thought I'd try to abstract
> it a bit, with the hope that similar routines could use a common base.
> I've appended a short proposal at the bottom of this message,
> containing a common "engine" and an example implementation for
> creating a UTF-8-safe version of an arbitrary string.
Julian Foad was kind enough to point out a dumb thinko, but no other comments were forthcoming, possibly because the core developers were busy with pre-1.2 cleanup.

So, after another too-long hiatus, here's a patch which implements a "common" string escaping function, uses it for UTF-8 escaping, and uses that to sanitize the offending XML, which is then output in the error message that Jack built^W^Wstarted this thread.

I've interspersed my comments in the code, since there's imho zero chance that this version of the patch will be substantially/stylistically suitable for committing. They're far from exhaustive, but this message is long enough already.
Conceptual "Log message":

[[[
Add a function that escapes illegal UTF-8 characters, along the way
refactoring the core of the string-escaping routines, and ensure that the
illegal-XML error message outputs legal UTF-8.

### Probably best applied as several patches, but collected here for review.

* subversion/libsvn_subr/escape.c: New file (svn_subr__escape_string): Final-common-path function for escaping strings.
* subversion/libsvn_subr/escape_impl.h: New file, declaring svn_subr__escape_string and convenience macros. ### Logical candidate for consolidation with utf_impl.h, perhaps as subr_impl.h
* subversion/libsvn_subr/utf.c: (fuzzy_escape): Renamed to ascii_fuzzy_escape, and rewritten to use svn_subr__escape_string. (svn_utf__stringbuf_escape_utf8_fuzzy): New function which escapes illegal UTF-8 in a string, returning the escaped string in a stringbuf. (utf8_escape_mapper): Helper function for svn_utf__stringbuf_escape_utf8_fuzzy.
* subversion/libsvn_subr/utf_impl.h: Add prototype for svn_utf__stringbuf_escape_utf8_fuzzy. (svn_utf__cstring_escape_utf8_fuzzy): Macro implementing a variant of the above that returns a NUL-terminated string.
* subversion/libsvn_subr/xml.c: (svn_xml_parse): If parse fails, print (sanitized) (part of) offending XML with error message.
* subversion/tests/libsvn_subr/utf-test.c: (utf_escape): New function testing UTF-8 string-escaping functions.
* subversion/po/de.po, subversion/po/es.po, subversion/po/ja.po, subversion/po/ko.po, subversion/po/nb.po, subversion/po/pl.po, subversion/po/pt_BR.po, subversion/po/sv.po, subversion/po/zh_CN.po, subversion/po/zh_TW.po: Courtesy to translators, since I've changed a localized string.
]]]

### This driver was written because there are several "escaping" functions in different
### places which do similar things with slightly different criteria.  It seemed best to
### collect the common work into one place, if not to save space, then to minimize
### divergence.
### The goal here is to be fast on the simple cases via the screening array, while
### allowing flexibility for more complex substitutions via the mapping function.  In
### very over-simplified, off-the-cuff testing, eliminating the screening array caused a
### slowdown of slightly less than twofold.
### I've attempted to incorporate reasonable default behavior in the case of NULL params.
--- /dev/null Mon Jun 6 11:06:27 2005
+++ subversion/libsvn_subr/escape.c Fri Jun 3 19:16:09 2005
@@ -0,0 +1,61 @@
+/*
+ * escape.c: common code for cleaning up unwanted bytes in strings
+ */
+
+#include "escape_impl.h"
+
+#define COPY_PREFIX \
+  if (c > base) { \
+    svn_stringbuf_appendbytes (out, base, c - base); \
+    base = c; \
+  }
+
+svn_stringbuf_t *
+svn_subr__escape_string (svn_stringbuf_t **outsbuf,
+                         const unsigned char *instr,
+                         apr_size_t len,
+                         const unsigned char *isok,
+                         unsigned char (*mapper) (unsigned char **,
+                                                  const unsigned char *,
+                                                  apr_size_t,
+                                                  const svn_stringbuf_t *,
+                                                  void *,
+                                                  apr_pool_t *),
+                         void *mapper_baton,
+                         apr_pool_t *pool)
+{
+  unsigned char *base, *c;
+  svn_stringbuf_t *out;
+
+  if (outsbuf == NULL || *outsbuf == NULL) {
+    out = svn_stringbuf_create ("", pool);
+    if (outsbuf)
+      *outsbuf = out;
+  }
+  else
+    out = *outsbuf;
+
+  for (c = base = (unsigned char *) instr; c < instr + len; ) {
+    apr_size_t count = isok ? isok[*c] : 0;
+    if (count == 0) {
+      COPY_PREFIX;
+      count = mapper ? mapper (&c, instr, len, out, mapper_baton, pool) : 255;
+    }
+    if (count == 255) {
+      char esc[6];
+
+      COPY_PREFIX;
+      sprintf (esc, "?\\%03u", *c);
+      svn_stringbuf_appendcstr (out, esc);
+      c++;
+      base = c;
+    }
+    else if (count)
+      c += count;
+    else
+      base = c;  /* Mapper advanced c and wrote its own output. */
+  }
+  COPY_PREFIX;
+  return out;
+}
+
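To see the screening-array-plus-mapper design of svn_subr__escape_string in isolation, here is a minimal standalone sketch. It is a re-creation under stated assumptions, not the patch's code: `buf_t`/`buf_append` are a hypothetical stand-in for svn_stringbuf_t, the mapper signature drops the baton and pool, and `amp_mapper` is an invented example of a mapper that writes its own output and returns 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for svn_stringbuf_t. */
typedef struct {
  char *data;
  size_t len, cap;
} buf_t;

/* Append N bytes to B, keeping B->data NUL-terminated. */
static void buf_append(buf_t *b, const void *bytes, size_t n)
{
  if (b->len + n + 1 > b->cap) {
    size_t cap = b->cap ? b->cap : 16;
    while (b->len + n + 1 > cap)
      cap *= 2;
    b->data = realloc(b->data, cap);
    assert(b->data != NULL);
    b->cap = cap;
  }
  if (n > 0)
    memcpy(b->data + b->len, bytes, n);
  b->len += n;
  b->data[b->len] = '\0';
}

/* Simplified mapper (no baton or pool): return a byte count to copy
   verbatim, 255 to escape **C, or 0 after writing output and advancing *C. */
typedef unsigned (*mapper_fn)(const unsigned char **c,
                              const unsigned char *end, buf_t *out);

static void escape_string(buf_t *out, const unsigned char *in, size_t len,
                          const unsigned char *isok, mapper_fn mapper)
{
  const unsigned char *c = in, *base = in, *end = in + len;

  while (c < end) {
    unsigned count = isok ? isok[*c] : 0;
    if (count == 0) {
      buf_append(out, base, (size_t)(c - base));  /* flush pending run */
      base = c;
      count = mapper ? mapper(&c, end, out) : 255;
      if (count == 0) {          /* mapper consumed the input itself */
        base = c;
        continue;
      }
    }
    if (count != 255 && count > (size_t)(end - c))
      count = 255;               /* run would overrun the input */
    if (count == 255) {
      char esc[6];               /* "?\ddd" plus NUL */
      buf_append(out, base, (size_t)(c - base));
      snprintf(esc, sizeof esc, "?\\%03u", (unsigned) *c);
      buf_append(out, esc, 5);
      base = ++c;
    }
    else
      c += count;                /* copied lazily by the next flush */
  }
  buf_append(out, base, (size_t)(c - base));
}

/* Screen like ascii_fuzzy_escape's: NUL and bytes >= 0x80 are escaped. */
static void make_ascii_screen(unsigned char isok[256])
{
  isok[0] = 255;
  memset(isok + 1, 1, 127);
  memset(isok + 128, 255, 128);
}

/* Example mapper (hypothetical): rewrite '&' as "&amp;" by itself. */
static unsigned amp_mapper(const unsigned char **c,
                           const unsigned char *end, buf_t *out)
{
  (void) end;
  if (**c != '&')
    return 255;
  buf_append(out, "&amp;", 5);
  ++*c;
  return 0;
}
```

With `make_ascii_screen` the table alone handles every byte, so `"ok\xe9!"` becomes `ok?\233!`; clearing the `'&'` entry routes that byte through `amp_mapper`, so `a&b` becomes `a&amp;b`. Resetting `base` when the mapper returns 0 is what keeps the mapper's already-consumed input from being copied a second time.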

### Comments are pretty self-explanatory.
### Docs are as doxygen; will need to be downgraded to plaintext since it's
### an internal header.
### As noted above, it makes sense to combine this with utf_impl.h.
--- /dev/null Mon Jun 6 11:35:47 2005
+++ subversion/libsvn_subr/escape_impl.h Thu Jun 2 18:44:05 2005
@@ -0,0 +1,147 @@
+/*
+ * escape_impl.h : private header for string escaping function.
+ */
+
+
+
+#ifndef SVN_LIBSVN_SUBR_ESCAPE_IMPL_H
+#define SVN_LIBSVN_SUBR_ESCAPE_IMPL_H
+
+
+#include "svn_pools.h"
+#include "svn_string.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif /* __cplusplus */
+
+
+/** Scan @a instr of length @a len bytes, copying to stringbuf @a *outsbuf,
+ * escaping bytes as indicated by the lookup array @a isok and the mapping
+ * function @a mapper.  Memory is allocated from @a pool.  You may provide
+ * any extra information needed by @a mapper in @a mapper_baton.
+ * Returns a pointer to the stringbuf containing the escaped string.
+ *
+ * If @a outsbuf or *outsbuf is NULL, a new stringbuf is created; its address is
+ * placed in @a outsbuf unless that argument is NULL.
+ * If @a isok is NULL, then @a mapper is used exclusively.
+ * If @a mapper is NULL, then a single character is escaped every time @a mapper
+ * would have been called.
+ *
+ * This is designed to be the common pathway for various string "escaping"
+ * functions across subversion.  The basic approach is to scan
+ * the input and decide whether each byte is OK as it stands, needs to be
+ * "escaped" using subversion's "?\uuu" default format, or needs to be
+ * transformed in some other way.  The decision is made using a two step
+ * process, which is designed to handle the simple cases quickly but allow
+ * for more complex mappings.  Since the typical string will (we hope)
+ * comprise mostly simple cases, this shouldn't require much code
+ * complexity or loss of efficiency.  The two steps used are:
+ *
+ * 1. The value of a byte from the input string ("test byte") is used as an
+ *    index into a (usually 256 byte) array passed in by the caller.
+ *    - If the value of the appropriate array element is 0xff,
+ *      then the test byte is escaped as a "?\uuu" string in the output.
+ *    - If the value of the appropriate element is otherwise non-zero,
+ *      that many bytes are copied verbatim from the input to the output.
+ * 2. If the array yields a 0 value, then a mapping function provided by
+ *    the caller is used to allow for more complex evaluation.  This function
+ *    receives six arguments:
+ *    - a pointer to the pointer used by svn_subr__escape_string() to
+ *      mark the test byte in the input string
+ *    - a pointer to the start of the input string
+ *    - the length of the input string
+ *    - a pointer to the output stringbuf
+ *    - the caller's @a mapper_baton, and the ever-helpful pool.
+ *    The mapping function may return a (positive) nonzero value,
+ *    which is interpreted as described in step 1 above, or zero,
+ *    indicating that the test byte should be ignored.  In the latter
+ *    case, this is generally because the mapping function has done the
+ *    necessary work itself; it's free to modify the output stringbuf and
+ *    adjust the pointer to the test byte as it sees fit (within the
+ *    bounds of the input string).  At a minimum, it should
+ *    increment the pointer to the test byte before returning 0, in order
+ *    to avoid an infinite loop.
+ */
+
+svn_stringbuf_t *
+svn_subr__escape_string (svn_stringbuf_t **outsbuf,
+                         const unsigned char *instr,
+                         apr_size_t len,
+                         const unsigned char *isok,
+                         unsigned char (*mapper) (unsigned char **,
+                                                  const unsigned char *,
+                                                  apr_size_t,
+                                                  const svn_stringbuf_t *,
+                                                  void *,
+                                                  apr_pool_t *),
+                         void *mapper_baton,
+                         apr_pool_t *pool);
+
+
+
+/** Initializer for a basic screening matrix suitable for use with
+ * #svn_subr__escape_string to escape non-UTF-8 bytes.
+ * We provide this since "UTF-8-safety" is a common denominator for
+ * most string escaping in Subversion, so this matrix makes a good
+ * starting point for more involved schemes.
+ */
+#define SVN_ESCAPE_UTF8_LEGAL_ARRAY { \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \
+  0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
+
+/** Given pointer @a c into a string whose last byte lies just before @a e,
+ * figure out whether (*c) starts a valid UTF-8 sequence, and if so, how many
+ * bytes it includes.  Return 255 if it's not valid UTF-8.
+ * For a more detailed description of the encoding rules, see the UTF-8
+ * specification in section 3.9 of the Unicode Standard 4.0 (e.g. at
+ * http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf),
+ * with special attention to Table 3-6.
+ * This macro is also provided as a building block for mappers used by
+ * #svn_subr__escape_string that want to check for UTF-8-safety in
+ * addition to other tasks.
+ */
+#define SVN_ESCAPE_UTF8_MAPPING(c,e) \
+  ( (c)[0] < 0x80 ?                               /* ASCII */             \
+      1 :                                         /* OK, 1 byte */        \
+  ( ( ((c)[0] >= 0xc2 && (c)[0] <= 0xdf) &&       /* 2-byte char */       \
+      ((c) + 1 < (e)) &&                          /* Got 2 bytes */       \
+      ((c)[1] >= 0x80 && (c)[1] <= 0xbf)) ?       /* Byte 2 legal */      \
+      2 :                                         /* OK, 2 bytes */       \
+  ( ( ((c)[0] >= 0xe0 && (c)[0] <= 0xef) &&       /* 3 byte char */       \
+      ((c) + 2 < (e)) &&                          /* Got 3 bytes */       \
+      ((c)[1] >= 0x80 && (c)[1] <= 0xbf) &&       /* Basic byte 2 legal */ \
+      ((c)[2] >= 0x80 && (c)[2] <= 0xbf) &&       /* Basic byte 3 legal */ \
+      (!((c)[0] == 0xe0 && (c)[1] < 0xa0)) &&     /* 0xe0-0x[89]? illegal */ \
+      (!((c)[0] == 0xed && (c)[1] > 0x9f)) ) ?    /* 0xed-0x[ab]? illegal */ \
+      3 :                                         /* OK, 3 bytes */       \
+  ( ( ((c)[0] >= 0xf0 && (c)[0] <= 0xf4) &&       /* 4 byte char */       \
+      ((c) + 3 < (e)) &&                          /* Got 4 bytes */       \
+      ((c)[1] >= 0x80 && (c)[1] <= 0xbf) &&       /* Basic byte 2 legal */ \
+      ((c)[2] >= 0x80 && (c)[2] <= 0xbf) &&       /* Basic byte 3 legal */ \
+      ((c)[3] >= 0x80 && (c)[3] <= 0xbf) &&       /* Basic byte 4 legal */ \
+      (!((c)[0] == 0xf0 && (c)[1] < 0x90)) &&     /* 0xf0-0x8? illegal */ \
+      (!((c)[0] == 0xf4 && (c)[1] > 0x8f)) ) ?    /* 0xf4-0x[9ab]? illegal */ \
+      4 :                                         /* OK, 4 bytes */       \
+      255))))                                     /* Illegal; escape it */
+
+
+#ifdef __cplusplus
+}
+#endif /* __cplusplus */
+
+#endif /* SVN_LIBSVN_SUBR_ESCAPE_IMPL_H */
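SVN_ESCAPE_UTF8_MAPPING packs the well-formedness rules into one nested conditional, and as a macro it evaluates `c` many times. Below is a sketch of the same classification written as an ordinary function; the name `utf8_seq_len` is hypothetical. It follows the well-formed UTF-8 byte-sequence table in Chapter 3 of the Unicode Standard, treats `end` as one past the last readable byte, and, like the macro, returns 255 for anything that does not start a complete, well-formed sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length (1-4) of the well-formed UTF-8 sequence that starts
   at C, or 255 if none does.  END points one past the last readable byte,
   and no byte at or beyond END is examined. */
static unsigned utf8_seq_len(const unsigned char *c, const unsigned char *end)
{
  unsigned char lo = 0x80, hi = 0xbf;   /* allowed range for byte 2 */
  size_t need, i;

  assert(c < end);
  if (c[0] < 0x80)
    return 1;                           /* ASCII */
  else if (c[0] >= 0xc2 && c[0] <= 0xdf)
    need = 2;
  else if (c[0] >= 0xe0 && c[0] <= 0xef) {
    need = 3;
    if (c[0] == 0xe0) lo = 0xa0;        /* reject overlong forms */
    if (c[0] == 0xed) hi = 0x9f;        /* reject UTF-16 surrogates */
  }
  else if (c[0] >= 0xf0 && c[0] <= 0xf4) {
    need = 4;
    if (c[0] == 0xf0) lo = 0x90;        /* reject overlong forms */
    if (c[0] == 0xf4) hi = 0x8f;        /* reject values above U+10FFFF */
  }
  else
    return 255;                         /* 0x80-0xc1 or 0xf5-0xff */

  if ((size_t)(end - c) < need)
    return 255;                         /* sequence truncated at END */
  if (c[1] < lo || c[1] > hi)
    return 255;
  for (i = 2; i < need; i++)
    if (c[i] < 0x80 || c[i] > 0xbf)     /* continuation bytes */
      return 255;
  return (unsigned) need;
}
```

Keeping the special second-byte bounds (`lo`/`hi`) separate from the lead-byte ranges makes each rule of the table visible on its own line, and a mapper can call the function without worrying about side effects in its arguments.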

### Function names can be revised to fit convention, of course.
### svn_utf__cstring_escape_utf8_fuzzy serves as an example of a benefit of
### returning the resultant stringbuf from svn_subr__escape_string both in a
### parameter and as the function's return value.  If the sense is that it'll be
### a cause of debugging headaches, or that it's contrary to subversion culture to
### code public functions as macros, it's easy enough to code this as a function,
### and to make svn_subr__escape_string return void (or less likely svn_error_t,
### if it got pickier about params.)
--- subversion/libsvn_subr/utf_impl.h (revision 14986)
+++ subversion/libsvn_subr/utf_impl.h (working copy)
@@ -24,12 +24,33 @@
 #include <apr_pools.h>
 #include "svn_types.h"
+#include "svn_string.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif /* __cplusplus */
 
+/** Replace any non-UTF-8 characters in @a len byte long string @a src with
+ * escaped representations, placing the result in a stringbuf pointed to by
+ * @a *dest, which will be created if necessary.  Memory is allocated from
+ * @a pool as needed.  Returns a pointer to the stringbuf containing the result
+ * (identical to @a *dest, but facilitates chaining calls).
+ */
+svn_stringbuf_t *
+svn_utf__stringbuf_escape_utf8_fuzzy (svn_stringbuf_t **dest,
+                                      const unsigned char *src,
+                                      apr_size_t len,
+                                      apr_pool_t *pool);
+
+/** Replace any non-UTF-8 characters in @a len byte long string @a src with
+ * escaped representations.  Memory is allocated from @a pool as needed.
+ * Returns a pointer to the resulting string.
+ */
+#define svn_utf__cstring_escape_utf8_fuzzy(src,len,pool) \
+  (svn_utf__stringbuf_escape_utf8_fuzzy(NULL,(src),(len),(pool)))->data
+
+
 const char *svn_utf__cstring_from_utf8_fuzzy (const char *src,
                                                apr_pool_t *pool,
                                                svn_error_t *(*convert_from_utf8)

### There're other places that could be rewritten in terms of the new escaping
### functions, but I hope the two given here serve as an example of how it might
### be done.
### The rename to ascii_fuzzy_escape is to distinguish it from the new functions
### that escape only illegal UTF-8 sequences.
--- subversion/libsvn_subr/utf.c (revision 14986)
+++ subversion/libsvn_subr/utf.c (working copy)
@@ -30,6 +30,7 @@
 #include "svn_pools.h"
 #include "svn_ctype.h"
 #include "svn_utf.h"
+#include "escape_impl.h"
 #include "utf_impl.h"
 #include "svn_private_config.h"
 
@@ -323,53 +324,19 @@
 /* Copy LEN bytes of SRC, converting non-ASCII and zero bytes to ?\nnn
    sequences, allocating the result in POOL. */
 static const char *
-fuzzy_escape (const char *src, apr_size_t len, apr_pool_t *pool)
+ascii_fuzzy_escape (const char *src, apr_size_t len, apr_pool_t *pool)
 {
-  const char *src_orig = src, *src_end = src + len;
-  apr_size_t new_len = 0;
-  char *new;
-  const char *new_orig;
+  static unsigned char asciinonul[256];
+  svn_stringbuf_t *result = NULL;
 
-  /* First count how big a dest string we'll need. */
-  while (src < src_end)
-    {
-      if (! svn_ctype_isascii (*src) || *src == '\0')
-        new_len += 5;  /* 5 slots, for "?\XXX" */
-      else
-        new_len += 1;  /* one slot for the 7-bit char */
+  if (!asciinonul[0]) {
+    asciinonul[0] = 255;                 /* NUL's not allowed */
+    memset(asciinonul + 1, 1, 127);      /* Other regular ASCII OK */
+    memset(asciinonul + 128, 255, 128);  /* High half not allowed */
+  }
 
-      src++;
-    }
-
-  /* Allocate that amount. */
-  new = apr_palloc (pool, new_len + 1);
-
-  new_orig = new;
-
-  /* And fill it up. */
-  while (src_orig < src_end)
-    {
-      if (! svn_ctype_isascii (*src_orig) || src_orig == '\0')
-        {
-          /* This is the same format as svn_xml_fuzzy_escape uses, but that
-             function escapes different characters.  Please keep in sync!
-             ### If we add another fuzzy escape somewhere, we should abstract
-             ### this out to a common function. */
-          sprintf (new, "?\\%03u", (unsigned char) *src_orig);
-          new += 5;
-        }
-      else
-        {
-          *new = *src_orig;
-          new += 1;
-        }
-
-      src_orig++;
-    }
-
-  *new = '\0';
-
-  return new_orig;
+  svn_subr__escape_string(&result, src, len, asciinonul, NULL, NULL, pool);
+  return result->data;
 }
 
 /* Convert SRC_LENGTH bytes of SRC_DATA in NODE->handle, store the result
@@ -448,7 +415,7 @@
       errstr = apr_psprintf
         (pool, _("Can't convert string from '%s' to '%s':"),
          node->frompage, node->topage);
-      err = svn_error_create (apr_err, NULL, fuzzy_escape (src_data,
+      err = svn_error_create (apr_err, NULL, ascii_fuzzy_escape (src_data,
                                                                src_length, pool));
       return svn_error_create (apr_err, err, errstr);
     }
@@ -564,7 +531,28 @@
   return SVN_NO_ERROR;
 }
 
+static unsigned char
+utf8_escape_mapper (unsigned char **targ, const unsigned char *start,
+                    apr_size_t len, const svn_stringbuf_t *dest,
+                    void *baton, apr_pool_t *pool)
+{
+  const unsigned char *end = start + len;
+  return SVN_ESCAPE_UTF8_MAPPING(*targ, end);
+}
 
+svn_stringbuf_t *
+svn_utf__stringbuf_escape_utf8_fuzzy (svn_stringbuf_t **dest,
+                                      const unsigned char *src,
+                                      apr_size_t len,
+                                      apr_pool_t *pool)
+{
+  static unsigned char utf8screen[256] = SVN_ESCAPE_UTF8_LEGAL_ARRAY;
+
+  return svn_subr__escape_string(dest, src, len,
+                                 utf8screen, utf8_escape_mapper, NULL,
+                                 pool);
+}
+
 svn_error_t *
 svn_utf_stringbuf_to_utf8 (svn_stringbuf_t **dest,
                            const svn_stringbuf_t *src,
@@ -787,7 +775,7 @@
   const char *escaped, *converted;
   svn_error_t *err;
 
-  escaped = fuzzy_escape (src, strlen (src), pool);
+  escaped = ascii_fuzzy_escape (src, strlen (src), pool);
 
   /* Okay, now we have a *new* UTF-8 string, one that's guaranteed to
      contain only 7-bit bytes :-).  Recode to native... */
### With code comes testing.
### Note: Contains 8-bit chars, and also uses convention that cc will treat
### "foo" "bar" as "foobar".  Both can be avoided if useful for finicky compilers.
--- subversion/tests/libsvn_subr/utf-test.c (revision 14986)
+++ subversion/tests/libsvn_subr/utf-test.c (working copy)
@@ -17,6 +17,7 @@
  */
 
 #include "../svn_test.h"
+#include "../../include/svn_utf.h"
 #include "../../libsvn_subr/utf_impl.h"
 
 /* Random number seed.  Yes, it's global, just pretend you can't see it. */
@@ -222,6 +223,84 @@
   return SVN_NO_ERROR;
 }
 
+static svn_error_t *
+utf_escape (const char **msg,
+            svn_boolean_t msg_only,
+            svn_test_opts_t *opts,
+            apr_pool_t *pool)
+{
+  char in[] = { 'A',  'S',  'C',  'I',  'I',   /* All printable */
+                'R',  'E',  'T',  '\n', 'N',   /* Newline */
+                'B',  'E',  'L',  0x07, '!',   /* Control char */
+                0xd2, 0xa6, 'O',  'K',  '2',   /* 2-byte char, valid */
+                0xc0, 0xc3, 'N',  'O',  '2',   /* 2-byte char, invalid 1st */
+                0x82, 0xc3, 'N',  'O',  '2',   /* 2-byte char, invalid 2nd */
+                0xe4, 0x87, 0xa0, 'O',  'K',   /* 3-byte char, valid */
+                0xe2, 0xff, 0xba, 'N',  'O',   /* 3-byte char, invalid 2nd */
+                0xe0, 0x87, 0xa0, 'N',  'O',   /* 3-byte char, invalid 2nd */
+                0xed, 0xa5, 0xa0, 'N',  'O',   /* 3-byte char, invalid 2nd */
+                0xe4, 0x87, 0xc0, 'N',  'O',   /* 3-byte char, invalid 3rd */
+                0xf2, 0x87, 0xa0, 0xb5, 'Y',   /* 4-byte char, valid */
+                0xf2, 0xd2, 0xa0, 0xb5, 'Y',   /* 4-byte char, invalid 2nd */
+                0xf0, 0x87, 0xa0, 0xb5, 'N',   /* 4-byte char, invalid 2nd */
+                0xf4, 0x97, 0xa0, 0xb5, 'N',   /* 4-byte char, invalid 2nd */
+                0xf2, 0x87, 0xc3, 0xb5, 'N',   /* 4-byte char, invalid 3rd */
+                0xf2, 0x87, 0xa0, 0xd5, 'N',   /* 4-byte char, invalid 4th */
+                0x00 };
+  const unsigned char *legalresult =
+    "ASCIIRET\nNBEL\x07!\xd2\xa6OK2?\\192?\\195NO2?\\130?\\195NO2"
+    "\xe4\x87\xa0OK?\\226?\\255?\\186NO?\\224?\\135?\\160NO?\\237?\\165?\\160NO"
+    "?\\228?\\135?\\192NO\xf2\x87\xa0\xb5Y?\\242\xd2\xa0?\\181Y?\\240?\\135?\\160"
+    "?\\181N?\\244?\\151?\\160?\\181N?\\242?\\135\xc3\xb5N?\\242?\\135?\\160"
+    "?\\213N";
+  const unsigned char *asciiresult =
+    "ASCIIRET\nNBEL\x07!?\\210?\\166OK2?\\192?\\195NO2?\\130?\\195NO2"
+    "?\\228?\\135?\\160OK?\\226?\\255?\\186NO?\\224?\\135?\\160NO"
+    "?\\237?\\165?\\160NO?\\228?\\135?\\192NO?\\242?\\135?\\160?\\181Y"
+    "?\\242?\\210?\\160?\\181Y?\\240?\\135?\\160?\\181N"
+    "?\\244?\\151?\\160?\\181N?\\242?\\135?\\195?\\181N"
+    "?\\242?\\135?\\160?\\213N";
+  const unsigned char *asciified;
+  apr_size_t legalresult_len = 213; /* == strlen(legalresult) iff no NULs */
+  int i = 0;
+  svn_stringbuf_t *escaped = NULL, *result;
+
+  *msg = "test utf string escaping";
+
+  if (msg_only)
+    return SVN_NO_ERROR;
+
+  result = svn_utf__stringbuf_escape_utf8_fuzzy (&escaped, in, sizeof in - 1, pool);
+  if (result != escaped)
+    return svn_error_createf
+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);
+  i++;
+  if (escaped->len != legalresult_len)
+    return svn_error_createf
+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);
+  i++;
+  if (memcmp(escaped->data, legalresult, legalresult_len))
+    return svn_error_createf
+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);
+  i++;
+  if (memcmp(escaped->data, legalresult, legalresult_len))
+    return svn_error_createf
+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);
+  i++;
+
+  asciified = svn_utf_cstring_from_utf8_fuzzy(in, pool);
+  if (strlen(asciified) != strlen(asciiresult))
+    return svn_error_createf
+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);
+  i++;
+  if (strcmp(asciified, asciiresult))
+    return svn_error_createf
+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);
+  i++;
+
+  return SVN_NO_ERROR;
+}
+
 
 /* The test table. */
 
@@ -230,5 +309,6 @@
   SVN_TEST_NULL,
   SVN_TEST_PASS (utf_validate),
   SVN_TEST_PASS (utf_validate2),
+  SVN_TEST_PASS (utf_escape),
   SVN_TEST_NULL
 };

### The original point of this thread.
### This patch will apply with an offset, since I've cut out sections which
### reimplement XML escaping in terms of svn_subr__escape_string.
--- subversion/libsvn_subr/xml.c (revision 14986)
+++ subversion/libsvn_subr/xml.c (working copy)
@@ -395,11 +413,22 @@
   /* If expat choked internally, return its error. */
   if (! success)
     {
+      svn_stringbuf_t *sanitized = NULL;
+      unsigned char *end;
+
+      svn_utf__stringbuf_escape_utf8_fuzzy (&sanitized, buf,
+                                            (len > 240 ? 240 : len),
+                                            svn_parser->pool);
+      end = (unsigned char *) sanitized->data
+            + (sanitized->len > 240 ? 240 : sanitized->len);
+      while (*end >= 0x80 && *end < 0xc0 &&
+             (char *) end > sanitized->data) end--;
       err = svn_error_createf
         (SVN_ERR_XML_MALFORMED, NULL,
-         _("Malformed XML: %s at line %d"),
+         _("Malformed XML: %s at line %d; XML starts:\n%.*s"),
          XML_ErrorString (XML_GetErrorCode (svn_parser->parser)),
-         XML_GetCurrentLineNumber (svn_parser->parser));
+         XML_GetCurrentLineNumber (svn_parser->parser),
+         (int) ((char *) end - sanitized->data), sanitized->data);
 
       /* Kill all parsers and return the expat error */
       svn_xml_free_parser (svn_parser);
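The xml.c hunk limits the sanitized text to 240 bytes while trying not to cut a multi-byte character in half. As a standalone sketch of the same idea (the helper names `utf8_prefix_len` and `print_prefix` are hypothetical, not Subversion functions, and the limit is assumed to fit in an `int`), the byte count to print is found by backing up over continuation bytes from the first byte that would be dropped:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return how many leading bytes of the NUL-terminated UTF-8 string S fit in
   LIMIT bytes without cutting a multi-byte character in half. */
static int utf8_prefix_len(const char *s, size_t limit)
{
  size_t len = strlen(s), n;

  if (len <= limit)
    return (int) len;
  /* S[LIMIT] is the first byte that would be dropped.  If it is a
     continuation byte (10xxxxxx), back up to the lead byte of its
     character and drop that character too. */
  n = limit;
  while (n > 0 && ((unsigned char) s[n] & 0xc0) == 0x80)
    n--;
  return (int) n;
}

/* "%.*s" takes its precision as an int, hence the int return above. */
static void print_prefix(FILE *fp, const char *s, size_t limit)
{
  fprintf(fp, "%.*s\n", utf8_prefix_len(s, limit), s);
}
```

For `"caf\xc3\xa9"` (five bytes, ending in a two-byte character), a limit of 4 yields a precision of 3, so only `caf` is printed rather than a dangling lead byte.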

### Finally, be kind to the translators.
--- subversion/po/pt_BR.po (revision 14986)
+++ subversion/po/pt_BR.po (working copy)
@@ -6006,8 +6006,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "XML mal formado: %s na linha %d"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "XML mal formado: %s na linha %d; XML começa:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/es.po (revision 14986)
+++ subversion/po/es.po (working copy)
@@ -6102,8 +6102,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "XML malformado: %s en la línea %d"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "XML malformado: %s en la línea %d; XML comienza:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/de.po (revision 14986)
+++ subversion/po/de.po (working copy)
@@ -6090,8 +6090,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "Fehlerhaftes XML: %s in Zeile %d"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "Fehlerhaftes XML: %s in Zeile %d; XML beginnt:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/sv.po (revision 14986)
+++ subversion/po/sv.po (working copy)
@@ -6005,8 +6005,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "Felaktig XML: %s på rad %d"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "Felaktig XML: %s på rad %d; XML starta:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/ko.po (revision 14986)
+++ subversion/po/ko.po (working copy)
@@ -5906,8 +5906,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "3$-3Ä±Î©0ÃÃ»Ã²13Ä±Î©0ÃŽâ„¢Âª13Ä±Î©0ÃŽÃªÃº1 XML: %s (3Ä±Î©0ÃÂ§Ã‘13Ä±Î©0ÃŽâ‰¤Ã 13Ä±Î©0ÃŒÃ²âˆ1 %d)"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "3$-3Ä±Î©0ÃÃ»Ã²13Ä±Î©0ÃŽâ„¢Âª13Ä±Î©0ÃŽÃªÃº1 XML: %s (3Ä±Î©0ÃÂ§Ã‘13Ä±Î©0ÃŽâ‰¤Ã 13Ä±Î©0ÃŒÃ²âˆ1 %d); XML:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/ja.po (revision 14986)
+++ subversion/po/ja.po (working copy)
@@ -6463,8 +6463,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "3$-3Ä±Î©0ÃÃ¯âˆž13Ä±Î©0Ã‚âˆâˆ1$-2Ã¦ XML Ã¦Â«Ã¦Ï€: %s (3$-3Ä±Î©0Ã‹Â°Ã¥1 %d)"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "3$-3Ä±Î©0ÃÃ¯âˆž13Ä±Î©0Ã‚âˆâˆ1$-2Ã¦ XML Ã¦Â«Ã¦Ï€: %s(3$-3Ä±Î©0Ã‹Â°Ã¥1 %d); XML:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/pl.po (revision 14986)
+++ subversion/po/pl.po (working copy)
@@ -6103,8 +6103,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "Uszkodzony XML: %s w linii %d"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "Uszkodzony XML: %s w linii %d; XML wersja:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/zh_TW.po (revision 14986)
+++ subversion/po/zh_TW.po (working copy)
@@ -5896,8 +5896,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "3$-3Ä±Î©0ÃŠÃºÃ¢13Ä±Î©0ÃÂºâˆ«13Ä±Î©0ÃˆÃ´âˆ‘1 XML: %s3Ä±Î©0ÃŠÃ±Âº13Ä±Î©0ÃÂ¨Â¨1 %d 3Ä±Î©0Ã‚Ã Ã³1"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "3$-3Ä±Î©0ÃŠÃºÃ¢13Ä±Î©0ÃÂºâˆ«13Ä±Î©0ÃˆÃ´âˆ‘1 XML: %s3Ä±Î©0ÃŠÃ±Âº13Ä±Î©0ÃÂ¨Â¨1 %d 3Ä±Î©0Ã‚Ã Ã³1; XML:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/nb.po (revision 14986)
+++ subversion/po/nb.po (working copy)
@@ -5995,8 +5995,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "Misdannet XML: %s i linje %d"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "Misdannet XML: %s i linje %d; XML starter:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
--- subversion/po/zh_CN.po (revision 14986)
+++ subversion/po/zh_CN.po (working copy)
@@ -5955,8 +5955,8 @@
 
 #: libsvn_subr/xml.c:400
 #, c-format
-msgid "Malformed XML: %s at line %d"
-msgstr "3$-3Ä±Î©0ÃÃ¯âˆ13Ä±Î©0Ã‚Î©Â¢13Ä±Î©0ÃÃ¶Ã‘1XMLÃšË™%s3Ä±Î©0Ã‚Ãº(r)13Ä±Î©0ÃÂ¨Â¨1 %d 3Ä±Î©0Ã‹Â°Ã¥1"
+msgid "Malformed XML: %s at line %d; XML starts:\n%.*s"
+msgstr "3$-3Ä±Î©0ÃÃ¯âˆ13Ä±Î©0Ã‚Î©Â¢13Ä±Î©0ÃÃ¶Ã‘1XMLÃšË™%s3Ä±Î©0Ã‚Ãº(r)13Ä±Î©0ÃÂ¨Â¨1 %d 3Ä±Î©0Ã‹Â°Ã¥1; XML:\n%.*s"
 
 #: libsvn_wc/adm_crawler.c:380
 #, c-format
### End of patch ###
-- 
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu
Received on Mon Jun 6 18:23:43 2005
