On Wed, 23 Feb 2005, Charles Bailey wrote:
>
> Attached is a patchlet that, when expat fails to parse a hunk of XML,
> appends at least part of the offending hunk to the error message.  It
which led to an exchange regarding the need to make strings UTF-8-safe.  After too long a hiatus, I posted for comment:
On 4/21/05, Charles Bailey <bailey.charles@gmail.com> wrote:
>
> Well, after umpteen interrupts from the rest of life, I finally got a
> few hours to look at this again.  In checking what was already
> available, I found a handful of "string escaping" functions in various
> places which perform similar tasks (at least one with the comment
> "this should share code with other_string_escaping_routine()").  Since
> I'd have to add yet another such function, I thought I'd try to abstract it a
> bit, with the hope that similar routines could use a common base.
> I've appended a short proposal at the bottom of this message,
> containing a common "engine" and an example implementation for
> creating a UTF-8-safe version of an arbitrary string.
Julian Foad was kind enough to point out a dumb thinko, but no other comments were forthcoming, possibly because the core developers were busy with pre-1.2 cleanup.
So, after another too-long hiatus, here's a patch which implements a "common" string escaping function, uses it for UTF-8 escaping, and uses that to sanitize the offending XML, which is then output in the error message that Jack built^W^Wstarted this thread.
I've interspersed my comments in the code, since there's imho zero chance that this version of the patch will be substantially/stylistically suitable for committing.  They're far from exhaustive, but this message is long enough already.
Conceptual "Log message":
[[[
Add function that escapes illegal UTF-8 characters, along the way
refactoring core of string-escaping routines, and ensure that illegal
XML error message outputs legal UTF-8.

### Probably best applied as several patches, but collected here for review.

* subversion/libsvn_subr/escape.c:
  New file
  (svn_subr__escape_string): Final-common-path function for escaping strings.

* subversion/libsvn_subr/escape_impl.h:
  New file, declaring svn_subr__escape_string and convenience macros.
  ### Logical candidate for consolidation with utf_impl.h, perhaps as
  ### subr_impl.h

* subversion/libsvn_subr/utf.c:
  (fuzzy_escape): Renamed to ascii_fuzzy_escape, and rewritten to use
   svn_subr__escape_string.
  (svn_utf__stringbuf_escape_utf8_fuzzy): New function which escapes illegal
   UTF-8 in a string, returning the escaped string in a stringbuf.
  (utf8_escape_mapper): Helper function for
   svn_utf__stringbuf_escape_utf8_fuzzy.

* subversion/libsvn_subr/utf_impl.h:
  Add prototype for svn_utf__stringbuf_escape_utf8_fuzzy.
  (svn_utf__cstring_escape_utf8_fuzzy):  Macro implementing variant of above
   that returns NUL-terminated string.

* subversion/libsvn_subr/xml.c:
  (svn_xml_parse): If parse fails, print (sanitized) (part of) offending XML
   with error message.

* subversion/tests/libsvn_subr/utf-test.c:
  (utf_escape): New function testing UTF-8 string-escaping functions.

* subversion/po/de.po, subversion/po/es.po, subversion/po/ja.po,
  subversion/po/ko.po, subversion/po/nb.po, subversion/po/pl.po,
  subversion/po/pt_BR.po, subversion/po/sv.po,
  subversion/po/zh_CN.po, subversion/po/zh_TW.po:
  Courtesy to translators, since I've changed a localized string.
]]]
### This driver was written because there are several "escaping" functions in different
### places which do similar things with slightly different criteria. It seemed best to collect
### the common work into one place, if not to save space, then to minimize divergence.
### The goal here is to be fast on the simple cases via the screening array, while allowing
### flexibility for more complex substitutions via the mapping function.  In very over-
### simplified, off-the-cuff testing, eliminating the screening array caused a slowdown of
### slightly less than twofold.
### I've attempted to incorporate reasonable default behavior in the case of NULL params.
--- /dev/null	Mon Jun  6 11:06:27 2005
+++ subversion/libsvn_subr/escape.c	Fri Jun  3 19:16:09 2005
@@ -0,0 +1,58 @@
+/*
+ * escape.c:	common code for cleaning up unwanted bytes in strings
+ */
+
+#include "escape_impl.h"
+
+#define COPY_PREFIX \
+  if (c > base) { \
+    svn_stringbuf_appendbytes (out, base, c - base); \
+    base = c; \
+  }
+
+svn_stringbuf_t *
+svn_subr__escape_string (svn_stringbuf_t **outsbuf,
+			 const unsigned char *instr,
+			 apr_size_t len,
+			 const unsigned char *isok,
+			 unsigned char (*mapper) (unsigned char **,
+						  const unsigned char *,
+						  apr_size_t,
+						  const svn_stringbuf_t *,
+						  void *,
+						  apr_pool_t *),
+			 void *mapper_baton,
+			 apr_pool_t *pool)
+{
+  unsigned char *base, *c;
+  svn_stringbuf_t *out;
+
+  if (outsbuf == NULL || *outsbuf == NULL) {
+    out = svn_stringbuf_create ("", pool);
+    if (outsbuf)
+      *outsbuf = out;
+  }
+  else
+    out = *outsbuf;
+
+  for (c = base = (unsigned char *) instr; c < instr + len; ) {
+    apr_size_t count = isok ? isok[*c] : 0;
+    if (count == 0) {
+      COPY_PREFIX;
+      count = mapper ? mapper (&c, instr, len, out, mapper_baton, pool) : 255;
+    }
+    if (count == 255) {
+      char esc[6];
+
+      COPY_PREFIX;
+      sprintf (esc,"?\\%03u",*c);
+      svn_stringbuf_appendcstr (out, esc);
+      c++;
+      base = c;
+    }
+    else c += count;
+  }
+  COPY_PREFIX;
+  return out;
+}
+
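For readers without a Subversion tree at hand, the two-step idea above (a 256-entry screening table first, then an optional mapper callback) can be sketched in plain C. This is only an illustration under stated assumptions, not the patch itself: `escape_bytes`, `mapper_fn`, and `make_ascii_table` are hypothetical names, the output goes to a caller-supplied fixed buffer instead of an svn_stringbuf_t, the mapper gets a simplified two-argument signature, and a mapper return of 0 is treated as "escape this byte" rather than "mapper handled it".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified mapper: returns how many bytes starting at P may
   be copied verbatim, or 255 to request that the byte at P be escaped. */
typedef unsigned char (*mapper_fn)(const unsigned char *p,
                                   const unsigned char *end);

/* Escape LEN bytes of IN into OUT (capacity OUTCAP, NUL-terminated on
   success).  Returns the number of bytes written, not counting the NUL,
   or (size_t)-1 if OUT is too small. */
size_t
escape_bytes(char *out, size_t outcap,
             const unsigned char *in, size_t len,
             const unsigned char *isok, mapper_fn mapper)
{
  const unsigned char *c = in, *end = in + len;
  size_t used = 0;

  assert(out != NULL && outcap > 0);
  while (c < end)
    {
      /* Step 1: the screening table answers the easy cases. */
      size_t count = isok ? isok[*c] : 0;

      /* Step 2: fall back to the mapper, or escape if there is none. */
      if (count == 0)
        count = mapper ? mapper(c, end) : 255;

      /* Treat a zero or overlong answer as "escape" so the loop always
         advances and never reads past END. */
      if (count == 0 || count > (size_t)(end - c))
        count = 255;

      if (count == 255)
        {
          if (outcap - used < 6)          /* "?\ddd" plus the NUL */
            return (size_t)-1;
          used += (size_t)snprintf(out + used, outcap - used,
                                   "?\\%03u", (unsigned)*c);
          c++;
        }
      else
        {
          if (outcap - used < count + 1)  /* bytes plus the NUL */
            return (size_t)-1;
          memcpy(out + used, c, count);
          used += count;
          c += count;
        }
    }
  out[used] = '\0';
  return used;
}

/* Screening table that passes 7-bit ASCII except NUL and escapes the rest. */
void
make_ascii_table(unsigned char table[256])
{
  table[0] = 255;
  memset(table + 1, 1, 127);
  memset(table + 128, 255, 128);
}
```

With the ASCII table and no mapper, the three input bytes 'a', 0xe9, 'b' come out as the seven characters `a?\233b`.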
### Comments are pretty self-explanatory.
### Docs are as doxygen; will need to be downgraded to plaintext since it's
### an internal header.
### As noted above, it makes sense to combine this with utf_impl.h.
--- /dev/null	Mon Jun  6 11:35:47 2005
+++ subversion/libsvn_subr/escape_impl.h	Thu Jun  2 18:44:05 2005
@@ -0,0 +1,147 @@
+/*
+ * escape_impl.h :  private header for string escaping function.
+ */
+
+
+
+#ifndef SVN_LIBSVN_SUBR_ESCAPE_IMPL_H
+#define SVN_LIBSVN_SUBR_ESCAPE_IMPL_H
+
+
+#include "svn_pools.h"
+#include "svn_string.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif /* __cplusplus */
+
+
+/** Scan @a instr of length @a len bytes, copying to stringbuf @a *outsbuf,
+ * escaping bytes as indicated by the lookup array @a isok and the mapping
+ * function @a mapper. Memory is allocated from @a pool.  You may provide
+ * any extra information needed by @a mapper in @a mapper_baton.
+ * Returns a pointer to the stringbuf containing the escaped string.
+ *
+ * If @a outsbuf or *outsbuf is NULL, a new stringbuf is created; its address is
+ * placed in @a outsbuf unless that argument is NULL.
+ * If @a isok is NULL, then @a mapper is used exclusively.
+ * If @a mapper is NULL, then a single character is escaped every time @a mapper
+ * would have been called.
+ *
+ * This is designed to be the common pathway for various string "escaping"
+ * functions across subversion.  The basic approach is to scan
+ * the input and decide whether each byte is OK as it stands, needs to be
+ * "escaped" using subversion's "?\uuu" default format, or needs to be
+ * transformed in some other way.  The decision is made using a two step
+ * process, which is designed to handle the simple cases quickly but allow
+ * for more complex mappings.  Since the typical string will (we hope)
+ * comprise mostly simple cases, this shouldn't require much code
+ * complexity or loss of efficiency.  The two steps used are:
+ *
+ * 1. The value of a byte from the input string ("test byte") is used as an
+ *    index into a (usually 256 byte) array passed in by the caller.
+ *      - If the value of the appropriate array element is 0xff,
+ *        then the test byte is escaped as a "?\uuu" string in the output.
+ *      - If the value of the appropriate element is otherwise non-zero,
+ *        that many bytes are copied verbatim from the input to the output.
+ * 2. If the array yields a 0 value, then a mapping function provided by
+ *    the caller is used to allow for more complex evaluation.  This function
+ *    receives six arguments:
+ *      - a pointer to the pointer used by svn_subr__escape_string() to
+ *        mark the test byte in the input string
+ *      - a pointer to the start of the input string
+ *      - the length of the input string
+ *      - a pointer to the output stringbuf
+ *      - the caller's baton and the ever-helpful pool.
+ * The mapping function may return a (positive) nonzero value,
+ * which is interpreted as described in step 1 above, or zero,
+ * indicating that the test byte should be ignored.  In the latter
+ * case, this is generally because the mapping function has done the
+ * necessary work itself; it's free to modify the output stringbuf and
+ * adjust the pointer to the test byte as it sees fit (within the
+ * bounds of the input string).  At a minimum, it should at least
+ * increment the pointer to the test byte before returning 0, in order
+ * to avoid an infinite loop.
+ */
+
+svn_stringbuf_t *
+svn_subr__escape_string (svn_stringbuf_t **outsbuf,
+			 const unsigned char *instr,
+			 apr_size_t len,
+			 const unsigned char *isok,
+			 unsigned char (*mapper) (unsigned char **,
+						  const unsigned char *,
+						  apr_size_t,
+						  const svn_stringbuf_t *,
+						  void *,
+						  apr_pool_t *),
+			 void *mapper_baton,
+			 apr_pool_t *pool);
+
+
+
+/** Initializer for a basic screening matrix suitable for use with
+ *  #svn_subr__escape_string to escape non-UTF-8 bytes.
+ *  We provide this since "UTF-8-safety" is a common denominator for
+ *  most string escaping in Subversion, so this matrix makes a good
+ *  starting point for more involved schemes.
+ */
+#define SVN_ESCAPE_UTF8_LEGAL_ARRAY { \
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,   1,  1,   1,\
+  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+255, 255,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  0,   0,\
+  0,   0,   0,   0,   0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255}
+
+/** Given pointer @a c into a string which ends at @a e, figure out
+ *  whether (*c) starts a valid UTF-8 sequence, and if so, how many bytes
+ *  it includes.  Return 255 if it's not valid UTF-8.
+ *  For a more detailed description of the encoding rules, see the UTF-8
+ *  specification in section 3-9 of the Unicode standard 4.0 (e.g. at
+ *  http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf),
+ *  with special attention to Table 3-6.
+ *  This macro is also provided as a building block for mappers used by
+ *  #svn_subr__escape_string that want to check for UTF-8-safety in
+ *  addition to other tasks.
+ */
+#define SVN_ESCAPE_UTF8_MAPPING(c,e)                                          \
+  ( (c)[0] < 0x80                                  ? /* ASCII */              \
+    1 :                                              /* OK, 1 byte */         \
+    ( ( ((c)[0] >= 0xc2 && (c)[0] <= 0xdf)        && /* 2-byte char */        \
+        ((c) + 1 <= (e))                          && /* Got 2 bytes */        \
+        ((c)[1] >= 0x80 && (c)[1] <= 0xbf))        ? /* Byte 2 legal */       \
+      2 :                                            /* OK, 2 bytes */        \
+      ( ( ((c)[0] >= 0xe0 && (c)[0] <= 0xef)      && /* 3 byte char */        \
+          ((c) + 2 <= (e))                        && /* Got 3 bytes */        \
+          ((c)[1] >= 0x80 && (c)[1] <= 0xbf)      && /* Basic byte 2 legal */ \
+          ((c)[2] >= 0x80 && (c)[2] <= 0xbf)      && /* Basic byte 3 legal */ \
+          (!((c)[0] == 0xe0 && (c)[1] < 0xa0))    && /* 0xe0-0x[89]? illegal */\
+          (!((c)[0] == 0xed && (c)[1] > 0x9f)) )   ? /* 0xed-0x[ab]? illegal */\
+        3 :                                          /* OK, 3 bytes */        \
+        ( ( ((c)[0] >= 0xf0 && (c)[0] <= 0xf4)    && /* 4 byte char */        \
+            ((c) + 3 <= (e))                      && /* Got 4 bytes */        \
+            ((c)[1] >= 0x80 && (c)[1] <= 0xbf)    && /* Basic byte 2 legal */ \
+            ((c)[2] >= 0x80 && (c)[2] <= 0xbf)    && /* Basic byte 3 legal */ \
+            ((c)[3] >= 0x80 && (c)[3] <= 0xbf)    && /* Basic byte 4 legal */ \
+            (!((c)[0] == 0xf0 && (c)[1] < 0x90))  && /* 0xf0-0x8? illegal */  \
+            (!((c)[0] == 0xf4 && (c)[1] > 0x8f)) ) ? /* 0xf4-0x[9ab]? illegal*/\
+          4 :                                        /* OK, 4 bytes */        \
+          255))))                                    /* Illegal; escape it */
+
+
+#ifdef __cplusplus
+}
+#endif /* __cplusplus */
+
+#endif /* SVN_LIBSVN_SUBR_ESCAPE_IMPL_H */
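The byte-range rules that the mapping macro encodes (the well-formed UTF-8 byte sequences table in chapter 3 of the Unicode standard) may be easier to follow written out as a function. The sketch below is an assumption-laden illustration rather than part of the patch: `utf8_seq_len` is a hypothetical name, `end` is taken to point one past the last readable byte, and it returns 0 for "not well-formed" where the macro returns 255.

```c
#include <stddef.h>

/* Return the length (1 to 4) of the well-formed UTF-8 sequence starting at
   P, or 0 if no well-formed sequence starts there.  END points one past the
   last readable byte (an assumption of this sketch). */
size_t
utf8_seq_len(const unsigned char *p, const unsigned char *end)
{
  size_t avail, need, i;
  unsigned char lo = 0x80, hi = 0xbf;   /* default range for byte 2 */

  if (p >= end)
    return 0;
  avail = (size_t)(end - p);

  if (p[0] < 0x80)
    return 1;                           /* ASCII */
  else if (p[0] >= 0xc2 && p[0] <= 0xdf)
    need = 2;
  else if (p[0] >= 0xe0 && p[0] <= 0xef)
    need = 3;
  else if (p[0] >= 0xf0 && p[0] <= 0xf4)
    need = 4;
  else
    return 0;                           /* 0x80-0xc1 and 0xf5-0xff never lead */

  if (avail < need)
    return 0;                           /* truncated sequence */

  /* Some lead bytes narrow the range of the second byte, which rules out
     overlong forms, surrogates, and code points above U+10FFFF. */
  if (p[0] == 0xe0)
    lo = 0xa0;
  else if (p[0] == 0xed)
    hi = 0x9f;
  else if (p[0] == 0xf0)
    lo = 0x90;
  else if (p[0] == 0xf4)
    hi = 0x8f;

  if (p[1] < lo || p[1] > hi)
    return 0;
  for (i = 2; i < need; i++)
    if (p[i] < 0x80 || p[i] > 0xbf)
      return 0;
  return need;
}
```

A mapper built on such a function would translate a 0 result into whatever "escape this byte" marker its driver expects.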
### Function names can be revised to fit convention, of course.
### svn_utf__cstring_escape_utf8_fuzzy serves as an example of a benefit of
### returning the resultant stringbuf from svn_subr__escape_string both in a
### parameter and as the function's return value. If the sense is that it'll be a cause
### of debugging headaches, or that it's contrary to subversion culture to code public
### functions as macros, it's easy enough to code this as a function, and to make
### svn_subr__escape_string return void (or less likely svn_error_t, if it got pickier
### about params.)
--- subversion/libsvn_subr/utf_impl.h	(revision 14986)
+++ subversion/libsvn_subr/utf_impl.h	(working copy)
@@ -24,12 +24,33 @@
 
 #include <apr_pools.h>
 #include "svn_types.h"
+#include "svn_string.h"
 
 #ifdef __cplusplus
 extern "C" {
 #endif /* __cplusplus */
 
 
+/** Replace any non-UTF-8 characters in @a len byte long string @a src with
+ *  escaped representations, placing the result in a stringbuf pointed to by
+ *  @a *dest, which will be created if necessary.  Memory is allocated from
+ *  @a pool as needed. Returns a pointer to the stringbuf containing the result
+ *  (identical to @a *dest, but facilitates chaining calls).
+ */
+svn_stringbuf_t *
+svn_utf__stringbuf_escape_utf8_fuzzy (svn_stringbuf_t **dest,
+				      const unsigned char *src,
+				      apr_size_t len,
+				      apr_pool_t *pool);
+
+/** Replace any non-UTF-8 characters in @a len byte long string @a src with
+ *  escaped representations.  Memory is allocated from @a pool as needed.
+ *  Returns a pointer to the resulting string.
+ */
+#define svn_utf__cstring_escape_utf8_fuzzy(src,len,pool) \
+  (svn_utf__stringbuf_escape_utf8_fuzzy(NULL,(src),(len),(pool)))->data
+
+
 const char *svn_utf__cstring_from_utf8_fuzzy (const char *src,
                                               apr_pool_t *pool,
                                               svn_error_t *(*convert_from_utf8)
### There're other places that could be rewritten in terms of the new escaping
### functions, but I hope the two given here serve as an example of how it might
### be done.
### The rename to ascii_fuzzy_escape is to distinguish it from the new functions
### that escape only illegal UTF-8 sequences.
--- subversion/libsvn_subr/utf.c	(revision 14986)
+++ subversion/libsvn_subr/utf.c	(working copy)
@@ -30,6 +30,7 @@
 #include "svn_pools.h"
 #include "svn_ctype.h"
 #include "svn_utf.h"
+#include "escape_impl.h"
 #include "utf_impl.h"
 #include "svn_private_config.h"
 
@@ -323,53 +324,19 @@
 /* Copy LEN bytes of SRC, converting non-ASCII and zero bytes to ?\nnn
    sequences, allocating the result in POOL. */
 static const char *
-fuzzy_escape (const char *src, apr_size_t len, apr_pool_t *pool)
+ascii_fuzzy_escape (const char *src, apr_size_t len, apr_pool_t *pool)
 {
-  const char *src_orig = src, *src_end = src + len;
-  apr_size_t new_len = 0;
-  char *new;
-  const char *new_orig;
+  static unsigned char asciinonul[256];
+  svn_stringbuf_t *result = NULL;
 
-  /* First count how big a dest string we'll need. */
-  while (src < src_end)
-    {
-      if (! svn_ctype_isascii (*src) || *src == '\0')
-        new_len += 5;  /* 5 slots, for "?\XXX" */
-      else
-        new_len += 1;  /* one slot for the 7-bit char */
+  if (!asciinonul[0]) {
+    asciinonul[0] = 255;                   /* NUL's not allowed */
+    memset(asciinonul + 1, 1, 127);        /* Other regular ASCII OK */
+    memset(asciinonul + 128, 255, 128);    /* High half not allowed */
+  }
 
-      src++;
-    }
-
-  /* Allocate that amount. */
-  new = apr_palloc (pool, new_len + 1);
-
-  new_orig = new;
-
-  /* And fill it up. */
-  while (src_orig < src_end)
-    {
-      if (! svn_ctype_isascii (*src_orig) || src_orig == '\0')
-        {
-          /* This is the same format as svn_xml_fuzzy_escape uses, but that
-             function escapes different characters.  Please keep in sync!
-             ### If we add another fuzzy escape somewhere, we should abstract
-             ### this out to a common function. */
-          sprintf (new, "?\\%03u", (unsigned char) *src_orig);
-          new += 5;
-        }
-      else
-        {
-          *new = *src_orig;
-          new += 1;
-        }
-
-      src_orig++;
-    }
-
-  *new = '\0';
-
-  return new_orig;
+  svn_subr__escape_string(&result, src, len, asciinonul, NULL, NULL, pool);
+  return result->data;
 }
 
 /* Convert SRC_LENGTH bytes of SRC_DATA in NODE->handle, store the result
@@ -448,7 +415,7 @@
         errstr = apr_psprintf
           (pool, _("Can't convert string from '%s' to '%s':"),
            node->frompage, node->topage);
-      err = svn_error_create (apr_err, NULL, fuzzy_escape (src_data,
+      err = svn_error_create (apr_err, NULL, ascii_fuzzy_escape (src_data,
                                                            src_length, pool));
       return svn_error_create (apr_err, err, errstr);
     }
@@ -564,7 +531,28 @@
   return SVN_NO_ERROR;
 }
 
+static unsigned char
+utf8_escape_mapper (unsigned char **targ, const unsigned char *start,
+		    apr_size_t len, const svn_stringbuf_t *dest,
+		    void *baton, apr_pool_t *pool)
+{
+  const unsigned char *end = start + len;
+  return SVN_ESCAPE_UTF8_MAPPING(*targ, end);
+}
 
+svn_stringbuf_t *
+svn_utf__stringbuf_escape_utf8_fuzzy (svn_stringbuf_t **dest,
+				      const unsigned char *src,
+				      apr_size_t len,
+				      apr_pool_t *pool)
+{
+  static unsigned char utf8screen[256] = SVN_ESCAPE_UTF8_LEGAL_ARRAY;
+
+  return svn_subr__escape_string(dest, src, len,
+				 utf8screen, utf8_escape_mapper, NULL,
+				 pool);
+}
+
 svn_error_t *
 svn_utf_stringbuf_to_utf8 (svn_stringbuf_t **dest,
                            const svn_stringbuf_t *src,
@@ -787,7 +775,7 @@
   const char *escaped, *converted;
   svn_error_t *err;
 
-  escaped = fuzzy_escape (src, strlen (src), pool);
+  escaped = ascii_fuzzy_escape (src, strlen (src), pool);
 
   /* Okay, now we have a *new* UTF-8 string, one that's guaranteed to
      contain only 7-bit bytes :-).  Recode to native... */
### With code comes testing.
### Note: Contains 8-bit chars, and also uses convention that cc will treat
### "foo" "bar" as "foobar".  Both can be avoided if useful for finicky compilers.
--- subversion/tests/libsvn_subr/utf-test.c	(revision 14986)+++ subversion/tests/libsvn_subr/utf-test.c	(working copy)@@ -17,6 +17,7 @@  */  #include "../svn_test.h"+#include "../../include/svn_utf.h" #include "../../libsvn_subr/utf_impl.h"  /* Random number seed.  Yes, it's global, just pretend you can't see it. */@@ -222,6 +223,84 @@   return SVN_NO_ERROR; } +static svn_error_t *+utf_escape (const char **msg,+	    svn_boolean_t msg_only,+	    svn_test_opts_t *opts,+	    apr_pool_t *pool)+{+  char in[] = { 'A', 'S', 'C', 'I', 'I',     /* All printable */+		'R', 'E', 'T', '\n', 'N',    /* Newline */+		'B', 'E', 'L', 0x07, '!',    /* Control char */+		0xd2, 0xa6, 'O', 'K', '2',   /* 2-byte char, valid */+		0xc0, 0xc3, 'N', 'O', '2',   /* 2-byte char, invalid 1st */+		0x82, 0xc3, 'N', 'O', '2',   /* 2-byte char, invalid 2nd */+		0xe4, 0x87, 0xa0, 'O', 'K',  /* 3-byte char, valid */+		0xe2, 0xff, 0xba, 'N', 'O',  /*3-byte char, invalid 2nd */+		0xe0, 0x87, 0xa0, 'N', 'O',  /*3-byte char, invalid 2nd */+		0xed, 
0xa5, 0xa0, 'N', 'O',  /*3-byte char, invalid 2nd */+		0xe4, 0x87, 0xc0, 'N', 'O',  /* 3-byte char, invalid 3rd */+		0xf2, 0x87, 0xa0, 0xb5, 'Y', /* 4-byte char, valid */+		0xf2, 0xd2, 0xa0, 0xb5, 'Y', /* 4-byte char, invalid 2nd */+		0xf0, 0x87, 0xa0, 0xb5, 'N', /* 4-byte char, invalid 2nd */+		0xf4, 0x97, 0xa0, 0xb5, 'N', /* 4-byte char, invalid 2nd */+		0xf2, 0x87, 0xc3, 0xb5, 'N', /* 4-byte char, invalid 3rd */+		0xf2, 0x87, 0xa0, 0xd5, 'N', /* 4-byte char, invalid 4th */+                0x00 };+  const unsigned char *legalresult =+    "ASCIIRET\nNBEL!$-1(c)ÊOK2?\\192?\\195NO2?\\130?\\195NO2"-A+    "3$-3ıΩ0‰á†1OK?\\226?\\255?\\186NO?\\224?\\135?\\160NO?\\237?\\165?\\160NO"-A+    "?\\228?\\135?\\192NO3$-3ıΩ0Úᆵ1Y?\\242$-1(c)‡?\\181Y?\\240?\\135?\\160"-A+    "?\\181N?\\244?\\151?\\160?\\181N?\\242?\\135ıN?\\242?\\135?\\160"+    "?\\213N";+  const unsigned char *asciiresult =+    "ASCIIRET\nNBEL\x07!?\\210?\\166OK2?\\192?\\195NO2?\\130?\\195NO2"+    "?\\228?\\135?\\160OK?\\2
26?\\255?\\186NO?\\224?\\135?\\160NO"+    "?\\237?\\165?\\160NO?\\228?\\135?\\192NO?\\242?\\135?\\160?\\181Y"+    "?\\242?\\210?\\160?\\181Y?\\240?\\135?\\160?\\181N"+    "?\\244?\\151?\\160?\\181N?\\242?\\135?\\195?\\181N"+    "?\\242?\\135?\\160?\\213N";+  const unsigned char *asciified;+  apr_size_t legalresult_len = 213;  /* == strlen(legalresult) iff no NULs */+  int i = 0;+  svn_stringbuf_t *escaped = NULL;++  *msg = "test utf string escaping";++  if (msg_only)+    return SVN_NO_ERROR;++  if (svn_utf__stringbuf_escape_utf8_fuzzy+      (&escaped, in, sizeof in - 1, pool) != escaped)+    return svn_error_createf+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);+  i++;+  if (escaped->len != legalresult_len)+    return svn_error_createf+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);+  i++;+  if (memcmp(escaped->data, legalresult, legalresult_len))+    return svn_error_createf+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);+  i++;+  if (memcmp(es
caped->data, legalresult, legalresult_len))+    return svn_error_createf+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);+  i++;++  asciified = svn_utf_cstring_from_utf8_fuzzy(in, pool);+  if (strlen(asciified) != strlen(asciiresult))+    return svn_error_createf+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);+  i++;+  if (strcmp(asciified, asciiresult))+    return svn_error_createf+      (SVN_ERR_TEST_FAILED, NULL, "UTF-8 escape test %d failed", i);+  i++;++  return SVN_NO_ERROR;+}+  /* The test table.  */ @@ -230,5 +309,6 @@     SVN_TEST_NULL,     SVN_TEST_PASS (utf_validate),     SVN_TEST_PASS (utf_validate2),+    SVN_TEST_PASS (utf_escape),     SVN_TEST_NULL   };
### The original point of this thread.
### This patch will apply with an offset, since I've cut out sections which
### reimplement XML escaping in terms of the svn_subr__escape_string.
--- subversion/libsvn_subr/xml.c	(revision 14986)
+++ subversion/libsvn_subr/xml.c	(working copy)
@@ -395,11 +413,22 @@
   /* If expat choked internally, return its error. */
   if (! success)
     {
+      svn_stringbuf_t *sanitized = NULL;
+      unsigned char *end;
+      
+      svn_utf__stringbuf_escape_utf8_fuzzy(&sanitized, buf,
+					   (len > 240 ? 240 : len),
+					   svn_parser->pool);
+      end = sanitized->data +
+	    (sanitized->len > 240 ? 240 : sanitized->len);
+      while (*end > 0x80 && *end < 0xc0 &&
+	     (char *) end > sanitized->data) end--;
       err = svn_error_createf
         (SVN_ERR_XML_MALFORMED, NULL, 
-         _("Malformed XML: %s at line %d"),
+         _("Malformed XML: %s at line %d; XML starts:\n%.*s"),
          XML_ErrorString (XML_GetErrorCode (svn_parser->parser)),
-         XML_GetCurrentLineNumber (svn_parser->parser));
+         XML_GetCurrentLineNumber (svn_parser->parser),
+	 (char *) end - sanitized->data + 1, sanitized->data);
       
       /* Kill all parsers and return the expat error */
       svn_xml_free_parser (svn_parser);
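The truncation step in the hunk above (cut the sanitized text at roughly 240 bytes without splitting a multi-byte character) can be shown in isolation. The following is a hedged sketch, not the code from the patch: `utf8_truncate_len` is a hypothetical helper, and it assumes the input is already valid UTF-8, which should hold for the output of the escaping function.

```c
#include <stddef.h>

/* Return the largest N <= MAX such that the first N bytes of S end on a
   UTF-8 character boundary.  Assumes S[0..LEN) is valid UTF-8. */
size_t
utf8_truncate_len(const unsigned char *s, size_t len, size_t max)
{
  size_t n = len < max ? len : max;

  if (n == len)
    return n;                   /* nothing is cut off */

  /* S[N] is the first byte that would be dropped.  While it is a
     continuation byte (10xxxxxx), the cut falls inside a sequence,
     so move the cut back to the start of that sequence. */
  while (n > 0 && (s[n] & 0xc0) == 0x80)
    n--;
  return n;
}
```

The result can be passed as the precision of a `%.*s` conversion, e.g. `printf("%.*s", (int) n, (const char *) s)`.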
### Finally, be kind to the translators.--- subversion/po/pt_BR.po	(revision 14986)+++ subversion/po/pt_BR.po	(working copy)@@ -6006,8 +6006,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "XML mal formado: %s na linha %d"+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "XML mal formado: %s na linha %d; XML comeÁa:\n%.240s"  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/es.po	(revision 14986)+++ subversion/po/es.po	(working copy)@@ -6102,8 +6102,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "XML malformado: %s en la lÌnea %d"+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "XML malformado: %s en la lÌnea %d; XML comienza:\n%.240s"  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/de.po	(revision 14986)+++ subversion/po/de.po	(working copy)@@ -6090,8 +6090,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "Fehlerhaftes XML: %s in Zei
le %d"+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "Fehlerhaftes XML: %s in Zeile %d; XML beginnt:\n%.240s"  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/sv.po	(revision 14986)+++ subversion/po/sv.po	(working copy)@@ -6005,8 +6005,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "Felaktig XML: %s p rad %d"+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "Felaktig XML: %s p rad %d; XML starta:\n%.240s"  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/ko.po	(revision 14986)+++ subversion/po/ko.po	(working copy)@@ -5906,8 +5906,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "3$-3ıΩ0Ïûò13ıΩ0Ιª13ıΩ0Îêú1 XML: %s (3ıΩ0ϧÑ13ıΩ0Î≤à13ıΩ0Ìò∏1 %d)"-A+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "3$-3ıΩ0Ïûò13ıΩ0Ιª13ıΩ0Îêú1 XML: %s (3ıΩ0ϧÑ13ıΩ0Î≤à13ıΩ0Ìò
1 %d); XML:\n%.240s"-A  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/ja.po	(revision 14986)+++ subversion/po/ja.po	(working copy)@@ -6463,8 +6463,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "3$-3ıΩ0Áï∞13ıΩ0Â∏∏1$-2æ  XML æ«æπ: %s (3$-3ıΩ0˰å1 %d)"-A+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "3$-3ıΩ0Áï∞13ıΩ0Â∏∏1$-2æ  XML æ«æπ: %s(3$-3ıΩ0˰å1 %d); XML:\n%.240s"-A  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/pl.po	(revision 14986)+++ subversion/po/pl.po	(working copy)@@ -6103,8 +6103,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "Uszkodzony XML: %s w linii %d"+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "Uszkodzony XML: %s w linii %d; XML wersja:\n%.240s"  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/zh_TW.po	(revision 14986)+++ subversion/po/zh_TW.po	(working copy)@
@ -5896,8 +5896,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "3$-3ıΩ0Êúâ13ıΩ0Áº∫13ıΩ0Èô∑1 XML: %s3ıΩ0Êñº13ıΩ0Á¨¨1 %d 3ıΩ0Âàó1"-A+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "3$-3ıΩ0Êúâ13ıΩ0Áº∫13ıΩ0Èô∑1 XML: %s3ıΩ0Êñº13ıΩ0Á¨¨1 %d 3ıΩ0Âàó1; XML:\n%.240s"-A  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/nb.po	(revision 14986)+++ subversion/po/nb.po	(working copy)@@ -5995,8 +5995,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msgstr "Misdannet XML: %s i linje %d"+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "Misdannet XML: %s i linje %d; XML starter:\n%.240s"  #: libsvn_wc/adm_crawler.c:380 #, c-format--- subversion/po/zh_CN.po	(revision 14986)+++ subversion/po/zh_CN.po	(working copy)@@ -5955,8 +5955,8 @@  #: libsvn_subr/xml.c:400 #, c-format-msgid "Malformed XML: %s at line %d"-msg
str "3$-3ıΩ0Áï∏13ıΩ0ÂΩ¢13ıΩ0ÁöÑ1XMLÚ˙%s3ıΩ0Âú(r)13ıΩ0Á¨¨1 %d 3ıΩ0˰å1"-A+msgid "Malformed XML: %s at line %d; XML starts:\n%.240s"+msgstr "3$-3ıΩ0Áï∏13ıΩ0ÂΩ¢13ıΩ0ÁöÑ1XMLÚ˙%s3ıΩ0Âú(r)13ıΩ0Á¨¨1 %d 3ıΩ0˰å1; XML:\n%.240s"-A  #: libsvn_wc/adm_crawler.c:380 #, c-format
### End of patch ###
--
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu
Received on Mon Jun  6 18:23:43 2005