--On Jun 6, 2005 12:34 PM, Michael W Thelen <mike@pietdepsi.com> wrote:
>
> It looks like something bad happened to your message, as if something
> stripped out all the newlines. Would you mind resending the patch?
Sorry; I think I've been Gmailed again. Let me try from Mulberry. Here's
the leading text:
On Wed, 23 Feb 2005, Charles Bailey wrote:
>
> Attached is a patchlet that, when expat fails to parse a hunk of XML,
> appends at least part of the offending hunk to the error message. It
which led to an exchange regarding the need to make strings
UTF-8-safe. After too long a hiatus, I posted for comment:
On 4/21/05, Charles Bailey <bailey.charles@gmail.com> wrote:
>
> Well, after umpteen interrupts from the rest of life, I finally got a
> few hours to look at this again. In checking what was already
> available, I found a handful of "string escaping" functions in various
> places which perform similar tasks (at least one with the comment
> "this should share code with other_string_escaping_routine()"). Since
> I'd have to add yet another such function, I thought I'd try to
> abstract it a bit, with the hope that similar routines could use a
> common base. I've appended a short proposal at the bottom of this message,
> containing a common "engine" and an example implementation for
> creating a UTF-8-safe version of an arbitrary string.
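The proposal itself isn't reproduced here, so what follows is only a minimal sketch, in C, of the general shape such an engine could take: one loop that walks the input and asks a per-use "mapper" callback whether to copy or replace the bytes at each position. All names (escape_mapper_fn, escape_string, ascii_mapper) and the ?\NNN replacement format are hypothetical; this is not the signature of svn_subr__escape_string or of anything in the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mapper callback.  Given the LEN bytes remaining at SRC,
   either return 0 to copy the current byte unchanged, or write a
   NUL-terminated replacement into REPL (REPL_SIZE bytes) and return the
   number of source bytes it replaces (at least 1, at most LEN). */
typedef size_t (*escape_mapper_fn)(const unsigned char *src, size_t len,
                                   char *repl, size_t repl_size);

/* Hypothetical common engine: walks SRC once, lets MAPPER decide what
   to do at each position, and accumulates the result in a growing
   buffer.  Returns a malloc'd NUL-terminated string, or NULL if
   allocation fails. */
static char *escape_string(const char *src, size_t len,
                           escape_mapper_fn mapper)
{
  size_t cap = len + 1, used = 0, i = 0;
  char *out = malloc(cap);

  if (out == NULL)
    return NULL;

  while (i < len)
    {
      char repl[32];
      const char *piece;
      size_t piece_len;
      size_t consumed = mapper((const unsigned char *)src + i, len - i,
                               repl, sizeof(repl));

      if (consumed == 0)
        {
          /* Pass the current byte through verbatim. */
          piece = src + i;
          piece_len = 1;
          consumed = 1;
        }
      else
        {
          piece = repl;
          piece_len = strlen(repl);
        }

      /* Grow the output buffer (by doubling) if the piece won't fit. */
      if (used + piece_len + 1 > cap)
        {
          char *grown;
          while (used + piece_len + 1 > cap)
            cap *= 2;
          grown = realloc(out, cap);
          if (grown == NULL)
            {
              free(out);
              return NULL;
            }
          out = grown;
        }

      memcpy(out + used, piece, piece_len);
      used += piece_len;
      i += consumed;
    }

  out[used] = '\0';
  return out;
}

/* Example mapper: escape every non-ASCII byte as "?\NNN" (decimal). */
static size_t ascii_mapper(const unsigned char *src, size_t len,
                           char *repl, size_t repl_size)
{
  (void)len;
  if (*src < 0x80)
    return 0;
  snprintf(repl, repl_size, "?\\%03u", (unsigned)*src);
  return 1;
}
```

Each escaping routine then only has to supply its own mapper, which is the point of sharing the engine.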
Julian Foad was kind enough to point out a dumb thinko, but no other
comments were forthcoming, possibly because the core developers were
busy with pre-1.2 cleanup.
So, after another too-long hiatus, here's a patch which implements a
"common" string escaping function, uses it for UTF-8 escaping, and
uses that to sanitize the offending XML, which is then output in the
error message that Jack built^W^Wstarted this thread.
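As an illustration of the UTF-8 case, here is a rough, self-contained sketch that copies well-formed UTF-8 through unchanged and escapes every byte that does not begin a well-formed sequence as defined by RFC 3629. The function names and the ?\NNN escape format are assumptions made for the example; the patch's own svn_utf__stringbuf_escape_utf8_fuzzy returns a stringbuf and may differ in detail.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Length (1-4) of the well-formed UTF-8 sequence at S, given LEN
   available bytes, or 0 if S does not start one.  Follows RFC 3629:
   rejects overlong forms, UTF-16 surrogates and values above U+10FFFF. */
static size_t utf8_seq_len(const unsigned char *s, size_t len)
{
  size_t n, k;
  unsigned long cp;

  if (s[0] < 0x80)
    return 1;
  if (s[0] >= 0xC2 && s[0] <= 0xDF)
    { n = 2; cp = s[0] & 0x1F; }
  else if (s[0] >= 0xE0 && s[0] <= 0xEF)
    { n = 3; cp = s[0] & 0x0F; }
  else if (s[0] >= 0xF0 && s[0] <= 0xF4)
    { n = 4; cp = s[0] & 0x07; }
  else
    return 0;  /* stray continuation byte, 0xC0/0xC1, or 0xF5-0xFF */

  if (len < n)
    return 0;  /* truncated sequence */
  for (k = 1; k < n; k++)
    {
      if ((s[k] & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (s[k] & 0x3F);
    }
  if ((n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000))
    return 0;  /* overlong encoding */
  if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
    return 0;  /* surrogate or beyond the Unicode range */
  return n;
}

/* Copy the LEN bytes at SRC into a malloc'd string, replacing each byte
   that does not begin a well-formed UTF-8 sequence (and any NUL byte,
   so the result is a usable C string) with "?\NNN", NNN being the
   byte's decimal value.  Returns NULL if allocation fails. */
static char *escape_utf8_fuzzy(const char *src, size_t len)
{
  /* Worst case: every input byte expands to the 5 bytes of "?\NNN". */
  char *out = malloc(len * 5 + 1);
  size_t i = 0, o = 0;

  if (out == NULL)
    return NULL;

  while (i < len)
    {
      const unsigned char *p = (const unsigned char *)src + i;
      size_t n = utf8_seq_len(p, len - i);

      if (n > 0 && *p != '\0')
        {
          memcpy(out + o, p, n);  /* valid character: copy as-is */
          o += n;
          i += n;
        }
      else
        {
          o += (size_t)sprintf(out + o, "?\\%03u", (unsigned)*p);
          i += 1;  /* escape one byte, then resync at the next */
        }
    }
  out[o] = '\0';
  return out;
}
```

Escaping a single bad byte and resynchronizing at the next one keeps a lone corrupt byte from swallowing the valid text that follows it.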
I've interspersed my comments in the code, since there's imho zero
chance that this version of the patch will be
substantially/stylistically suitable for committing. They're far from
exhaustive, but this message is long enough already.
Conceptual "Log message":
[[[
Add a function that escapes illegal UTF-8 characters, along the way
refactoring the core of the string-escaping routines, and ensure that
the illegal-XML error message outputs legal UTF-8.
### Probably best applied as several patches, but collected here for review.
* subversion/libsvn_subr/escape.c:
New file.
(svn_subr__escape_string): Final-common-path function for escaping
strings.
* subversion/libsvn_subr/escape_impl.h:
New file, declaring svn_subr__escape_string and convenience macros.
### Logical candidate for consolidation with utf_impl.h, perhaps as
subr_impl.h
* subversion/libsvn_subr/utf.c:
(fuzzy_escape): Renamed to ascii_fuzzy_escape, and rewritten to use
svn_subr__escape_string.
(svn_utf__stringbuf_escape_utf8_fuzzy): New function which escapes
illegal UTF-8 in a string, returning the escaped string in a stringbuf.
(utf8_escape_mapper): Helper function for
svn_utf__stringbuf_escape_utf8_fuzzy.
* subversion/libsvn_subr/utf_impl.h:
Add prototype for svn_utf__stringbuf_escape_utf8_fuzzy.
(svn_utf__cstring_escape_utf8_fuzzy): Macro implementing a variant
of the above that returns a NUL-terminated string.
* subversion/libsvn_subr/xml.c:
(svn_xml_parse): If parse fails, print (sanitized) (part of) offending
XML with error message.
* subversion/tests/libsvn_subr/utf-test.c:
(utf_escape): New function testing UTF-8 string-escaping functions.
* subversion/po/de.po, subversion/po/es.po, subversion/po/ja.po,
subversion/po/ko.po, subversion/po/nb.po, subversion/po/pl.po,
subversion/po/pt_BR.po, subversion/po/sv.po,
subversion/po/zh_CN.po, subversion/po/zh_TW.po:
Courtesy update for translators, since I've changed a localized string.
]]]
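The svn_xml_parse entry only promises "(part of)" the offending XML. One way to cap the snippet without cutting a multi-byte character in half is to move the cut point back past any UTF-8 continuation bytes, as in this sketch. The helper names and the message format are hypothetical, and the hunk is assumed to have already been escaped to valid UTF-8; REASON might be, for instance, the string expat returns from XML_ErrorString(XML_GetErrorCode(parser)).

```c
#include <stdio.h>
#include <string.h>

/* Largest prefix length <= MAX of the LEN-byte string S that does not
   stop in the middle of a multi-byte UTF-8 character.  Assumes S is
   already valid UTF-8, e.g. because it was escaped first. */
static size_t utf8_prefix_len(const char *s, size_t len, size_t max)
{
  size_t n;

  if (len <= max)
    return len;
  n = max;
  /* While the first byte *after* the cut is a continuation byte
     (10xxxxxx), the cut splits a character: move it back. */
  while (n > 0 && ((unsigned char)s[n] & 0xC0) == 0x80)
    n--;
  return n;
}

/* Hypothetical formatter: writes "REASON: 'SNIPPET'" into BUF, where
   SNIPPET is at most MAX_SNIPPET bytes of HUNK, followed by "..." if
   the hunk was cut short.  BUF should be large enough for the whole
   message, since snprintf itself may truncate at any byte. */
static void format_parse_error(char *buf, size_t bufsize,
                               const char *reason, const char *hunk,
                               size_t hunk_len, size_t max_snippet)
{
  size_t n = utf8_prefix_len(hunk, hunk_len, max_snippet);

  snprintf(buf, bufsize, "%s: '%.*s%s'", reason, (int)n, hunk,
           n < hunk_len ? "..." : "");
}
```

Here a 4-byte cap on "<a>\xc3\xa9</a>" yields "<a>..." rather than leaving a dangling lead byte 0xC3 in the message.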
The patch, with interspersed comments, is appended as an attachment.
--
Regards,
Charles Bailey < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania
Received on Tue Jun 7 19:00:05 2005