[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Include offending XML in "Malformed XML" error message

From: Charles Bailey <bailey.charles_at_gmail.com>
Date: 2005-02-28 03:58:48 CET

On Sun, 27 Feb 2005 22:25:11 +0100 (CET), Peter N. Lundblad
<peter@famlundblad.se> wrote:
> On Sat, 26 Feb 2005, Charles Bailey wrote:
>
> > On Thu, 24 Feb 2005 14:17:20 +0100 (CET), Peter N. Lundblad
> > <peter@famlundblad.se> wrote:
> > > On Wed, 23 Feb 2005, Charles Bailey wrote:
> > >
> > A revised patch is appended below my sig, with appropriate attention
> > to l10n. I've taken idiom from other messages where I can (z.B.
> > 'beginnen' v. 'anfingen'). More fluent readers may want to pitch in
> > on the middle European and Scandinavian languages, where my morphology
> > may be laughably in error, and I've punted almost entirely on the east
> > Asian languages, but I hope it'll serve as a decent start nonetheless.
> >
> You needn't bother with the .po files in a code patch. The translators
> will take care of that. (We generally don't expect patch submitters to
> provide translations of all 10 languages:-)

I'd assumed that to be the case generally, but if I can save them some
work, so much the better.

> > Index: subversion/libsvn_subr/xml.c
> > ===================================================================
> > --- subversion/libsvn_subr/xml.c (revision 13160)
> > +++ subversion/libsvn_subr/xml.c (working copy)
> > @@ -417,9 +417,9 @@
> > {
> > err = svn_error_createf
> > (SVN_ERR_XML_MALFORMED, NULL,
> > - _("Malformed XML: %s at line %d"),
> > + _("Malformed XML: %s at line %d; XML starts:\n%.240s"),
> > XML_ErrorString (XML_GetErrorCode (svn_parser->parser)),
> > - XML_GetCurrentLineNumber (svn_parser->parser));
> > + XML_GetCurrentLineNumber (svn_parser->parser), buf);
> >
> I discussed this with Erik Heulsmann on IRC and our conclusion is that we
> need to be mroe careful with encodings here. As I stated earlier, we have
> no guarantee that this string will be valid UTF8 (and we can't know from
> the error either, since the encoding error might be after the error being
> reported). We have code in utf_valid.c to decide if a string is valid
> UTF8. You need to use this. We have code in utf. (fuzzy_escape) to fuzzily
> escape non-ASCII characters in a string. Export this function as an
> internal "API" (svn_utf__fuzzy_escape) and use it if the part of the
> string is invalid UTF8. To make it even more complicated, you should
> ensure that you cut the string on an UTF8 boundary. There is code for this
> in svn_ctype.h. Make sure you break the string before a LEAD character or
> ASCII character. This will avoid unnecessary escaping.

This looks like a SMOP, but I'm not sure what the real benefit is.
Does Subversion make the guarantee that all of its output will be
UTF-8? Unless that's a major goal, I think there's something to be
said for inserting the offending XML "as is" into the error message.
If it is a general policy to convert to UTF-8, should I code this as a
separate function, rather than putting the logic into parse_xml?

> As so often, a patch turns out to be a little more work when anticipated
> at the first glance... But, please continue. It will help us much in the
> future.

Sure, if it's helpful. I just want to make sure I'm not
overengineering the solution to a small problem.

--
Regards,
Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Mon Feb 28 03:59:56 2005

This is an archived mail posted to the Subversion Dev mailing list.