Re: [PATCH] Include offending XML in "Malformed XML" error message

From: Charles Bailey <bailey.charles_at_gmail.com>
Date: 2005-03-01 00:14:06 CET

On Mon, 28 Feb 2005 22:39:13 +0100 (CET), Peter N. Lundblad
<peter@famlundblad.se> wrote:
> On Mon, 28 Feb 2005, Charles Bailey wrote:
> > As a first pass, the '%s' token appears in 410 error messages in HEAD.
> > On rapid inspection, about half appear to take internal strings,
> > which I would think are more likely to be valid UTF-8 (though the
> > error motivating my original patch was an internal string, so that
> > should be taken with the proverbial grain of salt). The others are
> > most often user input, so may or may not be valid UTF-8 depending on
> > the user's locale settings; I haven't traced code to see how often
> > they're already escaped. How much of this needs coverage, though,
> > looks to me like a question for the longer term and for more
> > experienced svn hands than I.
> >
> You misunderstood me. I didn't mean we need to escape in general (we
> already do in the cmdline output routines to be safe). User input is
> converted from the native encoding to UTF8 rather early, so normally we
> rely on strings being valid UTF8. This is a special case, since, if I
> understand correctly, it is raw XML from the parser. We rely on the parser
> doing the recoding to UTF8 for us, but since this is an error situation,
> the data might not be valid UTF8. That's why we need this ugly escaping in
> this case (and when reporting recoding errors in utf.c).

Fair enough. This is where my brief experience with svn limits me.
Depending on how "UTF-8-safe" svn needs to be, my point may still
apply to any data read from a file, however. I think that would
include not only XML from admin files, but property names and values,
and any fragments from base or working revisions. This protects
against direct edits to the files, as well as errors elsewhere in svn
(as was the case with the offending XML here).

Mind you, I'm not advocating this -- I think it's a lot of work to
guarantee that a non-UTF-8 character is never presented to an external
library or to the user. I'll work on the XML parse error as a special
case, and leave the broader policy decisions for a later time.
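
To make the special case concrete, here is a rough sketch of the kind of
escaping discussed above. It is only an illustration: escape_untrusted() is
a hypothetical helper, not an existing svn function, and the "\xHH" escape
format is an assumption made for the example. It is deliberately
conservative and escapes every byte outside printable ASCII, so valid
multi-byte UTF-8 is escaped along with invalid data.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Subversion): copy LEN bytes of SRC
   into DST, replacing every byte outside printable ASCII with a "\xHH"
   escape so the result is always plain ASCII, hence valid UTF-8.
   DST must have room for at least 4 * LEN + 1 bytes.  Returns DST.
   Note that a literal backslash in SRC is copied unchanged, so the
   output is meant for display only and is not reversible. */
static char *
escape_untrusted(char *dst, const char *src, size_t len)
{
  static const char hex[] = "0123456789ABCDEF";
  char *out = dst;
  size_t i;

  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) src[i];

      if (c >= 0x20 && c < 0x7F)
        {
          /* Printable ASCII is passed through untouched. */
          *out++ = (char) c;
        }
      else
        {
          /* Control characters and bytes >= 0x7F become "\xHH". */
          *out++ = '\\';
          *out++ = 'x';
          *out++ = hex[c >> 4];
          *out++ = hex[c & 0x0F];
        }
    }

  *out = '\0';
  return dst;
}
```

A caller would run the raw parser buffer through this before handing it to
the '%s' in the "Malformed XML" message, rather than formatting the raw
bytes directly.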

Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu
Received on Tue Mar 1 00:15:23 2005

This is an archived mail posted to the Subversion Dev mailing list.