On Mon, 28 Feb 2005, Charles Bailey wrote:
> On Mon, 28 Feb 2005 08:11:33 +0100 (CET), Peter N. Lundblad
> <peter@famlundblad.se> wrote:
> > > If it is a general policy to convert to UTF-8, should I code this as a
> > > separate function, rather than putting the logic into parse_xml?
> > >
> > You can put it in a separate function. Keep it internal to the file,
> > though, until we see another use case for it. We can export it if that
> > happens.
>
> As a first pass, the '%s' token appears in 410 error messages in HEAD.
> On rapid inspection, about half appear to take internal strings,
> which I would think are more likely to be valid UTF-8 (though the
> error motivating my original patch was an internal string, so that
> should be taken with the proverbial grain of salt). The others are
> most often user input, so may or may not be valid UTF-8 depending on
> the user's locale settings; I haven't traced code to see how often
> they're already escaped. How much of this needs coverage, though,
> looks to me like a question for the longer term and for more
> experienced svn hands than I.
>
You misunderstood me. I didn't mean we need to escape in general (we
already do that in the cmdline output routines to be safe). User input is
converted from the native encoding to UTF-8 rather early, so normally we
rely on strings being valid UTF-8. This is a special case, since, if I
understand correctly, it is raw XML from the parser. We rely on the parser
doing the recoding to UTF-8 for us, but since this is an error situation,
the data might not be valid UTF-8. That's why we need this ugly escaping in
this case (and when reporting recoding errors in utf.c).
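
To illustrate the kind of escaping meant here, a minimal sketch follows. It
is written in Python for brevity rather than in the C of the code base, and
both the function name escape_invalid_utf8 and the "?\ddd" output format are
assumptions made for this example, not the code from the patch under
discussion. Valid UTF-8 runs are passed through, and every byte that cannot
be decoded is replaced by an ASCII escape, so the result is always safe to
put in an error message.

```python
def escape_invalid_utf8(raw: bytes) -> str:
    """Return text that can always be encoded as valid UTF-8.

    Hypothetical helper: bytes that do not form valid UTF-8 are written
    as "?\\ddd" (ddd = decimal byte value); the format is an assumption.
    """
    pieces = []
    pos = 0
    while pos < len(raw):
        try:
            # The rest of the buffer is valid: keep it and stop.
            pieces.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into raw[pos:].
            good_end = pos + err.start
            bad_end = pos + err.end
            # Keep the valid prefix unchanged.
            pieces.append(raw[pos:good_end].decode("utf-8"))
            # Escape each offending byte as plain ASCII.
            for byte in raw[good_end:bad_end]:
                pieces.append("?\\%03d" % byte)
            pos = bad_end
    return "".join(pieces)


if __name__ == "__main__":
    # A Latin-1 encoded byte in otherwise ASCII data, as a broken
    # XML fragment might contain.
    print(escape_invalid_utf8(b"<name>caf\xe9</name>"))
```

Only the error path would need something like this; strings that were
already converted from the native encoding are assumed valid and are left
alone.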
Regards,
//Peter
Received on Mon Feb 28 22:38:15 2005