[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Re: [PATCH] Include offending XML in "Malformed XML" error message

From: Charles Bailey <bailey.charles_at_gmail.com>
Date: 2005-06-22 17:21:11 CEST

On 6/21/05, Philip Martin <philip@codematters.co.uk> wrote:
> Charles Bailey <bailey.charles@gmail.com> writes:
> > On 6/20/05, Philip Martin <philip@codematters.co.uk> wrote:
> >> Charles Bailey <bailey.charles@gmail.com> writes:
> >>
> >> > --- /dev/null Mon Jun 6 11:06:27 2005
> >> > +++ subversion/libsvn_subr/escape.c Fri Jun 3 19:16:09 2005
> >>
> >> New files need a copyright notice.
> >
> > Right. I'd noticed that, but since I don't know Collabnet's policy
> > here I didn't want
> > to presume. Is it acceptable to clone the copyright from an existing
> > file? Does
> > that suffice to Collabnet as assignment of copyright?
> I think so.

OK, will do.

> > Do I have a pledge from
> > Collabnet that any code whose copyright is assigned in this way will
> > remain freely
> > available in perpetuity? (By "freely available", I mean terms
> > substantially identical
> > to Subversion's current licensing; I'm not trying to start a skirmish
> > over software
> > licensing.)
> I can't speak for CollabNet, but I'll point out that the Subversion
> licence is not a copyleft licence and so anyone, including CollabNet,
> can make a proprietary version.

I've no problems with that. I would have problems with CollabNet
asserting next Tuesday that Subversion was (t)henceforth a proprietary
product, and any further use would require payment of a license fee.
I assume that's not the case (and it would arguably be hard to do in
any event), but thought it prudent to ask, since I'm new to this

> >> Those big macros also make me uneasy.
> >
> > Is the concern that it'll choke come compilers, or that they're hard
> > to maintain?
> I don't think compilers will have a problem, I think it makes it
> harder for humans.

OK. Paradoxically, I'm trying to make it easier for humans, but
letting them just use a "function call" to perform a common task,
without incurring the runtime overhead of the function call.

> > This may just be a stylistic issue: I tend to favor macros for
> > repeated tasks, on the theory that as long as one debugs it carefully,
> > the benefit of inlining is worth the effort of coding.
> The macros are only used once in your patch...

Yep. I noted in the comments that I'd included them as macros because
they seemed to be fairly common operations that might come up in other
contexts. If that's not the case, I can just inline them and other
folks can copy-paste or reimplement them as needed.

> >> > + if (!asciinonul[0]) {
> >> > + asciinonul[0] = 255; /* NUL's not allowed */
> >>
> >> That doesn't look threadsafe.
> >
> > It is safe, though it doesn't prevent duplication.
> No, it's not safe. Consider:
> - asciinonul is static initialised to zero
> - thread 1 tests asciinonul[0] and gets zero
> - thread 1 sets asciinonul[0] to 255, the rest of asciinonul is still zero
> - thread 2 tests asciinonul[0] and gets 255
> - thread 2 uses the rest of asciinonul although it is still zero, oops!

Good point; I should've spotted that. I'd been sitting on the fence
between this approach (runtime initialization, slower but taking less
space in the source) and static initialization, but looks like the
latter is definitely the way to go here.

> >> Overall this patch makes me uneasy, I'd prefer a patch that builds on
> >> our existing ctype and/or utf-8 code. However as I haven't tried to
> >> implement such a patch I don't know whether our existing code is
> >> a suitable starting point.
> >
> > That's certainly possible. I'd actually started out that way, and
> > switched to the above when it seemed I was about to add ya parallel
> > routine to escape a string in a slightly different way.
> >
> > Thanks for the review. If you or one of the other core developers
> > could answer some of the "philosophic" questions above, I can revise
> > the patch appropriately.
> Well, that's a bit tricky. Your patch looked overly complicated to
> me, but then I haven't tried to write an alternative so I don't know
> whether there is a better way.

My sense is that there is a simpler approach to the specific task of
making a string "UTF8-safe", but when I also consider related tasks
like escaping XML (single bytes escaped to entities like & => &amp;)
or potentially converting multibyte codes to alternative multibyte
codes, I can't whittle it down much further.

I think this approach is a net win because of reduction in parallel
code and simplification of maintenance, but that's an opinion rather
than verifiable fact. It may produce some small savings in space,
though that's arguable given the use of 256-byte screening arrays.
It's amost certainly not faster than a set of optimized custom
routines, though it's probably not much slower, either; the specifics
will likely depend on compiler and environment. Whether, in the end,
the balance favors this "common pathway" approach or custom functions
for each task is the crucial question here, I guess.

On a practical level, how should I proceed? Is it worth my making the
minor fixes you suggest and posting another patch for further
discussion, or would it be better to let it drop and see whether any
other opinions are forthcoming?


Charles Bailey
Lists: bailey _dot_ charles _at_ gmail _dot_ com
Other: bailey _at_ newman _dot_ upenn _dot_ edu
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jun 22 17:22:25 2005

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.