[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn log causes XML parse error (internal error)

From: Ben Reser <ben_at_reser.org>
Date: 2004-07-21 19:52:37 CEST

On Wed, Jul 21, 2004 at 09:45:48AM -0500, kfogel@collab.net wrote:
> Now we're talking about "/" being disallowed in property names. Why
> is it disallowed? I'm not saying there should be no limitations on
> property names -- newlines and other control chars seem like a bad
> idea, for example. But "/"? What's wrong with "/"?

Remember that a / is a reserved character in XML element names and that
properties over DAV are named in XML elements. This is a horrible
failing of the DAV spec, but it's what it is.

At any rate here's where we stand with property names:

Right now libsvn_client restricts the valid property names in the
is_valid_prop_name() static function. I think it's clear that this
function is implementing the Name syntax as specified in the XML
specification:
http://www.w3.org/TR/2004/REC-xml-20040204/#sec-common-syn

Issue #1807 exists because : while not technically reserved is somewhat
reserved for use by XML namespaces. Unfortunately the parser on the
Apache end was being too strict and rejecting XML that contained the
character outside the context of the use of a namespace. It's arguable
in my reading of the specification if this is legal or not. While, the
specification says that the parser must allow colons as name characters
it doesn't really say that to do with documents using namespaces and
that have element names with colons in them that aren't part of the
namespace use.

Of course the XML Namespace Recommendation redefines the space of the
valid element names to disallow the colon, so the Apache XML parsers
error is probably correct:
http://www.w3.org/TR/REC-xml-names/#ns-decl

Turning more specifically to DAV. RFC 2518 defines property names based
on XML namespace valid names. Which means if we're following the DAV
specification we're technically in violation becuase we're permitting
property names with colons.
http://www.webdav.org/specs/rfc2518.htm#rfc.section.4.5

Unfortunately, we probably encouraged the use of colon named properties
with our own svn: prefixed properties. It probably would have been much
better to use svn.log instead. But that's water under the bridge.

> If the issue is that a particular transport layer (DAV) makes it
> difficult to allow certain otherwise-desirable chars, then we have a
> bug in our transport layer, and should treat it as such. Now, maybe
> we need to disallow those chars in propset for a while, until we fix
> the bug. But we shouldn't be treating this as a permanent and
> acceptable state of affairs. The property namespace is Subversion's;
> the fact that we use XML or XML-based protocols is an implementation
> detail that shouldn't be leaking out into userland.

These characters were already disallowed by propset. The only reason
this issue came up is because SVN::Mirror doesn't use the client libs or
the ra layer libs. It uses the repos and fs libraries directly. As a
result it could place property names that the client libs would
disallow.

If we're going to disallow clients from setting property names then we
should disallow it all the way through the API. My intention is to do
that and to make is_valid_prop_name() a public function (though renamed)
so that anyone can tell if they have a valid property name.

As far as this being DAV specific. I presume that DAV is where we got
the whole concept of properties and our properties are designed around
that specification. I don't see what's so bad about having some
limitations on property names. So long as those limitations are spelled
out. Which I think we've done a poor job of doing.

> Is there some more subtle issue here? Is it that we need to limit
> these properties because DAV has its own idea of properties and we
> want to remain compatible with DAV clients? If so, perhaps we should
> be thinking of a compatibility strategy other than "Limit everyone to
> whatever DAV can handle."

Well this was already done a long time ago. I could easily open up the
available number of characters in property names by using a namespace
hack. However, this is competely incompatable with our existing
clients. Plus it's a total pain in the butt to try and negotiate this
with clients. So if we're going to do this it's a 2.0 thing in my mind.

But for now, we need to follow what we've got. The acceptance by the
lower level layers of property names not accepted by the client layers
is a bug IMHO.

But if you really want to talk about the more subtle issue... The more
subtle issue is a lack of thought into what constitutes valid syntaxes
for a variety of things. As a result we're not enforcing these syntaxes
and as a result people get really strange behavior when they get outside
of what people normally use.

Pathnames and property names are good examples that we've already run
into. There are a number of reasons why we should define these things
clearly:

* Helps people adopting the software know our limitations.
* Lets anyone planning on implementing another RA layer or FS layer or
anything else know what they have to support and don't have to support.
* Helps us write better tests to look for bugs.
* Allows people writing code to better take into account the
consequences of their code. Some implementations of network and fs
layers will by necessity apply special meaning to characters. They
better make sure that they escape or handle such characters that are
allowed to avoid potential security issues for their special meanings.

We've also got issues like this in the svn protocol. Just how long is a
valid string? There's absolutely no definition. So theoretically we
have to allow any amount. Which ended up causing security implicatons
for us.

The length of what a property value is allowed to be played into this.
It's not specified either. So once again we had to just stick our heads
in the sand and support an unlimited amount.

I don't think you can ever support every imaginable naming or length
that a user might want. But we have to pick something and then follow
it. If we want to be fairly unrestrictive then we need to say that so
implementors know to expect unusual characters.

-- 
Ben Reser <ben@reser.org>
http://ben.reser.org
"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Jul 21 19:53:24 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.