[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r917523 - mod_dav_svn/mirror.c - Are the URIs URI-encoded?

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Wed, 17 Mar 2010 15:57:17 +0000

On Mon, 2010-03-01, kameshj_at_apache.org wrote:
> Author: kameshj
> Date: Mon Mar 1 13:48:01 2010
> New Revision: 917523
>
> URL: http://svn.apache.org/viewvc?rev=917523&view=rev
> Log:
> With the below apache configuration(See the <space> character "/svn 1/").
>
> <Location "/svn 1/">
[...]
> Write through proxy is *not* happening and commit happens *directly* inside the slave.

Hi Kamesh.

Today I have studied this change thoroughly, and it appears to me that
it changes the interpretation of the URL given in the <Location>
directive (and the SVNMasterURI directive) in a way that is probably
undesirable.

Previously, the URL was assumed to be URI-encoded, and thus this one
with a space in it did not work because it did not match the canonical
equivalent ("/svn%201/"). That caused the "bug" that you were trying to
fix, I believe.

After this patch, it appears to me that the URL given in the <Location>
directive (and the one in the SVNMasterURI directive) is assumed to be
NOT encoded, and mod_dav_svn will encode it internally.

The new behaviour will be wrong for any Location URL that is already
written with %XX encoding, because it will encode it again.

If we agree that the URL should be assumed already to be URI-encoded (at
least to the extent necessary to make "%" characters unambiguous), I
think there are two possible solutions:

  1) Tell the user to use proper (canonically URI-encoded) URIs.

  2) Assume the URI is already in (partly) URI-encoded form but
re-encode it into canonical URI-encoded form, encoding characters such
as SPACE but not re-encoding the "%" character, and also unencode any
characters that should not be encoded in canonical URI-encoded form.

(1) is rather unfriendly.

More follows ...

> * subversion/mod_dav_svn/mirror.c
> (proxy_request_fixup): URI encode the to be proxied file name.
> (dav_svn__proxy_request_fixup): r->unparsed_uri is in url encoded form while
> root_dir is not in encoded form. So use r->uri to compare with root_dir.
> (dav_svn__location_in_filter): URL Encode the 'find & replace' urls as
> the request body has it in url encoded format.
> (dav_svn__location_header_filter): Encode the master_uri as the response from
> master has the Location header url encoded already. Set the outgoing Location
> header url encoded.
> (dav_svn__location_body_filter): URL Encode the 'find & replace' urls as
> the response body has it in url encoded format.
>
> Modified:
> subversion/trunk/subversion/mod_dav_svn/mirror.c
>
> Modified: subversion/trunk/subversion/mod_dav_svn/mirror.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/mod_dav_svn/mirror.c?rev=917523&r1=917522&r2=917523&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/mod_dav_svn/mirror.c (original)
> +++ subversion/trunk/subversion/mod_dav_svn/mirror.c Mon Mar 1 13:48:01 2010
> @@ -45,8 +45,10 @@
>
> r->proxyreq = PROXYREQ_REVERSE;
> r->uri = r->unparsed_uri;
> - r->filename = apr_pstrcat(r->pool, "proxy:", master_uri,
> - uri_segment, NULL);
> + r->filename = (char *) svn_path_uri_encode(apr_pstrcat(r->pool, "proxy:",
> + master_uri,
> + uri_segment,
> + NULL), r->pool);

That looks odd. It may be functionally correct, but normally I would
assume that any parameters with names like "master_uri" and
"uri_segment" would be already URI-encoded. (If so, then encoding them
again would be wrong.) Don't we want to keep URI data in URI-encoded
form most of the time?

In this case, it appears that you have modified this function and the
caller so that the arguments passed are NOT URI-encoded. Looking at
where the "uri_segment" comes from ...

> r->handler = "proxy-server";
> ap_add_output_filter("LocationRewrite", NULL, r, r->connection);
> ap_add_output_filter("ReposRewrite", NULL, r, r->connection);
> @@ -78,7 +80,7 @@
> transaction tree resouces. */
> if (r->method_number == M_PROPFIND ||
> r->method_number == M_GET) {
> - if ((seg = ap_strstr(r->unparsed_uri, root_dir))) {
> + if ((seg = ap_strstr(r->uri, root_dir))) {

"r->uri" is documented as "The path portion of the URI." It appears
from this commit that you have discovererd that "r->uri" is not
URI-encoded, whereas r->unparsed_uri (which is the full URL) is
URI-encoded. OK so far.

I went looking to see where the "master_uri" comes from, and it comes
from dav_svn__get_master_uri() which gets its value directly from the
Apache configuration directive "SVNMasterURI". It is not clear to me
whether we should expect that to be URI-encoded, but I would have
thought so.

To clarify that, we need first to decide what it is meant to be, and
then to document dav_svn__get_master_uri() and the other functions in
that group. I attach a starting-point patch for the latter.

> if (ap_strstr_c(seg, apr_pstrcat(r->pool, special_uri,
> "/wrk/", NULL))
> || ap_strstr_c(seg, apr_pstrcat(r->pool, special_uri,
> @@ -95,7 +97,7 @@
> /* If this is a write request aimed at a public URI (such as
> MERGE, LOCK, UNLOCK, etc.) or any as-yet-unhandled request
> using a "special URI", we have to doctor it a bit for proxying. */
> - seg = ap_strstr(r->unparsed_uri, root_dir);
> + seg = ap_strstr(r->uri, root_dir);
> if (seg && (r->method_number == M_MERGE ||
> r->method_number == M_LOCK ||
> r->method_number == M_UNLOCK ||
> @@ -158,6 +160,11 @@
> ### PUT requests and properties in PROPPATCH requests.
> ### See issue #3445 for details. */
>
> + /* We are url encoding the current url and the master url
> + as incoming(from client) request body has it encoded already. */
> + canonicalized_uri = (char *) svn_path_uri_encode(canonicalized_uri,
> + r->pool);
> + root_dir = (char *) svn_path_uri_encode(root_dir, r->pool);

In a separate discussion I think we decided that we will need to define
"canonical" as including "in a canonical URI-encoded form". When we do
that, uri_canonicalize() will (I think) require that its input is
already URI-encoded, so we will need to call uri_encode() before
uri_canonicalize(). It would be good to make the calls in that order
now, but we will have to audit all calls when we make that change anyway
so it is not essential right now.

Are the (char *) type casts necessary? If not, please remove them.
(Six of them in total.)

Thanks.
- Julian

Received on 2010-03-17 16:57:51 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.