Re: Issue #1901 (double slashes screw things): Patch, and some strategy questions

From: Branko Čibej <brane_at_xbc.nu>
Date: 2004-07-04 03:25:31 CEST

Ben Reser wrote:

>On Sat, Jul 03, 2004 at 07:28:08AM -0400, Josh Pieper wrote:
>
>
>>Yes, this is a symptom of a larger problem, namely that
>>svn_path_canonicalize doesn't return a canonicalized path. Canonical
>>would mean that any two input paths that referenced the same file
>>should have the same output path (at least relative to the same root).
>>svn_path_canonicalize should really behave more like
>>svn_path_get_absolute, except without the requirement that the file or
>>path actually exist on disk. I'm working on it now.
>>
>>It seems that in addition to this problem, there may also be
>>sub-commands that don't canonicalize their paths/URLs before using
>>them. I'll see what can be done about that too afterwards.
>>
>>
>
>Before we can really fix this we need to answer the following questions:
>
>a) What constitutes a "canonical" path?
>
>b) At what level does the API require a "canonical" path?
>
>c) At what level is the API required to produce a "canonical" path?
>
>Without these answers, we can't really fix this.
>
>Here are my answers:
>
>a) Right now we're using the following definition:
> Is not ".", Does not end in "/".
>
> If we're going to use the same rules for canonicalizing URLs we need
> to keep a few things in mind. It is up to the server how to interpret
> the path portion of the URL. In our case we have two servers that
> we mainly have to worry about, and then file:///, which is interpreted
> by the OS.
>
>
I don't think that's the case. I believe hierarchical URL schemes (all
of ours, http[s]://, svn[+...]:// and file:// are such) pretty much
define what the path part should look like. And the path portion of the
file:// (not file:///!) URLs is _not_ interpreted by the OS. It's
converted _to_ an OS-specific path -- it's merely a coincidence that
everything after the second slash happens to look like a canonical path
on Unix (well... not really a coincidence, but let's not split hairs).
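
To make the file:// versus file:/// point concrete, here is a minimal
sketch using Python's standard urllib.parse. The helper name
split_file_url and the /tmp/repo path are made up for illustration; this
is not how Subversion parses URLs.

```python
from urllib.parse import urlsplit


def split_file_url(url: str) -> tuple[str, str]:
    """Return (host, path) for a file:// URL. Hypothetical helper."""
    parts = urlsplit(url)
    # Everything between "file://" and the next "/" is the host part;
    # the path starts at that "/".
    return parts.netloc, parts.path


# The third slash in "file:///" is simply the first character of the path,
# following an empty host.
empty_host = split_file_url("file:///tmp/repo")
named_host = split_file_url("file://localhost/tmp/repo")
print(empty_host)
print(named_host)
```

Both URLs carry the same path part; only the host differs. Turning that
path into an OS-specific one is a separate conversion step.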

> We know that generally all of them interpret the following things in
> special ways:
> //
> /./
> /../
>
> We also know that // may in some cases, though probably rarely, refer
> to a different path on one of our servers (Apache).
>
>
As far as I know, double slashes aren't allowed in the path part of URLs.

> However, trying to permit // looks like it's going to end up being a
> real hassle. Using a path with // would already be problematic, and
> we haven't seen any users complaining because their paths with // in
> them don't work. (By this I mean a path where
> http://www.example.com/whatever//foo differs from
> http://www.example.com/whatever/foo in what the server returns, not
> people using // by accident, which is what the issues we're
> discussing deal with.)
>
> As a result I'm inclined to think (despite my previous objections)
> that a canonical path does not contain the following things:
> Is not equal to "."
> Does not end in "/"
> Does not contain "//", "/./" or "/../".
>
>
I tend to agree, except that "canonical" and "absolute" aren't the same,
so a canonical path _can_ start with a series of "../" sequences.
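
A rough sketch of those rules, in Python for brevity. This is a toy and
not svn_path_canonicalize: it only handles the '/'-separated path part
(not a URL scheme or host), it ignores symlinks, and it assumes the
empty string stands in for the current directory.

```python
def canonicalize(path: str) -> str:
    """Toy canonicalizer for '/'-separated paths (illustration only)."""
    is_absolute = path.startswith("/")
    out = []
    for seg in path.split("/"):
        if seg in ("", "."):
            continue  # drops "//", "/./" and any trailing "/"
        if seg == "..":
            if out and out[-1] != "..":
                out.pop()          # "a/../b" collapses to "b"
            elif not is_absolute:
                out.append("..")   # leading "../" survives on relative paths
            # on an absolute path, ".." above the root is dropped
            continue
        out.append(seg)
    joined = "/".join(out)
    # Assumption: "" (not ".") represents the current directory.
    return "/" + joined if is_absolute else joined


print(canonicalize("a//b/./c/"))
print(canonicalize("../../a/../b"))
```

The second call keeps its leading "../../", which is the difference
between "canonical" and "absolute": nothing here consults the current
working directory or the disk.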

>b) Right now hardly any of our APIs document if they want a canonical
> URL or not. For the most part the svn_path_* commands require a
> canonical form, with the exception of svn_path_canonicalize and
> svn_path_internal_style. svn_client_import() also documents that
> paths need not be in canonical form. Other than that it's not
> documented.
>
>
I think it's pretty much obvious that all our APIs, except the two you
mention, require a canonical path (or URL) encoded in UTF-8. I'd
actually go as far as to require the coding to be in Normalized Form C,
for consistency -- not that we can guarantee that even in our own client
programs right now.
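
As an illustration of why the normalization form matters, here is a
small sketch with Python's standard unicodedata module. The helper
to_nfc_utf8 and the sample name are invented for the example.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent


def to_nfc_utf8(name: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8. Hypothetical helper."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


# The two strings render identically but compare unequal as-is...
print(composed == decomposed)
# ...and produce the same bytes once both are in Normalized Form C.
print(to_nfc_utf8(composed) == to_nfc_utf8(decomposed))
```

Without this step, two names that look the same to the user can end up
as two different paths.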

> This ends up creating the problems like we're seeing reported in this
> bug. Ultimately this bug is a result of the ra lib assuming it has a
> canonical path and passing that into svn_path_join, which also
> assumes it has a canonical path, and as a result ends up returning
> a path that is not canonical and fails the assertion, even though
> it is documented as always returning a canonical path.
>
> I believe any library below svn_client is already assuming that it is
> receiving canonical paths. Therefore, my answer is that svn_client and
> the two functions in svn_path are the only APIs that should accept
> non-canonical paths.
>
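The failure mode described above can be sketched with a toy join that
trusts its input. naive_join is hypothetical and is not the actual
svn_path_join code; it only shows how one non-canonical input leaks
through a function that assumes canonical arguments.

```python
def naive_join(base: str, component: str) -> str:
    """Hypothetical join that assumes both arguments are canonical."""
    if not base:
        return component
    return base + "/" + component


good = naive_join("repo/trunk", "README")   # canonical in, canonical out
bad = naive_join("repo/trunk/", "README")   # trailing "/" in, "//" out
print(good)
print(bad)
```

The join itself never checks anything, so the only places to catch the
problem are the caller's input boundary or an assertion on the result.
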
>c) I tend to think that all APIs should produce canonical paths. If we
> don't then we'll run into situations where someone doesn't realize
> that some API produced a non-canonical path and uses it with something
> that requires one.
>
>
"Principle of least surprise" here. Ah, except svn_path_local_style. :-)

-- Brane

Received on Sun Jul 4 03:26:10 2004

This is an archived mail posted to the Subversion Dev mailing list.