Re: path canonicalization problem (issue #559)

From: Garrett Rooney <rooneg_at_electricjellyfish.net>
Date: 2002-01-21 23:54:28 CET

On Mon, Jan 21, 2002 at 10:44:01PM +0000, Philip Martin wrote:
> Garrett Rooney <rooneg@electricjellyfish.net> writes:
>
> > --- subversion/libsvn_subr/path.c
> > +++ subversion/libsvn_subr/path.c Mon Jan 21 14:27:21 2002
> > @@ -85,11 +85,40 @@
> > void
> > svn_path_canonicalize (svn_stringbuf_t *path)
> > {
> > - /* At some point this could eliminate redundant components.
> > - For now, it just makes sure there is no trailing slash. */
> > + char *start_here = NULL;
> > + char *tmp, *tmp1;
> >
> > - /* kff todo: maybe should be implemented with a new routine in
> > - libsvn_string. */
> > + /* if this is a url, we want to start condensing at the third slash,
> > + otherwise we start at the beginning of the path */
> > + if (svn_path_is_url ((const svn_string_t *)path))
>
> This cast relies on the members of svn_string_t being ordered the same
> as svn_stringbuf_t. It's true at present, but is it guaranteed?

that's one of the things i was wondering. if it is guaranteed, it
would be quite useful, as one can do what i just did, and pass
stringbufs to functions that take const string_t *'s. if this is
guaranteed, we should probably document it though.

> > + {
> > + int i, num_slashes = 0;
> > +
> > + for (i = 0; i < path->len; ++i)
> > + if (path->data[i] == '/')
>
> Should SVN_PATH_LOCAL_SEPARATOR or SVN_PATH_SEPARATOR be used
> here?

again, that's something i was wondering about. in this case, it's a
url, so we know it's a slash. can i assume those #define's are a
slash as well? doesn't that sort of defeat the purpose of having the
#define?

> > + if (++num_slashes == 3)
> > + start_here = &path->data[i];
> > +
> > + if (start_here == NULL)
> > + return; /* this should never happen */
>
> If something has gone wrong, don't hide the error. Trying to carry on
> will likely produce some more obscure error later.

i'm pretty sure that the only way that can happen is a url like this:
http://foo.com which means no canonicalization is really possible.

(when i wrote the comment, i assumed that it wouldn't really happen,
and i'm still not sure if it can, since i don't know if
svn_path_is_url will match such a url.)

> > + }
> > + else
> > + start_here = path->data;
> > +
> > + /* condense extra /'s and /./'s */
> > + while ((tmp = strstr (start_here, "//")) != NULL)
>
> A (minor) performance point: the use of 'strstr(start_here,...' means
> that if there are multiple // occurences the start of the string is
> searched repeatedly.

true...

> > + strcpy (tmp, tmp + 1);
>
> Can paths contain null characters? Not on Linux, but what about other
> platforms? That's why svn_stringbuf_t has a len member. Better not to
> use strstr/strcpy.
>
> Another (minor) performance point: if there are multiple // the
> trailing part of the string is copied multiple times.
>
> > +
> > + while ((tmp = strstr (start_here, "/./")) != NULL)
> > + strcpy (tmp, tmp + 2);
>
> Similar comments to above.
>
> In addition "/./" is similar to "//". Really, although strstr() is
> convenient and the performance is unlikely to matter, a hand written
> loop may produce better code. Something like
>
> while not end of string
> if last element or string element != '/'
> copy element
> else if at least one more and string element + 1 == '/'
> skip element
> else if at least two more and string element + 1 == '.'
> and string element + 2 = '/'
> skip two elements
>
> Not that I've actually tried this algorithm to see if it handles all
> the cases, but you get the idea

yes, this should probably be rewritten to do something like that.

> > + /* ### should also remove ..'s where possible, but that code will be a bit
> > + icky, and i'm feeling lazy at the moment. we can't take the easy way out
> > + and just turn our paths into absolute paths, which would make this
> > + considerably easier, since doing that breaks all sorts of things for
> > + reasons i don't really understand. - rooneg */
> > +
> > + path->len = strlen(path->data);
>
> And using the above loop there would probably be some pointer to the
> last character copied so this strlen call would not be needed
> either.

true.

-garrett

-- 
garrett rooney                     Unix was not designed to stop you from 
rooneg@electricjellyfish.net       doing stupid things, because that would  
http://electricjellyfish.net/      stop you from doing clever things.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Received on Sat Oct 21 14:36:57 2006

This message: [ Message body ]
Next message: Philip Martin: "Re: shellscript breakage"
Previous message: Philip Martin: "Re: path canonicalization problem (issue #559)"
In reply to: Philip Martin: "Re: path canonicalization problem (issue #559)"
Next in thread: Karl Fogel: "Re: path canonicalization problem (issue #559)"
Reply: Karl Fogel: "Re: path canonicalization problem (issue #559)"
Reply: Mark Benedetto King: "Re: path canonicalization problem (issue #559)"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]