[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: path canonicalization problem (issue #559)

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2002-01-21 23:44:01 CET

Garrett Rooney <rooneg@electricjellyfish.net> writes:

> --- subversion/libsvn_subr/path.c
> +++ subversion/libsvn_subr/path.c Mon Jan 21 14:27:21 2002
> @@ -85,11 +85,40 @@
> void
> svn_path_canonicalize (svn_stringbuf_t *path)
> {
> - /* At some point this could eliminate redundant components.
> - For now, it just makes sure there is no trailing slash. */
> + char *start_here = NULL;
> + char *tmp, *tmp1;
>
> - /* kff todo: maybe should be implemented with a new routine in
> - libsvn_string. */
> + /* if this is a url, we want to start condensing at the third slash,
> + otherwise we start at the beginning of the path */
> + if (svn_path_is_url ((const svn_string_t *)path))

This cast relies on the members of svn_string_t being ordered the same
as svn_stringbuf_t. It's true at present, but is it guaranteed?

> + {
> + int i, num_slashes = 0;
> +
> + for (i = 0; i < path->len; ++i)
> + if (path->data[i] == '/')

Should SVN_PATH_LOCAL_SEPARATOR or SVN_PATH_SEPARATOR be used
here?

> + if (++num_slashes == 3)
> + start_here = &path->data[i];
> +
> + if (start_here == NULL)
> + return; /* this should never happen */

If something has gone wrong, don't hide the error. Trying to carry on
will likely produce some more obscure error later.

> + }
> + else
> + start_here = path->data;
> +
> + /* condense extra /'s and /./'s */
> + while ((tmp = strstr (start_here, "//")) != NULL)

A (minor) performance point: the use of 'strstr(start_here,...' means
that if there are multiple // occurences the start of the string is
searched repeatedly.

> + strcpy (tmp, tmp + 1);

Can paths contain null characters? Not on Linux, but what about other
platforms? That's why svn_stringbuf_t has a len member. Better not to
use strstr/strcpy.

Another (minor) performance point: if there are multiple // the
trailing part of the string is copied multiple times.

> +
> + while ((tmp = strstr (start_here, "/./")) != NULL)
> + strcpy (tmp, tmp + 2);

Similar comments to above.

In addition "/./" is similar to "//". Really, although strstr() is
convenient and the performance is unlikely to matter, a hand written
loop may produce better code. Something like

       while not end of string
         if last element or string element != '/'
           copy element
         else if at least one more and string element + 1 == '/'
           skip element
         else if at least two more and string element + 1 == '.'
                                   and string element + 2 = '/'
           skip two elements

Not that I've actually tried this algorithm to see if it handles all
the cases, but you get the idea

> +
> + /* ### should also remove ..'s where possible, but that code will be a bit
> + icky, and i'm feeling lazy at the moment. we can't take the easy way out
> + and just turn our paths into absolute paths, which would make this
> + considerably easier, since doing that breaks all sorts of things for
> + reasons i don't really understand. - rooneg */
> +
> + path->len = strlen(path->data);

And using the above loop there would probably be some pointer to the
last character copied so this strlen call would not be needed
either.

-- 
Philip
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:36:57 2006

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.