Re: svn commit: rev 5521 [...]

From: Benjamin Pflugmann <benjamin-svn-dev_at_pflugmann.de>
Date: 2003-04-02 09:37:24 CEST

Hi!

Sorry about the late comment, but I didn't read my mails earlier.

On Tue 2003-04-01 at 21:58:08 -0600, bdenny@tigris.org wrote:
> Author: bdenny
> Date: Tue Apr 1 21:58:04 2003
> New Revision: 5521
>
[...]
> * path.c
> (get_longest_path_ancestor): New helper function, mostly identical with
> the old svn_path_get_longest_ancestor, but fixed to return NULL if there
> is no common ancestor, as per the svn_path_get_longest_ancestor docstring.

Wouldn't be fixing the docstring instead be more handy (see below)?
I.e. having an empty string meaning "no common ancestor". This would
allow to use the result directly in a function like apr_pstrcat().

There would be no ambiguity, because currently (at least after this
patch), an empty path is not possible: There is a check for it and
NULL returned instead.

> (svn_path_get_longest_ancestor): Accept URL targets; don't count
> 'protocol://' as a common ancestor.
> (svn_path_get_absolute): Accept URL targets; treat them as absolute paths.
[...]
> -char *
> -svn_path_get_longest_ancestor (const char *path1,
> - const char *path2,
> - apr_pool_t *pool)
> +/* Return the longest common ancestor of PATH1 and PATH2.
> + *
> + * This function handles everything except the URL-handling logic
> + * of svn_path_get_longest_ancestor, and assumes that PATH1 and
> + * PATH2 are *not* URLs.
> + *
> + * If the two paths do not share a common ancestor, NULL is returned.
> + *
> + * New strings are allocated in POOL.
> + */

Is this function used elsewhere? Or is there another reason that it
creates and returns a string instead of the length of the common part?
The reason I ask is detailed below.

> +static char *
> +get_longest_path_ancestor (const char *path1,
> + const char *path2,
> + apr_pool_t *pool)
> {
> char *common_path;
> apr_size_t path1_len, path2_len;
> @@ -500,10 +510,14 @@
> /* last_dirsep is now the offset of the last directory separator we
> crossed before reaching a non-matching byte. i is the offset of
> that non-matching byte. */
> - if (((i == path1_len) && (path2[i] == '/'))
> - || ((i == path2_len) && (path1[i] == '/'))
> - || ((i == path1_len) && (i == path2_len)))
> + if (i == 0)
> + return NULL;
> + else if (((i == path1_len) && (path2[i] == '/'))
> + || ((i == path2_len) && (path1[i] == '/'))
> + || ((i == path1_len) && (i == path2_len)))
> common_path = apr_pstrmemdup (pool, path1, i);
> + else if (last_dirsep == 0)
> + return NULL;
> else
> common_path = apr_pstrmemdup (pool, path1, last_dirsep);
>
> @@ -511,6 +525,55 @@
> }
>
>
> +char *
> +svn_path_get_longest_ancestor (const char *path1,
> + const char *path2,
> + apr_pool_t *pool)
> +{
> + if (svn_path_is_url (path1))
> + {
> + if (svn_path_is_url (path2))
> + {
> + char *ancestor, *path_ancestor, *prefix;
> + int i = 0;
> +
> + /* Find ':' */
> + for (i = 0; path1[i] != ':'; i++)
> + {
> + /* They're both URLs, so EOS can't come before ':' */
> + assert (path1[i] != '\0' && path2[i] != '\0');
> +
> + /* No shared protocol => no common prefix */
> + if (path1[i] != path2[i])
> + return NULL;
> + }
> +
> + i += 3; /* Advance past '://' */
> +
> + path_ancestor = get_longest_path_ancestor (path1 + i, path2 + i,
> + pool);
> +
> + /* No shared path => no common prefix */
> + if ((! path_ancestor) || (*path_ancestor == '\0'))
> + return NULL;
> +
> + else

There is an empty line too much above the else.

If svn_path_get_longest_ancestor() returned an empty string, the
apr_strcat would work unconditionally, i.e. the if-statement wouldn't
be needed.

Btw, with the current documented behaviour of
svn_path_get_longest_ancestor(), path_ancestor may not be an empty
string, so "|| (*path_ancestor == '\0')" is redundant

> + return apr_pstrcat (pool, apr_pstrndup (pool, path1, i),
> + path_ancestor, NULL);

Overall, the scheme is stripped away from the URL, then the common
part is looked for and then it is concatenated again. I had rather
expected that the substring of path1 of the common length is taken.
Just a nitpick, but could help clarity a bit. Something like

  apr_size_t common_part_lenth; // not sure, if that's the correct type
  [...]
  common_part_length = longest_path_ancestor_length (path1+i, path2+i, pool));
  return apr_pstrndup (pool, path1, common_part_length);

That's 2 lines for the functional part instead of 8 and in this case I
think shorter is clearer.

> + }
> + else
> + {
> + /* A URL and a non-URL => no common prefix */
> + return NULL;
> + }
> + }
> + else
> + {
> + return get_longest_path_ancestor (path1, path2, pool);
> + }

With the suggested change to svn_path_get_longest_ancestor() this
wouldn't work anymore. If it returns an empty string, either a check
for that is needed or the docstring must be changed. Hm. In the former
case we are back to the if-condition for the return value and have
only effectively moved down from above. On the other hand, most
consumers did not check for NULL values until before you changed it
(Please don't hurt me... I already apologized that I did not read the
mail earlier. ;-). It gets a little bit prettier with the suggested
longest_path_ancestor_length approach.

> +}

There is some inconsistency in the error handling. The outer structure
of the function is like this:

  if ( is_url(path1) )
    if ( is_url(path2) )
      return result_of_urls_handled;
    else
      return NULL;
  else
    return get_longest_path_ancestor()

If path1 is an URL, but path2 is not, NULL is returned.
If path2 is an URL, but path1 is not, get_longest_path_ancestor() is returned.

Although get_longest_path_ancestor() returns the correct result,
namely NULL, in most cases, when given an URL and a file name[1], the
code should be consistent or comment why it is inconsistent.

The nicest way I can currently think of is to save the result of the
svn_path_is_url calls and do something like this (I stayed with the
current approach instead of my suggestions to make it easier to
follow):

  path1_is_url = is_url(path1);
  path2_is_url = is_url(path2);

  // don't like to use binary xor(^) for logical expression, so use long story
  if ((!path1_is_url && path2_is_url) || (path1_is_url && !path2_is_url))
    return NULL;

  offset = 0; // has to be moved to start of function; we don't allow C99
  if (path1_is_url) // implies path2_is_url
    offset = ... // find end of scheme (://) and set offset accordingly
                 // i.e. effectively extract the path-like part

  path_ancestor = get_longest_path_ancestor (path1+offset, path2+offset, ...));
  if( ! path_ancestor )
    return NULL;
  else
    return apr_pstrncat [...]

If you want to go a step further, you could even consider that the
scheme part is a valid path, call get_longest_path_ancestor beforehand
and replace the offset check by a "return NULL" in case only the
schemes are equal. But that uses too much knowledge about URLs for my
taste. (Correct knowledge, mind you; at least with current RFCs).

HTH,

Benjamin.

[1] An counter-example, if I am not mistaken: a directly named "ftp:"
    and an URL "ftp://foobar". Yes, "ftp:" is a strange file name, but
    not as strange as "ftp://", and until now we only agreed that we
    shouldn't care about the latter.

application/pgp-signature attachment: stored

Received on Wed Apr 2 09:38:12 2003

This message: [ Message body ]
Next message: Michael Wood: "Re: Why are branches real directories?!?!?!"
Previous message: Michael Wood: "Re: browing java source via apache (maybe OT)"
Next in thread: Brian Denny: "Re: svn commit: rev 5521 [...]"
Reply: Brian Denny: "Re: svn commit: rev 5521 [...]"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]