[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] ISSUE #3193 Fix peg revision parsing for CLI repository root relative urls.

From: Karl Fogel <kfogel_at_red-bean.com>
Date: Mon, 12 May 2008 17:52:53 -0400

"Troy Curtis Jr" <troycurtisjr_at_gmail.com> writes:
> [[[
> Fix peg revision support for repository root urls to address issue #3193.
>
> Move the duplicated peg revision parsing functionality from
> svn_client_args_to_target_array() and svn_opt_args_to_target_array3() into
> the common functions svn_opt__arg_canonicalize_url() and
> svn_opt__arg_canonicalize_path().
>
> * subversion/libsvn_subr/opt.c
> (split_arg_at_peg_revision): New function.
> (svn_opt__arg_canonicalize_url,
> svn_opt__arg_canonicalize_path): Allow the input argument to contain a peg
> revision specifier, and preserve it in the output utilizing
> split_arg_at_peg_revision().
> (svn_opt_parse_path): Replace in-line peg revision parsing with the new
> split_arg_at_peg_revision() function.
> (svn_opt_args_to_target_array3): Remove the peg revision preserving logic,
> relying instead on the new peg revision support in
> svn_opt__arg_canonicalize_path() and svn_opt__arg_canonicalize_url().
>
> * subversion/include/private/svn_opt_private.h
> (svn_opt__arg_canonicalize_url,
> svn_opt__arg_canonicalize_path): Update the doc string to reflect the fact
> that these functions can handle a peg revision in the input argument.
>
> * subversion/libsvn_client/cmdline.c
> (svn_client_args_to_target_array): Remove the peg revision preserving logic,
> relying instead on the new peg support in svn_opt__arg_canonicalize_path()
> and svn_opt__arg_canonicalize_url().
>
> * subversion/tests/cmdline/basic_tests.py
> (basic_relative_url_with_peg_revisions): New test functions.
> (test_list): Call the new test function.
> ]]]

There's only one test function, so:

   * subversion/tests/cmdline/basic_tests.py
     (basic_relative_url_with_peg_revisions): New test.
     (test_list): Run it.

:-) On to the code...

> --- subversion/libsvn_subr/opt.c (revision 31120)
> +++ subversion/libsvn_subr/opt.c (working copy)
> @@ -803,81 +803,123 @@
> return SVN_NO_ERROR;
> }
>
> +/* Extract the peg revision, if any, from UTF8_TARGET. Return the peg
> + * revision in *PEG_REVISION and the true target portion in *TRUE_TARGET.
> + * *PEG_REVISION will be set to NULL if no peg revision is found.
> + *
> + * UTF8_TARGET need not be canonical. *TRUE_TARGET will not be canonical
> + * unless UTF8_TARGET is.
> + *
> + * *PEG_REVISION does still contain the '@' symbol as the first character.
> + *
> + * All allocations are done in POOL.
> + */
> +static svn_error_t *
> +split_arg_at_peg_revision(const char **true_target,
> + const char **peg_revision,
> + const char *utf8_target,
> + apr_pool_t *pool)

Wow, I can't believe we didn't have something like this before :-).

> +{
> + const char *peg_start = NULL; /* pointer to the peg revision, if any */
> + int j;
>
> + for (j = (strlen(utf8_target) - 1); j >= 0; --j)
> + {
> + /* If we hit a path separator, stop looking. This is OK
> + only because our revision specifiers can't contain
> + '/'. */
> + if (utf8_target[j] == '/')
> + break;
> + if (utf8_target[j] == '@')
> + {
> + peg_start = utf8_target + j;
> + break;
> + }
> + }
> +
> + if (peg_start)
> + {
> + *true_target = apr_pstrmemdup(pool,
> + utf8_target,
> + peg_start - utf8_target);
> +
> + *peg_revision = apr_pstrdup(pool, peg_start);
> + }
> + else
> + {
> + *true_target = utf8_target;
> +
> + *peg_revision = NULL;
> + }
> +
> + return SVN_NO_ERROR;
> +}

Hmmm. The way we quote an '@'-sign in a path is by putting a blank
peg rev at the end, for example:

   /path/to/a/file/with/a/real/@sign

would be escaped like this:

   /path/to/a/file/with/a/real/@sign@

In that case, *peg_revision would be a specially-allocated "@" in pool,
but shouldn't it just be NULL instead?

> svn_error_t *
> svn_opt_parse_path(svn_opt_revision_t *rev,
> const char **truepath,
> const char *path /* UTF-8! */,
> apr_pool_t *pool)
> {
> - int i;
> + const char *peg_rev;
>
> - /* scanning from right to left, just to be friendly to any
> - screwed-up filenames that might *actually* contain @-signs. :-) */
> - for (i = (strlen(path) - 1); i >= 0; i--)
> + SVN_ERR(split_arg_at_peg_revision(truepath, &peg_rev, path, pool));
> +
> + /* Parse the peg revision, if one was found */
> + if (peg_rev)
> {
> - /* If we hit a path separator, stop looking. */
> - /* This is OK only because our revision specifiers can't contain '/'. */
> - if (path[i] == '/')
> - break;
> + int ret;
> + svn_opt_revision_t start_revision, end_revision;
>
> - if (path[i] == '@')
> + end_revision.kind = svn_opt_revision_unspecified;
> +
> + if (peg_rev[1] == '\0') /* looking at empty peg revision */
> {

Oh, I see: you're having the caller be responsible for detecting an
escaping '@'. That could work, but it would be a cleaner interface to
just have split_arg_at_peg_revision() take care of this, which is what
split_arg_at_peg_revision()'s doc string currently claims anyway.

> - int ret;
> - svn_opt_revision_t start_revision, end_revision;
> + ret = 0;
> + start_revision.kind = svn_opt_revision_unspecified;
> + }
> + else /* looking at non-empty peg revision */
> + {
> + const char *rev_str = peg_rev + 1;
>
> - end_revision.kind = svn_opt_revision_unspecified;
> -
> - if (path[i + 1] == '\0') /* looking at empty peg revision */
> + /* URLs get treated differently from wc paths. */
> + if (svn_path_is_url(path))
> {
> - ret = 0;
> - start_revision.kind = svn_opt_revision_unspecified;
> - }
> - else /* looking at non-empty peg revision */
> - {
> - const char *rev_str = path + i + 1;
> -
> - /* URLs get treated differently from wc paths. */
> - if (svn_path_is_url(path))
> + /* URLs are URI-encoded, so we look for dates with
> + URI-encoded delimeters. */

"delimiters" :-)

> + int rev_len = strlen(rev_str);
> + if (rev_len > 6
> + && rev_str[0] == '%'
> + && rev_str[1] == '7'
> + && (rev_str[2] == 'B'
> + || rev_str[2] == 'b')
> + && rev_str[rev_len-3] == '%'
> + && rev_str[rev_len-2] == '7'
> + && (rev_str[rev_len-1] == 'D'
> + || rev_str[rev_len-1] == 'd'))
> {
> - /* URLs are URI-encoded, so we look for dates with
> - URI-encoded delimeters. */
> - int rev_len = strlen(rev_str);
> - if (rev_len > 6
> - && rev_str[0] == '%'
> - && rev_str[1] == '7'
> - && (rev_str[2] == 'B'
> - || rev_str[2] == 'b')
> - && rev_str[rev_len-3] == '%'
> - && rev_str[rev_len-2] == '7'
> - && (rev_str[rev_len-1] == 'D'
> - || rev_str[rev_len-1] == 'd'))
> - {
> - rev_str = svn_path_uri_decode(rev_str, pool);
> - }
> + rev_str = svn_path_uri_decode(rev_str, pool);
> }
> - ret = svn_opt_parse_revision(&start_revision,
> - &end_revision,
> - rev_str, pool);
> }
> + ret = svn_opt_parse_revision(&start_revision,
> + &end_revision,
> + rev_str, pool);
> + }
>
> - if (ret || end_revision.kind != svn_opt_revision_unspecified)
> - return svn_error_createf(SVN_ERR_CL_ARG_PARSING_ERROR, NULL,
> - _("Syntax error parsing revision '%s'"),
> - path + i + 1);
> + if (ret || end_revision.kind != svn_opt_revision_unspecified)
> + return svn_error_createf(SVN_ERR_CL_ARG_PARSING_ERROR, NULL,
> + _("Syntax error parsing revision '%s'"),
> + peg_rev + 1);
>
> - *truepath = apr_pstrmemdup(pool, path, i);
> - rev->kind = start_revision.kind;
> - rev->value = start_revision.value;
> -
> - return SVN_NO_ERROR;
> - }
> + rev->kind = start_revision.kind;
> + rev->value = start_revision.value;
> }
> + else
> + {
> + /* Didn't find an peg revision. */

"a"

> @@ -987,13 +997,19 @@
> }
> else /* not a url, so treat as a path */
> {
> - const char *base_name;
> + char *base_name;
> + char *peg_rev;
>
> SVN_ERR(svn_opt__arg_canonicalize_path(&target, utf8_target, pool));
>
> /* If the target has the same name as a Subversion
> working copy administrative dir, skip it. */
> base_name = svn_path_basename(target, pool);
> +
> + /* "Delete" the peg revision */
> + if (peg_rev = strrchr(base_name, '@'))
> + *peg_rev = '\0';

This assignment-in-condition construction got me a warning:

   subversion/libsvn_subr/opt.c: In function ‘svn_opt_args_to_target_array3’:
   subversion/libsvn_subr/opt.c:1010: warning: suggest parentheses
     around assignment used as truth value

I guess we've asked GCC to play police here, so we should move the
assignment to a separate line.

But more importantly, I think there's an opportunity to remove some code
duplication here.

We have split_arg_at_peg_revision(), which turns a path-maybe-with-peg
into a path and a peg (NULL if no peg). Its only callers are

   svn_opt_parse_path()
   svn_opt__arg_canonicalize_url()
   svn_opt__arg_canonicalize_path()

The first of those returns the peg revision separately (as a struct);
the latter two instead reassemble target+peg_rev, just in canonical
form. But now, every caller of svn_opt__arg_canonicalize_path() further
converts the target to a base_name and then does

   strrchr(base_name, '@')

to find and hide a peg rev that we've already found once before.

Maybe it would be better to just have svn_opt__arg_canonicalize_path()
take a new *peg_rev parameter, which (if the parameter itself is
non-NULL) returns any peg rev found, as well as returning the
canonicalized true target. Then we'd have the logic for parsing peg
revs centralized in one place (including handling escaping), and some
duplicate code could go away.

Was this discussed already? I didn't follow the thread in detail.

> @@ -1114,14 +1125,30 @@
> return SVN_NO_ERROR;
> }
>
> +
> svn_error_t *
> svn_opt__arg_canonicalize_url(const char **url_out, const char *url_in,
> apr_pool_t *pool)
> {
> const char *target;
> + const char *peg_rev;
>
> + /*
> + * This is needed so that url_in can be properly canonicalized,
> + * otherwise the canonicalization does not treat a "._at_BASE" as a "."
> + * with a BASE peg revision, and it is not canonicalized to "@BASE".
> + * If any peg revision exists, it is appended to the final
> + * canonicalized path or URL. Do not use svn_opt_parse_path()
> + * because the resulting peg revision is a structure that would have
> + * to be converted back into a string. Converting from a string date
> + * to the apr_time_t field in the svn_opt_revision_value_t and back to
> + * a string would not necessarily preserve the exact bytes of the
> + * input date, so its easier just to keep it in string form.
> + */
> + SVN_ERR(split_arg_at_peg_revision(&target, &peg_rev, url_in, pool));
> +
> /* Convert to URI. */
> - target = svn_path_uri_from_iri(url_in, pool);
> + target = svn_path_uri_from_iri(target, pool);
> /* Auto-escape some ASCII characters. */
> target = svn_path_uri_autoescape(target, pool);
>
> @@ -1140,6 +1167,11 @@
> /* strip any trailing '/' and collapse other redundant elements */
> target = svn_path_canonicalize(target, pool);
>
> + /* Append the peg revision back to the canonicalized target if
> + there was a peg revision. */
> + if (peg_rev)
> + target = apr_pstrcat(pool, target, peg_rev, NULL);

Yeah, the fact that the above huge comment and code...

> *url_out = target;
> return SVN_NO_ERROR;
> }
> @@ -1151,9 +1183,24 @@
> const char *apr_target;
> char *truenamed_target; /* APR-encoded */
> apr_status_t apr_err;
> + const char *peg_rev;
>
> + /*
> + * This is needed so that url_in can be properly canonicalized,
> + * otherwise the canonicalization does not treat a "._at_BASE" as a "."
> + * with a BASE peg revision, and it is not canonicalized to "@BASE".
> + * If any peg revision exists, it is appended to the final
> + * canonicalized path or URL. Do not use svn_opt_parse_path()
> + * because the resulting peg revision is a structure that would have
> + * to be converted back into a string. Converting from a string date
> + * to the apr_time_t field in the svn_opt_revision_value_t and back to
> + * a string would not necessarily preserve the exact bytes of the
> + * input date, so its easier just to keep it in string form.
> + */
> + SVN_ERR(split_arg_at_peg_revision(&apr_target, &peg_rev, path_in, pool));
> +
> /* canonicalize case, and change all separators to '/'. */
> - SVN_ERR(svn_path_cstring_from_utf8(&apr_target, path_in, pool));
> + SVN_ERR(svn_path_cstring_from_utf8(&apr_target, apr_target, pool));
> apr_err = apr_filepath_merge(&truenamed_target, "", apr_target,
> APR_FILEPATH_TRUENAME, pool);
>
> @@ -1174,6 +1221,11 @@
> SVN_ERR(svn_path_cstring_to_utf8(path_out, apr_target, pool));
> *path_out = svn_path_canonicalize(*path_out, pool);
>
> + /* Append the peg revision back to the canonicalized target if
> + there was a peg revision. */
> + if (peg_rev)
> + *path_out = apr_pstrcat(pool, *path_out, peg_rev, NULL);
> +
> return SVN_NO_ERROR;
> }

...are duplicated here signifies an opportunity, I think.

> --- subversion/libsvn_client/cmdline.c (revision 31120)
> +++ subversion/libsvn_client/cmdline.c (working copy)
> @@ -247,7 +216,8 @@
> }
> else /* not a url, so treat as a path */
> {
> - const char *base_name;
> + char *base_name;
> + char *peg_rev;
>
> SVN_ERR(svn_opt__arg_canonicalize_path(&target,
> utf8_target, pool));
> @@ -256,6 +226,12 @@
> working copy administrative dir, skip it. */
> base_name = svn_path_basename(target, pool);
>
> + /* "Delete" the peg revision because svn_wc_is_adm_dir() cannot
> + * handle it.
> + */
> + if (peg_rev = strrchr(base_name, '@'))
> + *peg_rev = '\0';
> +
> if (svn_wc_is_adm_dir(base_name, pool))
> {
> err = svn_error_createf(SVN_ERR_RESERVED_FILENAME_SPECIFIED,

And this duplicates code we saw earlier in svn_opt_args_to_target_array3().

Thoughts?

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-05-12 23:53:11 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.