[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] ISSUE #3193 Fix peg revision parsing for CLI repository root relative urls.

From: Troy Curtis Jr <troycurtisjr_at_gmail.com>
Date: Mon, 12 May 2008 21:51:06 -0500

On Mon, May 12, 2008 at 4:52 PM, Karl Fogel <kfogel_at_red-bean.com> wrote:
> "Troy Curtis Jr" <troycurtisjr_at_gmail.com> writes:
>> [[[
>> Fix peg revision support for repository root urls to address issue #3193.
>>
>> Move the duplicated peg revision parsing functionality from
>> svn_client_args_to_target_array() and svn_opt_args_to_target_array3() into
>> the common functions svn_opt__arg_canonicalize_url() and
>> svn_opt__arg_canonicalize_path().
>>
>> * subversion/libsvn_subr/opt.c
>> (split_arg_at_peg_revision): New function.
>> (svn_opt__arg_canonicalize_url,
>> svn_opt__arg_canonicalize_path): Allow the input argument to contain a peg
>> revision specifier, and preserve it in the output utilizing
>> split_arg_at_peg_revision().
>> (svn_opt_parse_path): Replace in-line peg revision parsing with the new
>> split_arg_at_peg_revision() function.
>> (svn_opt_args_to_target_array3): Remove the peg revision preserving logic,
>> relying instead on the new peg revision support in
>> svn_opt__arg_canonicalize_path() and svn_opt__arg_canonicalize_url().
>>
>> * subversion/include/private/svn_opt_private.h
>> (svn_opt__arg_canonicalize_url,
>> svn_opt__arg_canonicalize_path): Update the doc string to reflect the fact
>> that these functions can handle a peg revision in the input argument.
>>
>> * subversion/libsvn_client/cmdline.c
>> (svn_client_args_to_target_array): Remove the peg revision
preserving logic,
>> relying instead on the new peg support in svn_opt__arg_canonicalize_path()
>> and svn_opt__arg_canonicalize_url().
>>
>> * subversion/tests/cmdline/basic_tests.py
>> (basic_relative_url_with_peg_revisions): New test functions.
>> (test_list): Call the new test function.
>> ]]]
>
> There's only one test function, so:
>
> * subversion/tests/cmdline/basic_tests.py
> (basic_relative_url_with_peg_revisions): New test.
> (test_list): Run it.
>
> :-) On to the code...

Copy-n-paste is a blessing and a curse I tell ya.

>
>> --- subversion/libsvn_subr/opt.c (revision 31120)
>> +++ subversion/libsvn_subr/opt.c (working copy)
>> @@ -803,81 +803,123 @@
>> return SVN_NO_ERROR;
>> }
>>
>> +/* Extract the peg revision, if any, from UTF8_TARGET. Return the peg
>> + * revision in *PEG_REVISION and the true target portion in *TRUE_TARGET.
>> + * *PEG_REVISION will be set to NULL if no peg revision is found.
>> + *
>> + * UTF8_TARGET need not be canonical. *TRUE_TARGET will not be canonical
>> + * unless UTF8_TARGET is.
>> + *
>> + * *PEG_REVISION does still contain the '@' symbol as the first character.
>> + *
>> + * All allocations are done in POOL.
>> + */
>> +static svn_error_t *
>> +split_arg_at_peg_revision(const char **true_target,
>> + const char **peg_revision,
>> + const char *utf8_target,
>> + apr_pool_t *pool)
>
> Wow, I can't believe we didn't have something like this before :-).
>
>> +{
>> + const char *peg_start = NULL; /* pointer to the peg revision, if any */
>> + int j;
>>
>> + for (j = (strlen(utf8_target) - 1); j >= 0; --j)
>> + {
>> + /* If we hit a path separator, stop looking. This is OK
>> + only because our revision specifiers can't contain
>> + '/'. */
>> + if (utf8_target[j] == '/')
>> + break;
>> + if (utf8_target[j] == '@')
>> + {
>> + peg_start = utf8_target + j;
>> + break;
>> + }
>> + }
>> +
>> + if (peg_start)
>> + {
>> + *true_target = apr_pstrmemdup(pool,
>> + utf8_target,
>> + peg_start - utf8_target);
>> +
>> + *peg_revision = apr_pstrdup(pool, peg_start);
>> + }
>> + else
>> + {
>> + *true_target = utf8_target;
>> +
>> + *peg_revision = NULL;
>> + }
>> +
>> + return SVN_NO_ERROR;
>> +}
>
> Hmmm. The way we quote an '@'-sign in a path is by putting a blank
> peg rev at the end, for example:
>
> /path/to/a/file/with/a/real/@sign
>
> would be escaped like this:
>
> /path/to/a/file/with/a/real/@sign@
>
> In that case, *peg_revision would be a specially-allocated "@" in pool,
> but shouldn't it just be NULL instead?
>
>> svn_error_t *
>> svn_opt_parse_path(svn_opt_revision_t *rev,
>> const char **truepath,
>> const char *path /* UTF-8! */,
>> apr_pool_t *pool)
>> {
>> - int i;
>> + const char *peg_rev;
>>
>> - /* scanning from right to left, just to be friendly to any
>> - screwed-up filenames that might *actually* contain @-signs. :-) */
>> - for (i = (strlen(path) - 1); i >= 0; i--)
>> + SVN_ERR(split_arg_at_peg_revision(truepath, &peg_rev, path, pool));
>> +
>> + /* Parse the peg revision, if one was found */
>> + if (peg_rev)
>> {
>> - /* If we hit a path separator, stop looking. */
>> - /* This is OK only because our revision specifiers can't
contain '/'. */
>> - if (path[i] == '/')
>> - break;
>> + int ret;
>> + svn_opt_revision_t start_revision, end_revision;
>>
>> - if (path[i] == '@')
>> + end_revision.kind = svn_opt_revision_unspecified;
>> +
>> + if (peg_rev[1] == '\0') /* looking at empty peg revision */
>> {
>
> Oh, I see: you're having the caller be responsible for detecting an
> escaping '@'. That could work, but it would be a cleaner interface to
> just have split_arg_at_peg_revision() take care of this, which is what
> split_arg_at_peg_revision()'s doc string currently claims anyway.
>

Well the doc string does explicity state that the '@' character is left in the
peg revision parameter as the first character.

>
>> - int ret;
>> - svn_opt_revision_t start_revision, end_revision;
>> + ret = 0;
>> + start_revision.kind = svn_opt_revision_unspecified;
>> + }
>> + else /* looking at non-empty peg revision */
>> + {
>> + const char *rev_str = peg_rev + 1;
>>
>> - end_revision.kind = svn_opt_revision_unspecified;
>> -
>> - if (path[i + 1] == '\0') /* looking at empty peg revision */
>> + /* URLs get treated differently from wc paths. */
>> + if (svn_path_is_url(path))
>> {
>> - ret = 0;
>> - start_revision.kind = svn_opt_revision_unspecified;
>> - }
>> - else /* looking at non-empty peg revision */
>> - {
>> - const char *rev_str = path + i + 1;
>> -
>> - /* URLs get treated differently from wc paths. */
>> - if (svn_path_is_url(path))
>> + /* URLs are URI-encoded, so we look for dates with
>> + URI-encoded delimeters. */
>
> "delimiters" :-)

In fairness to this Texas boy, this is not a comment I wrote, it just looks
like an 'add' because of the way the diff worked out. (See a couple of lines
down)

>
>> + int rev_len = strlen(rev_str);
>> + if (rev_len > 6
>> + && rev_str[0] == '%'
>> + && rev_str[1] == '7'
>> + && (rev_str[2] == 'B'
>> + || rev_str[2] == 'b')
>> + && rev_str[rev_len-3] == '%'
>> + && rev_str[rev_len-2] == '7'
>> + && (rev_str[rev_len-1] == 'D'
>> + || rev_str[rev_len-1] == 'd'))
>> {
>> - /* URLs are URI-encoded, so we look for dates with
>> - URI-encoded delimeters. */
>> - int rev_len = strlen(rev_str);
>> - if (rev_len > 6
>> - && rev_str[0] == '%'
>> - && rev_str[1] == '7'
>> - && (rev_str[2] == 'B'
>> - || rev_str[2] == 'b')
>> - && rev_str[rev_len-3] == '%'
>> - && rev_str[rev_len-2] == '7'
>> - && (rev_str[rev_len-1] == 'D'
>> - || rev_str[rev_len-1] == 'd'))
>> - {
>> - rev_str = svn_path_uri_decode(rev_str, pool);
>> - }
>> + rev_str = svn_path_uri_decode(rev_str, pool);
>> }
>> - ret = svn_opt_parse_revision(&start_revision,
>> - &end_revision,
>> - rev_str, pool);
>> }
>> + ret = svn_opt_parse_revision(&start_revision,
>> + &end_revision,
>> + rev_str, pool);
>> + }
>>
>> - if (ret || end_revision.kind != svn_opt_revision_unspecified)
>> - return svn_error_createf(SVN_ERR_CL_ARG_PARSING_ERROR, NULL,
>> - _("Syntax error parsing
revision '%s'"),
>> - path + i + 1);
>> + if (ret || end_revision.kind != svn_opt_revision_unspecified)
>> + return svn_error_createf(SVN_ERR_CL_ARG_PARSING_ERROR, NULL,
>> + _("Syntax error parsing revision '%s'"),
>> + peg_rev + 1);
>>
>> - *truepath = apr_pstrmemdup(pool, path, i);
>> - rev->kind = start_revision.kind;
>> - rev->value = start_revision.value;
>> -
>> - return SVN_NO_ERROR;
>> - }
>> + rev->kind = start_revision.kind;
>> + rev->value = start_revision.value;
>> }
>> + else
>> + {
>> + /* Didn't find an peg revision. */
>
> "a"
>
>> @@ -987,13 +997,19 @@
>> }
>> else /* not a url, so treat as a path */
>> {
>> - const char *base_name;
>> + char *base_name;
>> + char *peg_rev;
>>
>> SVN_ERR(svn_opt__arg_canonicalize_path(&target,
utf8_target, pool));
>>
>> /* If the target has the same name as a Subversion
>> working copy administrative dir, skip it. */
>> base_name = svn_path_basename(target, pool);
>> +
>> + /* "Delete" the peg revision */
>> + if (peg_rev = strrchr(base_name, '@'))
>> + *peg_rev = '\0';
>
> This assignment-in-condition construction got me a warning:
>
> subversion/libsvn_subr/opt.c: In function svn_opt_args_to_target_array3:
> subversion/libsvn_subr/opt.c:1010: warning: suggest parentheses
> around assignment used as truth value

I have trouble seeing any warnings in the extremely verbose Subversion output,
if it could build more like, say, the Linux kernel it'd be much easier :).
Sure I could use 'make -s', but then it looks like it is hung! I like the
warm fuzzy of status text scrolling by, but not TOO much.

>
> I guess we've asked GCC to play police here, so we should move the
> assignment to a separate line.
>
> But more importantly, I think there's an opportunity to remove some code
> duplication here.
>
> We have split_arg_at_peg_revision(), which turns a path-maybe-with-peg
> into a path and a peg (NULL if no peg). Its only callers are
>
> svn_opt_parse_path()
> svn_opt__arg_canonicalize_url()
> svn_opt__arg_canonicalize_path()
>
> The first of those returns the peg revision separately (as a struct);
> the latter two instead reassemble target+peg_rev, just in canonical
> form. But now, every caller of svn_opt__arg_canonicalize_path() further
> converts the target to a base_name and then does
>
> strrchr(base_name, '@')
>
> to find and hide a peg rev that we've already found once before.
>
> Maybe it would be better to just have svn_opt__arg_canonicalize_path()
> take a new *peg_rev parameter, which (if the parameter itself is
> non-NULL) returns any peg rev found, as well as returning the
> canonicalized true target. Then we'd have the logic for parsing peg
> revs centralized in one place (including handling escaping), and some
> duplicate code could go away.
>
> Was this discussed already? I didn't follow the thread in detail.

Only between the voices in my head. I thought about doing something like this
but decided against it for two reasons:
 - Returning a peg revision seperately does not really do what the function
   name implies. The function 'canonicalizes' the given value, it just happens
   to also be able to cope with peg revisions. Returning a peg revision would
   be like a mix of a 'canonicalize' function and my
   split_arg_at_peg_revision() function.
 - If I did go that route I knew that I would need to implement the 'if peg
   rev argument is NULL don't return one there' and thought the devs
did not like
   that kind of thing in one of my previous patches.

I could create a helper function that took a target and
determined whether it is a adm dir, but could recognoize peg revisions. Not
sure what I would call such a function at the moment, but it is an idea.

When I initially coded this I did not like the duplication that much either,
but decided to keep it because I really felt like the
svn_client_args_to_target_array() and svn_opt_args_to_target_array3()
already had so much in common that this fit right in. I couldn't create a
private helper function like I did with split_arg_at_peg_revision() because
they are in two different libraries.

In the end I could not find an alternative solution that I liked better and
figured the patch already reduced duplicate code by a good measure that
leaving this logic in place would be fine.

>
>> @@ -1114,14 +1125,30 @@
>> return SVN_NO_ERROR;
>> }
>>
>> +
>> svn_error_t *
>> svn_opt__arg_canonicalize_url(const char **url_out, const char *url_in,
>> apr_pool_t *pool)
>> {
>> const char *target;
>> + const char *peg_rev;
>>
>> + /*
>> + * This is needed so that url_in can be properly canonicalized,
>> + * otherwise the canonicalization does not treat a "._at_BASE" as a "."
>> + * with a BASE peg revision, and it is not canonicalized to "@BASE".
>> + * If any peg revision exists, it is appended to the final
>> + * canonicalized path or URL. Do not use svn_opt_parse_path()
>> + * because the resulting peg revision is a structure that would have
>> + * to be converted back into a string. Converting from a string date
>> + * to the apr_time_t field in the svn_opt_revision_value_t and back to
>> + * a string would not necessarily preserve the exact bytes of the
>> + * input date, so its easier just to keep it in string form.
>> + */
>> + SVN_ERR(split_arg_at_peg_revision(&target, &peg_rev, url_in, pool));
>> +
>> /* Convert to URI. */
>> - target = svn_path_uri_from_iri(url_in, pool);
>> + target = svn_path_uri_from_iri(target, pool);
>> /* Auto-escape some ASCII characters. */
>> target = svn_path_uri_autoescape(target, pool);
>>
>> @@ -1140,6 +1167,11 @@
>> /* strip any trailing '/' and collapse other redundant elements */
>> target = svn_path_canonicalize(target, pool);
>>
>> + /* Append the peg revision back to the canonicalized target if
>> + there was a peg revision. */
>> + if (peg_rev)
>> + target = apr_pstrcat(pool, target, peg_rev, NULL);
>
> Yeah, the fact that the above huge comment and code...
>
>> *url_out = target;
>> return SVN_NO_ERROR;
>> }
>> @@ -1151,9 +1183,24 @@
>> const char *apr_target;
>> char *truenamed_target; /* APR-encoded */
>> apr_status_t apr_err;
>> + const char *peg_rev;
>>
>> + /*
>> + * This is needed so that url_in can be properly canonicalized,
>> + * otherwise the canonicalization does not treat a "._at_BASE" as a "."
>> + * with a BASE peg revision, and it is not canonicalized to "@BASE".
>> + * If any peg revision exists, it is appended to the final
>> + * canonicalized path or URL. Do not use svn_opt_parse_path()
>> + * because the resulting peg revision is a structure that would have
>> + * to be converted back into a string. Converting from a string date
>> + * to the apr_time_t field in the svn_opt_revision_value_t and back to
>> + * a string would not necessarily preserve the exact bytes of the
>> + * input date, so its easier just to keep it in string form.
>> + */
>> + SVN_ERR(split_arg_at_peg_revision(&apr_target, &peg_rev, path_in, pool));
>> +
>> /* canonicalize case, and change all separators to '/'. */
>> - SVN_ERR(svn_path_cstring_from_utf8(&apr_target, path_in, pool));
>> + SVN_ERR(svn_path_cstring_from_utf8(&apr_target, apr_target, pool));
>> apr_err = apr_filepath_merge(&truenamed_target, "", apr_target,
>> APR_FILEPATH_TRUENAME, pool);
>>
>> @@ -1174,6 +1221,11 @@
>> SVN_ERR(svn_path_cstring_to_utf8(path_out, apr_target, pool));
>> *path_out = svn_path_canonicalize(*path_out, pool);
>>
>> + /* Append the peg revision back to the canonicalized target if
>> + there was a peg revision. */
>> + if (peg_rev)
>> + *path_out = apr_pstrcat(pool, *path_out, peg_rev, NULL);
>> +
>> return SVN_NO_ERROR;
>> }
>
> ...are duplicated here signifies an opportunity, I think.
>

I really debated on where to move this comment. It was initially in the
args_to_target_array* functions. It commented the code that I consolidated
into split_arg_at_peg_revision() but I didn't think it really fit with that
function. It seemed to go in the usage part. It seemed to comment on why I
am using split_arg_at_peg_revision() instead of svn_opt_parse_path(), and not
so much why split_arg_at_peg_revision() exists.

Well the apr_pstrcat() duplication would simply be moved into the
*args_to_target_array() functions. I suppose it would get rid of the
"strrchr(base_name, '@')" code, but I don't know that it is worth it. I know
that trying to leave the peg revision seperated out and sticking into some
kind of array of target struct that svn_client_args_to_target_array() returns
is DEFINITELY not something I wanted to mess with. The assumption that an
array of targets as strings, possibly with peg revs, runs deep. (I chased it
down during my initial repository root relative url patch).

So in the end I could remove
+ if (peg_rev = strrchr(base_name, '@'))
+ *peg_rev = '\0';

and instead put in
+ if (peg_rev)
+ *path_out = apr_pstrcat(pool, *path_out, peg_rev, NULL);

instead. In both cases the caller has to be "peg aware", though not "peg
syntax aware" in the second case. This also at the cost of overloading the
meaning of svn_opt__arg_canonicalize_{url,path}(), or changing their names
entirely.

>
>> --- subversion/libsvn_client/cmdline.c (revision 31120)
>> +++ subversion/libsvn_client/cmdline.c (working copy)
>> @@ -247,7 +216,8 @@
>> }
>> else /* not a url, so treat as a path */
>> {
>> - const char *base_name;
>> + char *base_name;
>> + char *peg_rev;
>>
>> SVN_ERR(svn_opt__arg_canonicalize_path(&target,
>> utf8_target, pool));
>> @@ -256,6 +226,12 @@
>> working copy administrative dir, skip it. */
>> base_name = svn_path_basename(target, pool);
>>
>> + /* "Delete" the peg revision because
svn_wc_is_adm_dir() cannot
>> + * handle it.
>> + */
>> + if (peg_rev = strrchr(base_name, '@'))
>> + *peg_rev = '\0';
>> +
>> if (svn_wc_is_adm_dir(base_name, pool))
>> {
>> err =
svn_error_createf(SVN_ERR_RESERVED_FILENAME_SPECIFIED,
>
> And this duplicates code we saw earlier in svn_opt_args_to_target_array3().
>
> Thoughts?
>
> -Karl
>

Troy

-- 
Beware of spyware. If you can, use the Firefox browser. - USA Today
Download now at http://getfirefox.com
Registered Linux User #354814 ( http://counter.li.org/)
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-05-13 04:51:21 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.