[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Repository root relative url support in svn CLI -- REDUX

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 27 Feb 2008 16:51:26 +0000

Troy Curtis Jr wrote:
>> Troy Curtis Jr wrote:
>> >>From my original patch (http://svn.haxx.se/dev/archive-2007-11/0708.shtml)
>> > through discussions about a better implementation
>> > (http://svn.haxx.se/dev/archive-2007-11/1113.shtml and
>> > http://svn.haxx.se/dev/archive-2007-12/0047.shtml) I have returned with a
>> > better relative url implementation for the svn client, at least so I believe.
>> >
>> > In this patch I implement the following logic for parsing the svn CLI
>> > client args:
>> > - If there is a relative url in the target list
>> > - Then use the repository root url from the other targets to resolve the
>> > relative url.
>>
>> ... and error if there are multiple different roots.
>>
>> We need to be clear whether "args" and "targets" means all targets including
>> those specified with "--targets", or just direct command-line arguments. Which
>> should it be, and which have you implemented? (I haven't thought enough to be
>> sure which it should be.)
>
> I included targets in my implementation. There are a couple of reasons.
>
> First, the --targets argument really seems like an
> extension of targets you can specify on the command-line, and so should be
> treated the same. My personal use of --targets has mostly consisted of echoing
> a list of targets into a file as I cruise through a development tree as a
> sort-of 'save-for-later' list to be used in a follow up svn command.

OK, that sounds sane and good. I wasn't sure from my initial look at the code
whether that was actually happening, but I'm not saying it's not.

> Second, because I chose to create a wrapper script for the
> svn_opt_args_to_target_array3(), this function (and thus my function) takes
> the 'known_targets' argument and those targets are rolled into final 'targets'
> output. I thought it definitely made sense for the 'known_targets' arg to be
> parsed just like command-line args.

>> Do we need to allow backtracking with ".." before canonicalisation, in either of:
>>
>> (a) inside a repository. Examples:
>> svn log ^/trunk/dir1/../dir2/
>> MYBRANCH=^/branches/br1 svn log $MYBRANCH/../br2
>>
>> (b) in the URL-of-repository, to get from one repository to another, e.g.
>> svn log ^/../there's-another-repos-here/their-trunk
>> svn log ^/trunk/../../repos2/their-trunk
>> ?
>>
>> I come back to this question below. I can't remember whether we decided a long
>> time ago what should be allowed. Right now, off the top of my head, I think (a)
>> should be allowed pre-canonicalisation, if and only if that is consistent with
>> what we do for other URLs and paths. I don't think (b) should be allowed.
>>
>> What do you think is best for the initial implementation of this feature?
>
> Personally only (b) really seems that useful and was what I was leaving
> it in for. I think I might understand your 'ick' reaction to mixing
> paths in a repo
> and paths to other repos, but ultimately it's just a short cut. So just
> like '../' in a unix filesystem might traverse all kinds of different
> filesystems, I think it's valid to allow the user to not have to type the long
> host and server part of the url for their *other* repository.
>
> In fact this use case did score an explicit mention in
> resolve_relative_external_url():
>
> [quote]
> /* Handle URLs relative to the current directory or to the
> repository root. The backpaths may only remove path elements,
> not the hostname. This allows an external to refer to another
> repository in the same server relative to the location of this
> repository, say using SVNParentPath. */
> [/quote]
>
> That said I'm not overly attached to it so I don't *really* mind removing it.
> Is anyone else reading this long discussion? If so, what are your thoughts?

I hadn't noticed that mention in relative-externals support. But there is a
difference: externals have always been about grabbing a sub-tree from
"somewhere else" that might be from a different repository. (No comment on
whether using a relative path to a different repository is sane in that case.)
In the case of repos-relative support on the command line, however, we have
already determined that we can't parse the repos-relative syntax in a command
that has other targets specified in multiple repositories, and I think it would
be bizarre to be able to generate such a command through the syntax.

So I think the backtracking beyond the repository root should be removed. It's
much better to start with behaviour that is simple and useful and safe, and to
be able to add complexities later, than to start with lots of possibilities
that "might" be useful and later find that some of them cause problems.

Get rid of case (a) too unless it comes for free in our existing canonisation
functions.

>> > - If there are no other targets, use the root url of the current directory.
>> > - If the current directory is not a working copy, issue an error.
>> >
>> > This seems to get most of the use cases right. It does leave multi-repository
>> > urls out, but they are much less frequent and we couldn't come up with a
>> > simple (to understand) method of resolving them.
>>
>> Tha above logic (command-line semantics), as well as the syntax, ought to be
>> documented somewhere in the code (and in the book, eventually). I've pointed
>> out below a place where it could go in the code.
>>
>>
>>
>> > This patch contains what is likely a ridiculous amount of svn tests, basically
>> > one for every command of interest to relative urls. It might be too much
>> > because ultimately they all run the same function. However, I had them from
>> > my previous attempt (in which each svn subcommand had it's own implementation)
>> > and I figured they could be useful in making sure checks and translations
>> > don't get made before the call to svn_cl__args_to_target_array() that break
>> > relative url support.
>>
>> There is a rather large amount of test code for what it achieves. Test code has
>> to be maintained too. Without studying and comparing it all, it looks like
>> duplicating several large test cases just to create a variant that uses a
>> relative URL. (Is that the case?) Rather than that, perhaps you can either
>> factor out the common code in those cases, or just don't add most of these
>> tests because we know there's no special need to test relative URLs for lots of
>> different commands. Just leave a couple of the small ones, or a set of tests
>> that specifically check attempts to use a relative URL in conjunction with 0, 1
>> and 2 absolute URLs to exercise all the variants of the designed behaviour.
>
> Yeah, that is what I figured, thus the disclaimer. I just thought someone
> might pipe in and say that it was useful for some specific issue. I'll trim
> it down.

Thanks.

>>
>> > [[[
>> > Implement repository root relative url support for the svn command-line
>> > client.
>>
>> I'd add just another sentence or two explaining what "repository-relative URL
>> support" means/is/does, since this is a new concept being introduced.
>>
>>
>>
>> > Create some infrastructure:
>> >
>> > * subversion/libsvn_subr/path.c
>> > (svn_path_is_relative_url): New function to test whether a given string is a
>> > relative url.
>> > (svn_path_resolve_relative_url): New function to resolve a relative url
>> > given the repository root url.
>> >
>> > * subversion/include/svn_path.h
>> > (svn_path_is_relative_url): New function prototype.
>> > (svn_path_resolve_relative_url): New function prototype.
>> >
>> > * subversion/tests/libsvn_subr/path-test.c
>> > (test_is_relative_url): New test.
>> > (test_resolve_relative_url): New test.
>> > (test_funcs): Run new tests.
>> >
>> > * subversion/libsvn_subr/opt.c
>> > (svn_opt_args_to_target_array3): Allow a bare relative url for the root of a
>> > repository ('^/') by testing whether a given argument is a relative url
>> > before attempting to canonicalize it.
>>
>> What exactly is this change, though?
>>
>> You're making the function continue to process full URLs as it did, including
>> uri_from_iri, autoescape, well-formedness check, and canonicalise. But you're
>> making it bypass all but the first of these for repository-relative URLs, with
>> a comment saying "they will be processed later". This leads the user of this
>> function to ask, "How should I process these later?"
>>
>> In fact, I can't see that you are processing these later.
>>
>> Wouldn't it be better to have as much consistency with the processing of full
>> URLs as possible?
>>
>> Why bypass the auto-escape? If it's only to avoid messing up the "^" character
>> then just auto-escape the rest of it.
>>
>> I see that you require a trailing slash to remain on a bare "^/" argument, so
>> you bypass the canonicalisation, but in fact you could special-case this and
>> still do the rest of canonicalisation if that's what's needed.
>>
>> And whatever it does, you need to describe in the function's doc string.
>
> Yeah I see what you are saying, it is a bit confusing, I'll clean it up. It
> gets processed in the svn_path_resolve_relative_url() function.

I don't think the bypassed processing stages do happen in
svn_path_resolve_relative_url() - see my comments on that function, below.

> But of course
> there is no guarantee that it HAS to be passed to that function. I'll look
> into parsing the rest of the canonicalization, that possibility of doing that
> hadn't really occurred to me.

Cool.

[...]
>> > Index: subversion/include/svn_path.h
>> > ===================================================================
>> > --- subversion/include/svn_path.h (revision 29310)
>> > +++ subversion/include/svn_path.h (working copy)
>> > @@ -456,9 +456,28 @@
>> > /** Return TRUE iff @a path looks like a valid absolute URL. */
>> > svn_boolean_t svn_path_is_url(const char *path);
>> >
>> > +/**
>> > + * Return true iff @a url is a relative URL. Specifically that it starts
>> > + * with the characters "^/".
>> > + */
>> > +svn_boolean_t svn_path_is_relative_url(const char *url);
>> > +
>> > /** Return @c TRUE iff @a path is URI-safe, @c FALSE otherwise. */
>> > svn_boolean_t svn_path_is_uri_safe(const char *path);
>> >
>> > +/**
>> > + * Convert the possibly relative url in @a relative_url to an absolute
>> > + * url using @a repos_root_url and stick it in @a absolute_url. If the
>> > + * @a relative_url starts with the characters '^/', they are replaced by
>> > + * the repository root url, otherwise the @a relative_url is returned.
>> > + * Backpaths ("../") within the URL are also supported.
>> > + */
>>
>> "Backpaths...supported": Why do we need this? There's no need to be able to
>> backtrack within a repository, as we're always starting at its root. The only
>> ability backtracking could bring is, starting with a URL for one repository, to
>> construct a URL that points to a different repository". The code currently
>> allows it but I don't believe we want to do that with this feature. (It feels
>> to me a bit yucky to mix URL-of-repository and path-inside-repository into one
>> continuum that can be traversed up and down arbitrarily.)
>>
>> (Also, looking at the backtracking code in resolve_relative_external_url() that
>> you copied, it silently ignores ".." components, which I would have thought was
>> a bug.)
>
> Hum, isn't the apr_array_pop(base_components) where
> resolve_relative_external_url() does something?

Sorry, I missed a few words! I meant to say it silently ignores ".." components
that backtrack outside the allowable scope:
[[[
           if (0 == strcmp("..", component))
             {
               /* Constructing the final absolute URL together with
                  apr_uri_unparse() requires that the path be absolute,
                  so only pop a component if the component being popped
                  is not the component for the root directory. */
               if (base_components->nelts > 1)
                 apr_array_pop(base_components);
             }
           else
             APR_ARRAY_PUSH(base_components, const char *) = component;
]]]

The inner "if" statement ought to end with "else { throw an error }" or
something. I guess I should raise a bug report against that code.

>> > +svn_error_t *
>> > +svn_path_resolve_relative_url(const char **absolute_url,
>> > + const char *relative_url,
>> > + const char *repos_root_url,
>> > + apr_pool_t *pool);
>>
>> These two are not general path functions but command-line-client-specific
>> functions. Please make them private functions in the source file that needs
>> them. We can always make them public later if we need to.
>
> I really thought it fit in nicely with functions such as
> svn_path_get_absolute(). I also think there is value in having it public so
> that other clients can use the same argument (in this case relative url)
> parsing code that is used for svn, in much the same way that
> svn_opt_args_to_target_array3() is a public function. I could definetly see
> where it would make sense to include relative url support in clients such as
> mucc, is there any reason they should re-implement this logic?

The svn_path_... functions act on paths in general, both local paths and URLs,
the sort of paths that all Subversion clients and servers need to deal with.

The svn_opt_... functions are specific to command-line clients. (Logically they
should be in a higher level library than libsvn_subr, but at least they've got
their own header file.)

The new "repository-relative URL syntax" functions are specific to command-line
clients. Since they are needed by svn_opt_args_to_target_array3(), they will
have to be in libsvn_subr. If public, they should be in the svn_opt name space.

I'm not convinced that this new API should be exposed separately at this early
stage. It already will get used by clients that use
svn_opt_args_to_target_array3(). Exposing all the possible APIs at an early
stage is harmful to maintainability of the system in the long run. Please let's
implement the functionality first and then consider making it public later, if
and when other clients such as Mucc really do want it.

(If the resolve_relative_url() function goes public, it could do with being
documented a little more precisely, including the form
(URI/IRI/canonicalised/...) of the inputs and output, and the use of its "pool"
argument. This is less important while it's private.)

>> > -
>> > svn_error_t *
>> > -svn_cl__args_to_target_array_print_reserved(apr_array_header_t **targets,
>> > - apr_getopt_t *os,
>> > - apr_array_header_t *known_targets,
>> > - apr_pool_t *pool)
>> > +svn_cl__args_to_target_array(apr_array_header_t **targets_ret,
>> > + apr_getopt_t *os,
>> > + apr_array_header_t *known_targets,
>> > + svn_client_ctx_t *ctx,
>> > + apr_pool_t *pool)
>> > {
>> > - svn_error_t *error = svn_opt_args_to_target_array3(targets, os,
>> > - known_targets, pool);
>> > + int i;
>> > + const char *root_url = NULL;
>> > + apr_array_header_t *targets;
>> > + svn_error_t *error;
>> > +
>> > + error = svn_opt_args_to_target_array3(&targets, os, known_targets, pool);
>> > +
>> > if (error)
>> > {
>> > if (error->apr_err == SVN_ERR_RESERVED_FILENAME_SPECIFIED)
>> > @@ -1028,10 +1033,96 @@
>> > else
>> > return error;
>> > }
>> > +
>> > + /*
>> > + * First determine whether to even bother with root urls,
>> > + * at least one relative url for it to matter.
>> > + */
>> > + for (i = 0; i < targets->nelts; i++)
>> > + {
>> > + const char *target = APR_ARRAY_IDX(targets, i, const char *);
>> > +
>> > + if (svn_path_is_relative_url(target))
>> > + break;
>> > + }
>> > +
>> > + /* Was a relative url found? */
>> > + if (i >= targets->nelts)
>> > + {
>> > + *targets_ret = targets;
>> > + return SVN_NO_ERROR;
>> > + }
>> > +
>> > + /*
>> > + * Now determine the common root url to use.
>> > + */
>> > + for (i = 0; i < targets->nelts; i++)
>> > + {
>> > + const char *target = APR_ARRAY_IDX(targets, i, const char *);
>> > + const char *tmp_root_url;
>> > +
>> > + if (svn_path_is_relative_url(target))
>> > + continue;
>> > +
>> > + if ((error = svn_client_root_url_from_path(&tmp_root_url, target,
>> > + ctx, pool)))
>> > + {
>> > + if ( (error->apr_err == SVN_ERR_ENTRY_NOT_FOUND)
>> > + || (error->apr_err == SVN_ERR_WC_NOT_DIRECTORY))
>> > + {
>> > + svn_error_clear(error);
>> > + continue;
>> > + }
>> > + else
>> > + return error;
>> > + }
>> > + else if ( (root_url != NULL)
>> > + && (strcmp(root_url, tmp_root_url) != 0))
>> > + return svn_error_createf(SVN_ERR_ILLEGAL_TARGET, NULL,
>> > + _("All non-relative paths must have the same root url.\n"));
>> > + else
>> > + root_url = tmp_root_url;
>> > + }
>> > +
>> > + /*
>> > + * Use the current directory's root url if one wasn't found using the
>> > + * arguments.
>> > + */
>> > + if (root_url == NULL)
>> > + {
>> > + if ((error = svn_client_root_url_from_path(&root_url,
>> > + svn_path_canonicalize(".", pool),
>> > + ctx, pool)))
>> > + return error;
>> > + }
>> > +
>> > + /*
>> > + * Finally, resolve any relative urls found in the targets array.
>> > + */
>> > +
>> > + *targets_ret = apr_array_make(pool, targets->nelts, sizeof(const char *));
>> > +
>> > + for (i = 0; i < targets->nelts; i++)
>> > + {
>> > + const char *target = APR_ARRAY_IDX(targets, i, const char *);
>> > +
>> > + if (svn_path_is_relative_url(target))
>> > + {
>> > + const char *abs_target;
>> > +
>> > + SVN_ERR(svn_path_resolve_relative_url(&abs_target, target,
>> > + root_url, pool));
>> > + APR_ARRAY_PUSH(*targets_ret, const char *) = abs_target;
>> > + }
>> > + else
>> > + {
>> > + APR_ARRAY_PUSH(*targets_ret, const char *) = target;
>> > + }
>> > + }
>> > +
>> > return SVN_NO_ERROR;
>> > }
>>
>> Is that the function in which we might expect canonicalisation and
>> auto-escaping to be done?
>
> I believe this is done through the use of apr_uri_unparse() inside the
> svn_path_resolve_relative_url() function.

The apr_uri_parse/unparse() functions are basically just splitting and joining
the various parts of the URL, not doing the (Subversion-specific)
auto-escaping, nor (as far as I know) canonicalisation steps that you bypassed
in svn_opt_args_to_target_array3().

I look forward to seeing the next iteration of this feature. I am holding off
making any detailed comments on the code until we get the main structure
settled. Let us know if there's any more I/we can help towards it in the mean time.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-02-27 17:51:57 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.