Re: [PATCH] Implicit detection of merge source URL and merge revision range

From: Daniel Rall <dlr_at_collab.net>
Date: 2007-04-13 23:22:10 CEST

Kamesh, thanks for the great start! I have some comments below, some
of which I've responded to myself with revisions to your patch (new
version attached).

Currently, my version of the patch doesn't pass the merge test suite.

On Thu, 12 Apr 2007, Kamesh Jayachandran wrote:
...
> --- subversion/svn/main.c (revision 24527)
> +++ subversion/svn/main.c (working copy)
> @@ -513,7 +513,7 @@
> ("Apply the differences between two sources to a working copy path.\n"
> "usage: 1. merge sourceURL1[@N] sourceURL2[@M] [WCPATH]\n"
> " 2. merge sourceWCPATH1@N sourceWCPATH2@M [WCPATH]\n"
> - " 3. merge [-c M | -r N:M] SOURCE[@REV] [WCPATH]\n"
> + " 3. merge [-c M | -r N:M] [SOURCE[@REV]] [WCPATH]\n"

I'd expect to this to be:

merge [-c M | -r N:M] [SOURCE[@REV] [WCPATH]]

...unless we're going to allow you to specify 'merge -c N "" WCPATH'.

I wonder if we should require a separate option to get this behavior
from the command-line client. Personally, I'm pretty happy with this,
but I worry that some people might hit return too soon and shoot
themselves in the foot.

...
> - " 3. In the third form, SOURCE can be a URL, or working copy item\n"
> - " in which case the corresponding URL is used. This URL in\n"
> + " 3. In the third form, SOURCE should be a URL if given. \n"

Since SOURCE can be a WCPATH, the above wording needs to be changed.

> + " If SOURCE is not given SOURCE URL is deduced from the \n"
> + " copy source URL of WCPATH. If the WCPATH is not a copy of some \n"
> + " other node error is thrown asking for URL. This URL in\n"

We should talk about SOURCE here, rather than WCPATH.

> " revision REV is compared as it existed between revisions N and \n"
> " M. If REV is not specified, HEAD is assumed.\n"
> " The '-c M' option is equivalent to '-r N:M' where N = M-1.\n"
> " Using '-c -M' does the reverse: '-r M:N' where N = M-1.\n"
> + " If revision range is not specified, it is assumed to be \n"
> + " -r oldest_source_node_rev:HEAD.\n"

I don't have a great alternate suggestion, but think that
"oldest_source_node_rev" should be formatted differently.

...
> --- subversion/svn/merge-cmd.c (revision 24527)
> +++ subversion/svn/merge-cmd.c (working copy)
> @@ -43,12 +43,36 @@
> svn_cl__opt_state_t *opt_state = ((svn_cl__cmd_baton_t *) baton)->opt_state;
> svn_client_ctx_t *ctx = ((svn_cl__cmd_baton_t *) baton)->ctx;
> apr_array_header_t *targets;
> - const char *sourcepath1, *sourcepath2, *targetpath;
> + const char *sourcepath1=NULL, *sourcepath2=NULL;
> + /* where to apply the diffs, defaulting to '.' */
> + const char *targetpath="";

Need better use of whitespace. Don't need to put source/target path
variables on their own line.

Do we want to default target_path to "" here, or later on?

> svn_boolean_t using_rev_range_syntax = FALSE;
> svn_error_t *err;
> - svn_opt_revision_t peg_revision;
> + svn_opt_revision_t peg_revision, peg_revision1, peg_revision2;

Do we need three peg revision variables? In the
using_rev_range_syntax block, it looks like the two new ones are only
temporary.

...
> + if (targets->nelts <= 1)
> + {
> + using_rev_range_syntax = TRUE;
> + }
> + else if (targets->nelts == 2)
> + {
> + /* First target should be URL and second target should be WCPATH.*/
> + if (svn_path_is_url(sourcepath1) && !svn_path_is_url(sourcepath2))
> + using_rev_range_syntax = TRUE;
> + }

This "if/else" block could be one big conditional.

...
> + peg_revision = peg_revision1;

I think we can do away with this statement by removing the two new peg
rev variables.

...
> + else
> + {
> + /* targets->nelts is 1 or 2 here. */
> + if (targets->nelts == 1)
> + sourcepath2 = sourcepath1;
> + /* Set the default peg revision if one was not specified. */
> + if (peg_revision.kind == svn_opt_revision_unspecified)
> + peg_revision.kind = svn_path_is_url(sourcepath1)
> + ? svn_opt_revision_head : svn_opt_revision_working;
> + }
> if (targets->nelts == 2)
> - targetpath = APR_ARRAY_IDX(targets, 1, const char *);
> - else
> - targetpath = "";
> + {
> + targetpath = APR_ARRAY_IDX(targets, 1, const char *);
> + }

We can put the setting of targetpath up inside the "else" block. We
can avoid setting it to "" if that is its default value.

> }
> else /* using @rev syntax */
> {
> @@ -96,14 +119,6 @@
> return svn_error_create(SVN_ERR_CL_ARG_PARSING_ERROR, NULL,
> _("Too many arguments given"));
>
> - /* the first two paths become the 'sources' */
> - SVN_ERR(svn_opt_parse_path(&opt_state->start_revision, &sourcepath1,
> - APR_ARRAY_IDX(targets, 0, const char *),
> - pool));
> - SVN_ERR(svn_opt_parse_path(&opt_state->end_revision, &sourcepath2,
> - APR_ARRAY_IDX(targets, 1, const char *),
> - pool));

Hrm. Can we really remove this block? What then will set
opt_state->start_revision and opt_state->end_revision?

...
> @@ -123,10 +138,9 @@
> else
> targetpath = "";
> }
> -
> /* If no targetpath was specified, see if we can infer it from the
> sourcepaths. */
> - if (! strcmp(targetpath, ""))
> + if (sourcepath1 && sourcepath2 && ! strcmp(targetpath, ""))

Prefer the expression "strcmp(...) == 0".

> {
> char *sp1_basename, *sp2_basename;
> sp1_basename = svn_path_basename(sourcepath1, pool);
> @@ -144,10 +158,9 @@
> }
> }
>
> - if (opt_state->start_revision.kind == svn_opt_revision_unspecified)
> + if ((opt_state->start_revision.kind == svn_opt_revision_unspecified)
> + && !using_rev_range_syntax)
> opt_state->start_revision.kind = svn_opt_revision_head;
> - if (opt_state->end_revision.kind == svn_opt_revision_unspecified)
> - opt_state->end_revision.kind = svn_opt_revision_head;

I don't think we can get rid of this second check in the
non-using_rev_range_syntax case, either.

...
> --- subversion/libsvn_client/client.h (revision 24527)
> +++ subversion/libsvn_client/client.h (working copy)
> @@ -887,6 +887,34 @@
> svn_client_ctx_t *ctx,
> apr_pool_t *pool);

You probably didn't mean to put these API declarations into the
"Externals (Modules)" area of client.h. They should probably go into
a merge or copy-related area.

They're also using Doxygen markup, and formatting inconsistent with
the rest of the APIs in this header.

> +/** Retrieve the oldest revision of the node at @a REL_PATH at @a REV
> + * since it was last copied (if applicable), and store it in @a
> + * OLDEST_REV.
> + * If REL_PATH does not exist in that REV, sets *OLDEST_REV to
> + * SVN_INVALID_REVNUM.
> + *
> + */
> +svn_error_t *
> +svn_client__oldest_rev_at_path(svn_revnum_t *oldest_rev,
> + svn_ra_session_t *ra_session,
> + const char *rel_path,
> + svn_revnum_t rev,
> + apr_pool_t *pool);
> +
> +
> +/** Retrieve the copy source the node at @a REL_PATH at @a REV
                               ^
                               of

> + * and store it in @a COPY_SOURCE. If REL_PATH is not a copy or
> + * non-existent @a REV sets @a COPY_SOURCE to NULL.

This reads a little funky, too.

> + *
> + */
> +svn_error_t *
> +svn_client__get_copy_source(const char **copy_source,
> + svn_ra_session_t *ra_session,
> + const char *rel_path,
> + svn_revnum_t rev,
> + apr_pool_t *pool);

There's a mailing list thread about these APIs that I need to follow
up on. We'll probably also want to bring existing merge sources into
consideration where we use the svn_client__get_copy_source() API
below.

...
> --- subversion/libsvn_client/log.c (revision 24527)
> +++ subversion/libsvn_client/log.c (working copy)
...
> +/* A log callback conforming to the svn_log_message_receiver_t
> + interface for obtaining the last revision of a node at a path and
> + storing it in *BATON (an svn_revnum_t). */
> +static svn_error_t *
> +revnum_receiver(void *baton,
> + apr_hash_t *changed_paths,
> + svn_revnum_t revision,
> + const char *author,
> + const char *date,
> + const char *message,
> + apr_pool_t *pool)
> +{
> + *((svn_revnum_t *) baton) = revision;
> + return SVN_NO_ERROR;
> +}
> +
> +svn_error_t *
> +svn_client__oldest_rev_at_path(svn_revnum_t *oldest_rev,
> + svn_ra_session_t *ra_session,
> + const char *rel_path,
> + svn_revnum_t rev,
> + apr_pool_t *pool)
> +{
> + svn_error_t *err;
> + *oldest_rev = SVN_INVALID_REVNUM;
> + apr_array_header_t *rel_paths = apr_array_make(pool, 1, sizeof(rel_path));
> + APR_ARRAY_PUSH(rel_paths, const char *) = rel_path;
> +
> + /* Trace back in history to find the revision at which this node
> + was created (copied or added). */
> + err = svn_ra_get_log(ra_session, rel_paths, 1, rev, 1, FALSE, TRUE,
> + revnum_receiver, oldest_rev, pool);
> + if (err && (err->apr_err == SVN_ERR_FS_NOT_FOUND ||
> + err->apr_err == SVN_ERR_RA_DAV_REQUEST_FAILED))
> + {
> + /* A locally-added but uncommitted versioned resource won't
> + exist in the repository. */
> + svn_error_clear(err);
> + err = SVN_NO_ERROR;
> + }
> + return err;
> +}
> +
> +struct copy_source_baton
> +{
> + const char **copy_source;
> + apr_pool_t *pool;
> +};

Doesn't this need a copyfrom_rev field, too? Did you happen to see
the patch I sent to the dev list on "Fri, 30 Mar 2007", attached to
the message with "Subject: [RFC] Identifying copy/move sources", which
supplies a very similar API?

> +/* A log callback conforming to the svn_log_message_receiver_t
> + interface for obtaining the copy source of a node at a path and
> + storing it in *BATON (a struct copy_source_baton *). */
> +static svn_error_t *
> +copy_source_receiver(void *baton,
> + apr_hash_t *changed_paths,
> + svn_revnum_t revision,
> + const char *author,
> + const char *date,
> + const char *message,
> + apr_pool_t *pool)
> +{
> + apr_hash_index_t *hi;
> + void *val;
> + const char *copy_source_path;
> + svn_log_changed_path_t *changed_path;
> + struct copy_source_baton *copy_source_baton;
> + copy_source_baton = baton;
> + /*FIXME: if the rev at which this node is created has few other node
> + changes too extract only our node. */
> + for (hi = apr_hash_first(pool, changed_paths); hi; hi = apr_hash_next(hi))
> + {
> + apr_hash_this(hi, NULL, NULL, &val);
> + changed_path = val;
> + }
> + copy_source_path = changed_path->copyfrom_path;
> +
> + *((char **) copy_source_baton->copy_source) =
> + apr_pstrdup(copy_source_baton->pool, copy_source_path);
> + return SVN_NO_ERROR;
> +}

You probably want to look at my receiver implementation.

...
> --- subversion/libsvn_client/merge.c (revision 24527)
> +++ subversion/libsvn_client/merge.c (working copy)
...
> +/*Assumes the default values of REVISION1 and REVISION2 to be
> + oldest rev at which ra_session's root got created and HEAD
> + respectively if REVISION1 and REVISION2 are unspecified.
> + This assumed value is set at ASSUMED_REVISION1 and ASSUMED_REVISION2.
> + RA_SESSION is used to retrieve the value of current HEAD revision.
> + POOL is used for temporary allocations.
> +*/

Doc string wording could be improved a bit. For some reason, this
implementation strikes me as overly complex, but I can't put my finger
on how to do it more simply.

> +static svn_error_t *
> +assume_default_rev_range(const svn_opt_revision_t *revision1,
> + svn_opt_revision_t *assumed_revision1,
> + const svn_opt_revision_t *revision2,
> + svn_opt_revision_t *assumed_revision2,
> + svn_ra_session_t *ra_session,
> + apr_pool_t *pool)
> +{
> + svn_opt_revision_t head_rev_opt;
> + svn_revnum_t head_revnum = SVN_INVALID_REVNUM;
> + head_rev_opt.kind = svn_opt_revision_head;
> + /* Provide reasonable defaults for unspecified revisions. */
> + if (revision1->kind == svn_opt_revision_unspecified)
> + {
> + SVN_ERR(svn_client__get_revision_number(&head_revnum, ra_session,
> + &head_rev_opt, "", pool));
> + SVN_ERR(svn_client__oldest_rev_at_path(&assumed_revision1->value.number,
> + ra_session, "",
> + head_revnum,
> + pool));
> + if (assumed_revision1->value.number != SVN_INVALID_REVNUM)

The SVN_IS_VALID_REVNUM() macro would be more appropriate here.

> + {
> + assumed_revision1->kind = svn_opt_revision_number;
> + }
> + }
> + else
> + {
> + *assumed_revision1 = *revision1;
> + }
> + if (revision2->kind == svn_opt_revision_unspecified)
> + {
> + if (head_revnum == SVN_INVALID_REVNUM)

...and here (reversing the blocks).

> + assumed_revision2->kind = svn_opt_revision_head;
> + else
> + {
> + assumed_revision2->value.number = head_revnum;
> + assumed_revision2->kind = svn_opt_revision_number;
> + }
> + }
> + else
> + {
> + *assumed_revision2 = *revision2;
> + }
> + return SVN_NO_ERROR;
...
> @@ -1709,17 +1771,30 @@
> svn_opt_revision_t *revision1, *revision2;
> svn_error_t *err;
> svn_merge_range_t range;
> - svn_ra_session_t *ra_session1, *ra_session2;
> + svn_ra_session_t *ra_session, *ra_session1, *ra_session2;

Rather than introducing a third ra_session, can't we just re-parent
ra_session1 after using it to determine the assumed revision range?

...
> + /* Establish first RA session to URL1. */
> + /*FIXME: use initial_URL1 for now which may barf for pegged rev merges */

What are these FIXME's all about? Can you give more context?

> + SVN_ERR(svn_client__open_ra_session_internal(&ra_session, initial_URL1, NULL,
> + NULL, NULL, FALSE, TRUE,
> + ctx, pool));
> + SVN_ERR(assume_default_rev_range(initial_revision1,
> + &assumed_initial_revision1,
> + initial_revision2,
> + &assumed_initial_revision2,
> + ra_session,
> + pool));
...
> @@ -2226,22 +2301,51 @@
> svn_wc_adm_access_t *adm_access;
> const svn_wc_entry_t *entry;
> struct merge_cmd_baton merge_cmd_baton;
> - const char *URL;
> + const char *URL = NULL;

There's only one spot where we use this default value, so let's set it
there instead.

...
> +
> + {
> +
> +
> +
> +
> +
> +
> + */
> +
> + }
> + else
> + {
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
> +
if (source) /* If source is a path, we need to get the underlying URL * from the wc and save the initial path we were passed so we can use it as * a path parameter (either in the baton or not). otherwise, the path * will just be NULL, which means we won't be able to figure out some kind * of revision specifications, but in that case it won't matter, because * those ways of specifying a revision are meaningless for a url. SVN_ERR(svn_client_url_from_path(&URL, source, pool)); /* If a merge source was not specified, try to derive it from copy source. */ svn_ra_session_t *ra_session; svn_revnum_t working_rev_num; svn_opt_revision_t working_copy_rev_num_opt; working_copy_rev_num_opt.kind = svn_opt_revision_working; const char *working_copy_url; const char *repos_root; const char *copy_source_path; SVN_ERR(svn_client__ra_session_from_path(&ra_session, &working_rev_num, &working_copy_url, target_wcpath, &working_copy_rev_num_opt, &working_copy_rev_num_opt,

We want to use this same working_copy_rev_num_opt variable twice?

> + ctx,
> + pool));
> + SVN_ERR(svn_ra_get_repos_root(ra_session, &repos_root, pool));
> + SVN_ERR(svn_client__get_copy_source(&copy_source_path, ra_session,
> + "", working_rev_num, pool));
> + if (copy_source_path)

We can reduce the scope of repos_root to this block.

> + URL = apr_pstrcat(pool, repos_root, copy_source_path, NULL);

else
URL = NULL;

> + }
> +
> if (! URL)
> return svn_error_createf(SVN_ERR_ENTRY_MISSING_URL, NULL,
> - _("'%s' has no URL"),
> - svn_path_local_style(source, pool));
> + _("'%s' has no implicit URL to merge from"),
> + source ? svn_path_local_style(source, pool) : ".");

Hmm. It seems like we could have either the old problem (mising URL
field in entry file), or the new problem (unable to automatically
determine a merge source URL). In the latter case, this isn't the
appropriate error code. We should use two separate error creation
statement here, one in the "if" block and one in the "else" block.

...
> --- subversion/tests/cmdline/merge_tests.py (revision 24527)
> +++ subversion/tests/cmdline/merge_tests.py (working copy)
> @@ -1243,7 +1243,7 @@
> def merge_with_implicit_target_helper(sbox, arg_flav):
> "ARG_FLAV is one of 'r' (revision range) or 'c' (single change)."
>
> - if arg_flav not in ('r', 'c'):
> + if arg_flav not in ('r', 'c', '*'):
> raise svntest.Failure("Unrecognized flavor of merge argument")
>
> sbox.build()
> @@ -1294,9 +1294,15 @@
> svntest.actions.run_and_verify_svn(None, ['U mu\n'], [],
> 'merge', '-c', '-2', mu_url)
>
> + #implicit rev and url detection is for forward merge only.
> + elif arg_flav == '*':
> + svntest.actions.run_and_verify_svn(None, ['U mu\n'], [],
> + 'merge', '-c', '-2', mu_url)

We should document this in the command-line client. In the test case,
we should skip the '*' flavor, since we're already have tested it for
the 'r' and 'c' flavors.

> # sanity-check resulting file
> if (svntest.tree.get_text('mu') != orig_mu_text):
> - raise svntest.Failure("Unexpected text in 'mu'")
> + raise svntest.Failure("Unexpected text '%s' in 'mu', expected '%s'" %
> + (svntest.tree.get_text('mu'), orig_mu_text))
>
> # merge using filename for sourcepath
> # Cannot use run_and_verify_merge with a file target
> @@ -1306,6 +1312,9 @@
> elif arg_flav == 'c':
> svntest.actions.run_and_verify_svn(None, ['G mu\n'], [],
> 'merge', '-c', '2', 'mu')
> + elif arg_flav == '*':
> + svntest.actions.run_and_verify_svn(None, ['G mu\n'], [],
> + 'merge', 'mu')
>
> # sanity-check resulting file
> if (svntest.tree.get_text('mu') != orig_mu_text + added_mu_text):
> @@ -1322,6 +1331,11 @@
> "merging a file w/no explicit target path using -c"
> merge_with_implicit_target_helper(sbox, 'c')
>
> +def merge_with_implicit_target_and_revs(sbox):
> + "merging a file w/no explicit target path or revs"
> + merge_with_implicit_target_helper(sbox, '*')
> +
> +
> #----------------------------------------------------------------------
>
> def merge_with_prev (sbox):
> @@ -5370,6 +5384,7 @@
> simple_property_merges,
> merge_with_implicit_target_using_r,
> merge_with_implicit_target_using_c,
> + merge_with_implicit_target_and_revs,

I seem to recall a second existing test case which could house this
type of regression testing...merge_one_file_helper() it is. An
additional consumer -- a la merge_one_file_using_r() and
merge_one_file_using_c() -- could be added. (I didn't do this.)

text/plain attachment: automerge-v7.patch

application/pgp-signature attachment: stored

Received on Fri Apr 13 23:22:40 2007

This message: [ Message body ]
Next message: Daniel Rall: "Re: svn commit: r24559 - trunk/subversion/svnadmin"
Previous message: Bastiaan Veelo: "[PATCH][hook script]] Reduce false positives in detect-merge-conflicts.sh"
In reply to: Kamesh Jayachandran: "[PATCH] Implicit detection of merge source URL and merge revision range"
Next in thread: Paul Burba: "Re: [PATCH] Implicit detection of merge source URL and merge revision range"
Reply: Paul Burba: "Re: [PATCH] Implicit detection of merge source URL and merge revision range"
Reply: Kamesh Jayachandran: "Re: [PATCH] Implicit detection of merge source URL and merge revision range"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]