[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: svn commit: r31504 - trunk/subversion/libsvn_client

From: Karl Fogel <kfogel_at_red-bean.com>
Date: Sun, 22 Jun 2008 18:19:10 -0400

cmpilato_at_tigris.org writes:
> Log:
> Optimize file merges performed with ancestrally related sources by
> using the svn_ra_get_log2() interface to figure out in which revisions
> the source really changed, and using that information to filter out
> no-op revision ranges prior to merging. We still *record* the
> original set of revision ranges, of course.
>
> * subversion/libsvn_client/merge.c
> (log_changed_revs, remove_noop_merge_ranges): New functions.
> (do_file_merge): Selective use remove_noop_merge_ranges() to reduce
> the amount of work this function needs to do.
>
> --- trunk/subversion/libsvn_client/merge.c (r31503)
> +++ trunk/subversion/libsvn_client/merge.c (r31504)
> @@ -3543,6 +3543,89 @@ get_mergeinfo_paths(apr_array_header_t *
> }
>
>
> +/* Implements the svn_log_entry_receiver_t interface. */
> +static svn_error_t *
> +log_changed_revs(void *baton,
> + svn_log_entry_t *log_entry,
> + apr_pool_t *pool)
> +{
> + apr_array_header_t *revs = baton;
> + svn_revnum_t *revision = apr_palloc(revs->pool, sizeof(*revision));
> + *revision = log_entry->revision;
> + APR_ARRAY_PUSH(revs, svn_revnum_t *) = revision;
> + return SVN_NO_ERROR;
> +}

We should also document what the function actually does -- the fact that
it implements a type is important, but that's not all there is to say
(we have to say what BATON is, for example). I committed a doc string
tweak in r31845.

> +/* Set *OPERATIVE_RANGES_P to an array of svn_merge_range_t * merge
> + range objects copied wholesale from RANGES which have the property
> + that in some revision within that range the object identified by
> + RA_SESSION was modified (if by "modified" we mean "'svn log' would
> + return that revision). *OPERATIVE_RANGES_P is allocated from the
> + same pool as RANGES, and the ranges within it are shared with
> + RANGES, too. Use POOL for temporary allocations. */
> +static svn_error_t *
> +remove_noop_merge_ranges(apr_array_header_t **operative_ranges_p,
> + svn_ra_session_t *ra_session,
> + apr_array_header_t *ranges,
> + apr_pool_t *pool)
> +{
> + int i;
> + svn_revnum_t oldest_rev = SVN_INVALID_REVNUM;
> + svn_revnum_t youngest_rev = SVN_INVALID_REVNUM;
> + apr_array_header_t *changed_revs =
> + apr_array_make(pool, ranges->nelts, sizeof(svn_revnum_t *));
> + apr_array_header_t *operative_ranges =
> + apr_array_make(ranges->pool, ranges->nelts, ranges->elt_size);
> + apr_array_header_t *log_targets =
> + apr_array_make(pool, 1, sizeof(const char *));
> + APR_ARRAY_PUSH(log_targets, const char *) = "";
> +
> + /* Find the revision extremes of the RANGES we have. */
> + for (i = 0; i < ranges->nelts; i++)
> + {
> + svn_merge_range_t *r = APR_ARRAY_IDX(ranges, i, svn_merge_range_t *);
> + svn_revnum_t max_rev = MAX(r->start, r->end);
> + svn_revnum_t min_rev = MIN(r->start, r->end) + 1;
> +
> + if ((! SVN_IS_VALID_REVNUM(youngest_rev)) || (max_rev > youngest_rev))
> + youngest_rev = max_rev;
> + if ((! SVN_IS_VALID_REVNUM(oldest_rev)) || (min_rev < oldest_rev))
> + oldest_rev = min_rev;
> + }
> +
> + /* Get logs across those ranges, recording which revisions hold
> + changes to our object's history. */
> + SVN_ERR(svn_ra_get_log2(ra_session, log_targets, youngest_rev,
> + oldest_rev, 0, FALSE, FALSE, FALSE,
> + apr_array_make(pool, 0, sizeof(const char *)),
> + log_changed_revs, changed_revs, pool));
> +
> + /* Now, copy from RANGES to *OPERATIVE_RANGES, filtering out ranges
> + that aren't operative (by virtue of not having any revisions
> + represented in the CHANGED_REVS array). */
> + for (i = 0; i < ranges->nelts; i++)
> + {
> + svn_merge_range_t *range = APR_ARRAY_IDX(ranges, i, svn_merge_range_t *);
> + int j;
> +
> + for (j = 0; j < changed_revs->nelts; j++)
> + {
> + svn_revnum_t *changed_rev =
> + APR_ARRAY_IDX(changed_revs, j, svn_revnum_t *);
> + if ((*changed_rev > MIN(range->start, range->end))
> + && (*changed_rev <= MAX(range->start, range->end)))
> + {
> + APR_ARRAY_PUSH(operative_ranges, svn_merge_range_t *) = range;
> + break;
> + }
> + }
> + }
> + *operative_ranges_p = operative_ranges;
> + return SVN_NO_ERROR;
> +}

Hmmm, a question about that final (innermost) for-loop:

   for (j = 0; j < changed_revs->nelts; j++)
   {
      ...
   }

Do we really have to loop over (on average) half of changed_revs each
time to evaluate whether or not the range in question is operative?
After all, changed_revs is in sorted order already -- I don't know which
way, but svn_ra_get_log2()'s doc string has details on that. So instead
of looping every time, we could just compare

    APR_ARRAY_IDX(changed_revs, 0, svn_revnum_t *)
    APR_ARRAY_IDX(changed_revs, changed_revs->nelts, svn_revnum_t *)

against the appropriate MIN|MAX of (range->start, range->end), and know
in constant time whether we're even interested in pursuing it further.
And once we know we *are* interested in pursuing it further -- i.e.,
that there could be a changed_rev that would make this range interesting
-- *then* we could loop, or we could be really fancy and binary search.

I'm hand-waving some on the details here, but I think you get the idea?

> /*-----------------------------------------------------------------------*/
>
> /*** Merge Source Normalization ***/
> @@ -4104,7 +4187,28 @@ do_file_merge(const char *url1,
>
> if (!merge_b->record_only)
> {
> - for (i = 0; i < remaining_ranges->nelts; i++)
> + apr_array_header_t *ranges_to_merge = remaining_ranges;
> +
> + /* If we have ancestrally related sources and more than one
> + range to merge, eliminate no-op ranges before going through
> + the effort of downloading the many copies of the file
> + required to do these merges (two copies per range). */
> + if (merge_b->sources_ancestral && (remaining_ranges->nelts > 1))
> + {
> + const char *old_sess_url = NULL;
> + SVN_ERR(svn_client__ensure_ra_session_url(&old_sess_url,
> + merge_b->ra_session1,
> + primary_url, subpool));
> + SVN_ERR(remove_noop_merge_ranges(&ranges_to_merge,
> + merge_b->ra_session1,
> + remaining_ranges, subpool));
> + if (old_sess_url)
> + SVN_ERR(svn_ra_reparent(merge_b->ra_session1, old_sess_url,
> + subpool));
> + svn_pool_clear(subpool);
> + }
> +
> + for (i = 0; i < ranges_to_merge->nelts; i++)
> {
> svn_wc_notify_t *n;
> svn_boolean_t header_sent = FALSE;

*nod* Looks like we effectively depend on remove_noop_merge_ranges()
being able to take the same value (or a reference thereto) for input and
output. So I updated that doc as well in r31845.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-06-23 00:19:34 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.