Merge tracking - a proposal for impl. AS

From: Jacob Refstrup <jacob.refstrup_at_hp.com>
Date: 2004-09-02 18:12:21 CEST

This is a proposal for how to implement merge tracking
with Ancestor Sets (AS) with improved "blame" capability
(being able to blame who merged and who originated the
code).

Repository - "includes" & "excludes" sets:
------------------------------------------

With each node (file or directory) store two immutable
sparse sets recording merge information for a revision
if a merge was performed for that node.

The "includes" set is used to track merges that added one
or more csets; the "excludes" set is used to track merges
that removed one or more csets.

These sets only need to be present for nodes that were
merged in that revision. Furthermore, a child node
(except those with copy history) inherit their parent's
merge sets and hence a child that has the same includes/
excludes sets won't need to store it for that revision.

We won't store complete "includes/excludes" sets per
node. Rather we would only store the newly included/excluded
versions in the sets. The complete sets can be calculated by
examining previous revisions that affected a given node. The
calculation of these sets can be cached on both client
and server side.

Furthermore we often don't need the complete sets. Often
we will only require a partial set that represents a range
of version numbers - e.g. the sets for r1000 to r1149. This
would be the case if we had branched at r1000 and was merging
at r1149.

Calculating complete sets:
--------------------------

To calculate the complete sets for a given node the following
algorithm is used (sets are modelled as variable length bit
fields). Note we are construction the complete set for the
range [s_rev..e_rev] (both inclusive):

1. If the node is not a copy (branch) and has a parent we
   compute the parents sets first.
2. We start with the parents sets (if appropriate). If no
   parent we initialize the sets for this node to empty.
        node.include = 0; /* empty set */
        node.exclude = 0; /* empty set */
3. We set c_rev = e_rev.
4. If c_rev didn't change the node then
        (stop if c_rev == s_rev)
        c_rev--;
        go to step 4.
5. If c_rev has includes and excludes sets for this node we
   update as follows:
        node.include |= node.rev[c_rev].include & ~node.exclude;
        node.exclude |= node.rev[c_rev].exclude & ~node.include;
6. Since c_rev has changed the node:
        node.include |= (1 << c_rev);
7. Go to step 3.

NB. We don't store these sets in the repository - we would only
cache them.

Caching of sets:
----------------
The calculated complete "includes" / "excludes" sets are completely
cachable as their are based on immutable repository information.

Merging:
--------
In this case merging implies that that either the source or
target is a (indirect) copy of the other. Otherwise it doesn't make
sense talking about merge tracking. The revision at which the copy
(branch) was made is called the start revision (s_rev). The revision
of the working copy where the merge is taking place is called e_rev.

For each node that is being merged we calculate the complete
"includes" / "excludes" sets [s_rev..e_rev] for both the merge source
and target.

The following pseudo C code computes what needs to be merged (the sets
are modelled as variable length bitfields):

  merge.includes = 0;
  merge.excludes = 0;
  merge.includes_prompt = 0;
  merge.excludes_prompt = 0;
  for (r = s_rev, r <= e_rev; r++) {
    if (source.excludes & (1 << r)) {
      if (target.includes & (1 << r))
        merge.excludes_prompt |= (1 << r);
      else
        merge.excludes |= (1 << r);
    }
    if (source.includes & (1 << r)) {
      if (target.excludes & (1 << r))
        merge.includes_prompt |= (1 << r);
     else
        merge.includes |= (1 << r);
    }
  }

The resulting sets indicates which revisions needs to be merged -
either included or excluded. The prompt sets represents csets
that the source and target are in "conflict" over the inclusion/
exclusion and where it makes sense to prompt the user.

Assuming that the user finally chooses to merge

merge_final.includes
merge_final.excludes

then that will be stored in the WC ready for the commit:

wc.includes &= ~merge_final.excludes;
wc.includes |= merge_final.includes;
wc.excludes &= ~merge_final.includes;
wc.excludes |= merge_final.excludes;

Blame:
------
In order to be able to accurately "blame" for a given node the
revisions that have merge info need to be treated specially. The
author that did the merge is in essence the integrator and could
be blamed/praised as such.

To track down who wrote the code "blame" would have to then look
at the contributing revisions (from the merge info). Exactly how
to best do this is not clear at this point in time.

Query:
------
Being able to query if a cset (rX) is included would be done by
determining the changed nodes of rX and then determining the
includes/excludes set of those files (using rX..rWC where rWC is
the revision of the WC). If all files are included then rX is
included.

Pros:
-----
Minimizes merge info storage by storing only
- on revisions that have merges
- on nodes who's merge info is different from parent
Allows multiple merges in one commit
- mixed includes and excludes
- multiple merge sources and targets

Cons:
-----
Have to calculate includes/excludes sets per node
- offset by enabling caching results on both client & server

Regards,
- Jacob Refstrup

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Sep 2 18:12:38 2004

This message: [ Message body ]
Next message: Steve Dwire: "[RFC] Possible Working Copy refactoring..."
Previous message: Jani Averbach: "Re: [SVN RFC] Making JavaHL easier to maintain?"
Next in thread: kfogel_at_collab.net: "Re: Merge tracking - a proposal for impl. AS"
Reply: kfogel_at_collab.net: "Re: Merge tracking - a proposal for impl. AS"

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]