[merge tracking] Proposal to track "novel changes"

From: Michael Haggerty <mhagger_at_alum.mit.edu>
Date: 2006-10-26 00:55:33 CEST

[This topic was discussed with dlr yesterday on IRC; see

http://svn.haxx.se/dev/archive-2006-10/0697.shtml

for a transcript.]

I'm no merge guru, so this could be completely silly (or well-known
technology). Nor have I hacked on the internals of svn, so it could be
totally impractical to implement (I would therefore greatly appreciate
feedback.) And not being an svn hacker, I unfortunately can't "STFU and
write some code" in any timely manner. But here we go...

I think that the merge-tracking scheme that is used by svnmerge.py and
by the proposed merge-tracking extension to SVN will probably work
pretty well for simple merge topologies. But if there are any loops in
the undirected graph of branches between which revisions have been
merged, then it will break down.

For example, if I merge a change from branch B to branch A, and the same
change from branch B to branch C, then there is nothing to stop me
merging the same change from branch A to branch C, resulting in a
repeated merge and a conflicted or even bogus result.

    B
   /|
  A |
   \|
    C

Only in the case that the deltas are identical for B->A and B->C will
the merge code recognize that A->C is a noop. This problem is intrinsic
to merge tracking the way that it has been implemented in svnmerge.py.
Usually it is not noticed because people use a very simple merge
topology (out of fear?).

It seems to me that what one wants to track in a merge-tracking system
is the creation and motion of "novel changes". [On IRC I referred to
these as "original revisions", but I think the new name is less
ambiguous.] I define a novel change to be one created by a developer by
the application of brainpower and emacs (as opposed to a change
resulting from a merge):

novel change == a change made from scratch (not via a merge)

Most of the intellectual content of a project is in the novel changes,
whereas (I'm exaggerating here) merges are mostly a matter of
bookkeeping and tidying up.

As far as an SCM is concerned, a novel change is an indivisible nugget
of functionality, initially embodied as a single changeset applied to a
particular parent revision. The proposal is this:

1. For each commit, record the novel change(s) that it involved.

a. If the commit did not involve a merge, then it is by definition a
novel change.

b. If a commit resulted from merging a set of revisions from another
branch, then the novel change set associated with the new commit is the
union of the novel changes associated with each of the merged-in
revisions. Such a commit is *not* itself considered a novel change,
even if the merge involved the manual repair of merge conflicts.

2. Each version of a branch has a set of accumulated novel changes
consisting of the union of all novel changes associated with all earlier
commits.

3. A novel change is never merged into a branch if it is already in the
branch's set of accumulated novel changes. This restriction also
applies if the novel change is merged indirectly, for example by merging
a revision that itself included the same novel change.

4. If a file is copied, its accumulated-novel-changes are copied as well.

For example, consider the following sequence of events:

1. A novel change is made on branch B, creating revision B:3. Resulting
metadata:

novel-changes[B:3] = {B:3}
accumulated-novel-changes[B:3] = {B:1,B:2,B:3} = {B:1-3}

2. B:3 is merged to branch A, creating A:5. Resulting metadata:

novel-changes[A:5] = {B:3}
accumulated-novel-changes[A:5] = {A:1-4,B:3}

3. B:3 is independently merged to branch C, creating C:10. Resulting
metadata:

novel-changes[C:10] = {B:3}
accumulated-novel-changes[C:10] = {C:1-9,B:3}

The deltas associated with B:3, A:5, and C:10 differ textually, but they
are nevertheless conceptually equivalent, because they are different
representations of the same novel change.

Therefore, if somebody attempts to merge A:5 into branch C, the merge
code should treat it as a noop, because the novel change embodied in A:5
(namely B:3) is already in the accumulated-novel-changes of branch C.
This determination should be made based only on the merge-tracking
metadata, without looking at the diffs at all, and it should not be
considered a conflict (though an informational message would be helpful).

Suppose somebody now wants to add the same novel change into branch D,
creating D:13. She has the choice of whether to merge B:3, A:5, or
C:10. As far as the merge-tracking system is concerned, it makes no
difference because they are just three representations of the same novel
change (namely the one originally made in B:3). But of course the three
diffs are textually different, so the developer should choose which
representation to merge based on which one is most likely to apply
without conflicts (presumably the one that is on the branch that is most
similar to branch D). Regardless of which branch was used as the source
of the merge, the resulting metadata would be

novel-changes[D:13] = {B:3}.
accumulated-novel-changes[D:13] = {D:1-12,B:3}.

This scheme generalizes to merges of multiple revisions at once. If
B:10-12 are novel changes and they are merged into branch A, creating
A:15, then

novel-changes[A:15] = {B:10-12}
accumulated-novel-changes[A:15] = {A:1-4,B:3,A:6-14,B:10-12}.

This scheme can result in merge conflicts that would not be recognized
by other schemes. For example, suppose that I have also merged B:10
into branch C creating C:18 with metadata

novel-changes[C:18] = {B:10}
accumulated-novel-changes[C:18] = {C:1-9,B:3,C:11-17,B:10}

Now an attempt to merge A:15 into C would result in a conflict because
A:15 represents {B:10-12}, but B:10 is already contained in the
accumulated-novel-changes of branch C. (There is no way to know how to
apply only part of revision A:15.)

However, a merge of B:10-12 into C would be OK. B:10 would be skipped
over (or not even listed in the "available-to-merge" list) because it is
already contained in C.

If a revision is reverted (by merging it backwards), then the associated
novel changes are "negated". Thus if B:11 is reverted out of branch A,
creating A:20, then

novel-changes[A:20] = {-B:11}
accumulated-novel-changes[A:20] = {A:1-4,B:3,A:6-14,B:10,B:12,A:16-19}.

An attempt to revert a novel change that was not contained in the
branch's accumulated-novel-changes is a conflict. Therefore, whereas
novel-changes can be negative, accumulated-novel-changes can only
contain "positive" novel changes.

Additional details:

- Of course the merge-tracking metadata would conceptually be recorded
at file rather than a branch resolution, though physically there should
be an inheritance system to prevent an explosion of metadata.

- In these examples I've named novel changes by the branch:revision in
which they first occurred. In reality, any unique id would work. In
fact, to support merging across repositories one could incorporate the
repository's UUID in the novel change IDs.

- There should be a way to manually adjust merge-tracking data (a la
"svnmerge.py merge --record-only") and to override the strict default
behavior.

- It would be nice to include the "svnmerge.py block" feature.

- One might want to record the literal source of a merge in addition to
the novel-changes that were involved. (In other words, keep track of
whether the user above merged B:3 vs. A:5 vs. C:10.)

Advantages of this scheme:

- The merge-tracking metadata deduces the conceptual intention of a
programmers activities, not just the resulting textual changes, with no
extra effort by the programmer.

- Reverts are supported in a straightforward way.

- Double merges, incorrect reverts, etc. are prevented based on the
merge-tracking metadata rather than by fancy merge algorithms.

- Requires roughly the same amount of metadata as the svnmerge.py
scheme, though efficiency might require some database denormalization
resulting in more voluminous metadata. I *think* that a clever script
could infer the novel-change data from svnmerge.py merge-tracking data,
provided said data were self-consistent.

- I think it could be implemented using svn properties, though some
changes to the property system would help.

- Unknown amount of work to implement.

Disadvantages of this scheme:

- Requires some programmer discipline, for example not to include
unrelated novel changes within a merge commit.

- Some of the behavior is rather implicit and might seem confusing (even
though I think the default behavior is consistent with what people want
to do 98% of the time)

- ???

What do people think of this?

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Oct 26 00:56:01 2006

This message: [ Message body ]
Next message: Daniel Rall: "Re: [PATCH][MERGE-TRACKING] Step 2 of repos to repos copy mergeinfo recording"
Previous message: Madan U Sreenivasan: "[PATCH][MERGE-TRACKING] Step 4 of recording copyfrom info in mergeinfo during a repos to repos copy"
Next in thread: Daniel Berlin: "Re: [merge tracking] Proposal to track "novel changes""
Reply: Daniel Berlin: "Re: [merge tracking] Proposal to track "novel changes""
Reply: David James: "Re: [merge tracking] Proposal to track "novel changes""

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]