Re: Symmetry for branching, move tracking and merging

Date: Tue, 10 Mar 2015 17:00:43 +0000

Branko Čibej wrote:
> On 06.03.2015 12:13, Julian Foad wrote:
[...]
>> In order to build a sane merging system, we expect certain symmetries in
>> the data model.
[...]
>> That generalizes to:
>>
>> * uniformity of the difference from branch1_at_r1 to branch2_at_r2
>> for any values of: branch1, r1, branch2, r2
>> where branch1 and branch2 are 'related' (formally: in the same branch family)
>>
>> * diffs obey some (more or less obvious) arithmetic rules such as:
>> diff(A,B) (+) diff(B,C) == diff(A,C)
>> diff(A,B) (+) diff(B,A) == nil
>> and so on
>>
>> Not just roughly but precisely, testably so.
>
> http://darcs.net/Theory/GaneshPatchAlgebra
>
> You've heard of darcs, I take it. :)

Certainly! However, Darcs patch algebra is mostly concerned with how
patches can be re-ordered (commuted, rebased) and additional semantic
patch types (renaming all occurrences of an identifier) and how the
system can take advantage of those properties. That's a step beyond

What I am saying is so fundamental that Darcs and probably all DVCS's
take it as an axiom. It is that we need to define a "versioned state"
(of a branch), which is a "context" in Darcs terminology, and
"difference" between versioned states, in a way that is independent of
which branch(es) and revision(s) we're looking at.

So I was sloppy in saying "some" arithmetic rules. Let me try being a
bit more precise, but still not complete.

(A, B, ...) are States;
(d, e, ...) are Differences, and Differences are directed;
(A:B) means the Difference from A to B;
(A + d) means applying the Difference 'd' to state 'A';
(d & e) means sequential composition of Differences 'd' and then 'e'.

* for any two States A and B
the directed difference A:B exists

* the 'nil' or 'identity' State is denoted '/':
it consists of an empty directory with no properties;
this directory acts as the root directory for the branch;
a State never includes the name or path of this directory

* any State 'A' can be represented by a Difference from nil-state to 'A':
/ + ((/):A) == A

* thus, the set of Differences is a superset of the set of States

* the 'nil' or 'identity' difference is denoted '(:)':
A:A == (:)
A + (:) == A

* every difference is invertible:
A:B is the inverse of B:A
(:) is the inverse of (:)

* the set of states is NOT closed under the operation of applying a
difference:
A + A:B == B, always; but...
A + d => undefined in general

* composition of differences is associative:
((A:B & B:C) & C:D) == (A:B & (B:C & C:D))
((d & e) & f) == (d & (e & f)), iff (d & e) and (e & f) are both defined

* composition of differences is NOT commutative, in general:
(A:B & B:C) != (B:C & A:B)

Here's a clue. There is no Subversion command to output a pure state
of a branch,
in its entirety and free of other information. ('svn export' + 'svn
proplist -v' comes close, but not close enough as we'll see below.)
Nor to output a pure difference. (Adding tree changes to 'svn diff'
would be a start.) Nor to input a pure state or difference. Nor are
there even APIs that exactly serve these purposes.

To really see what's missing, we need to focus on the difficult parts
which include:

* mergeinfo

* copies (what to do with 'copy-from' information)

* move tracking

An example. The move-tracking-2 branch was caught up to trunk_at_1665165
in r1665166.

svn diff ^/subversion/trunk_at_1665165
^/subversion/branches/move-tracking-2_at_1665166
[...]
Property changes on: .
___________________________________________________________________
Modified: svn:mergeinfo
## -0,1 +0,1 ##
Reverse-merged /subversion/branches/move-tracking-2:r1607334
Merged /subversion/trunk:r1606692-1663280,1663281-1665165*

This appears to say, "In comparison to trunk, one move-tracking-2
branch commit was reverse-merged on move-tracking-2." That's wrong:
rather, that revision was merged from the branch to trunk in the past
and is currently common to both. And, "Lots of stuff from trunk was
merged to the branch." That's wrong too. There *is* a difference in
the mergeinfo properties that corresponds to what is printed here, but
that difference in properties does not represent a difference in the
versioned states of the branches. It is an implementation detail.

We should have a diff that does not show mergeinfo implementation
differences. Why? Not merely because seeing the mergeinfo is "a
distraction" or "inconvenient", but because the "pure" difference
between versioned states is a semantically useful concept in its own
right.

OK, we already knew about the leaky abstraction of mergeinfo. What else?

Copy-from. This is a much deeper issue. Let's start by noting that our
diff commands present different results depending on copy-from
information, modified by flags such as '--show-copies-as-adds' and
'--diff-copy-from'. There are many inconsistencies in these
behaviours, some of which have been discussed and filed as bugs
including:

A very long email thread linked from #4455 discusses some of these
inconsistencies, but fails to reach an understanding of the real
problem and solution.

Some of our differencing APIs don't present copy-from information on
adds (svn_ra_do_diff3). Others can do (svn_ra_do_switch3), but what
exactly does it mean? Both the left and right 'sides' of the diff may
be trees that were copied from somewhere and modified with sub-tree
copies. Does the copy-from of 'D/foo' on the right-hand side, say,
mean the source of the most recent copy where this node was the
copy-root; or the most recent copy that included this node even if the
copy root was some parent directory; or only describe it as a copy if
the copy target revision was revision DIFF_RIGHT itself; or what? So
far we were probably thinking of a diff where the right-hand side is a
newer revision of the same line of history as the left, but what if
the right-hand side is in fact the older revision, do we apply
different rules? And if both sides are at the same revision (say) but
in different branches, then what?

The underlying problem, I suggest, is that we haven't clearly defined
how copying fits into the model of 'versioned state' and 'difference'.

That's something I've been thinking on, and I have some ideas which I'll share.

- Julian