# Re: Symmetry for branching, move tracking and merging

From: Branko Čibej <brane_at_wandisco.com>
Date: Sat, 14 Mar 2015 05:20:58 +0100

On 10.03.2015 18:00, Julian Foad wrote:
> Branko Čibej wrote:
>> On 06.03.2015 12:13, Julian Foad wrote:
> [...]
>>> In order to build a sane merging system, we expect certain symmetries in
>>> the data model.
> [...]
>>> That generalizes to:
>>>
>>> * uniformity of the difference from branch1@r1 to branch2@r2
>>> for any values of: branch1, r1, branch2, r2
>>> where branch1 and branch2 are 'related' (formally: in the same branch family)
>>>
>>> * diffs obey some (more or less obvious) arithmetic rules such as:
>>> diff(A,B) (+) diff(B,C) == diff(A,C)
>>> diff(A,B) (+) diff(B,A) == nil
>>> and so on
>>>
>>> Not just roughly but precisely, testably so.
>> http://darcs.net/Theory/GaneshPatchAlgebra
>>
>> You've heard of darcs, I take it. :)
> Certainly! However, Darcs patch algebra is mostly concerned with how
> patches can be re-ordered (commuted, rebased) and additional semantic
> patch types (renaming all occurrences of an identifier) and how the
> system can take advantage of those properties. That's a step beyond
>
> What I am saying is so fundamental that Darcs and probably all DVCS's
> take it as an axiom. It is that we need to define a "versioned state"
> (of a branch), which is a "context" in Darcs terminology, and
> "difference" between versioned states, in a way that is independent of
> which branch(es) and revision(s) we're looking at.
>
> So I was sloppy in saying "some" arithmetic rules. Let me try being a
> bit more precise, but still not complete.
[...]
>
> Here's a clue. There is no Subversion command to output a pure state
> of a branch, in its entirety and free of other information. ('svn
> export' + 'svn proplist -v' comes close, but not close enough as we'll
> see below.)
> Nor to output a pure difference. (Adding tree changes to 'svn diff'
> would be a start.) Nor to input a pure state or difference. Nor are
> there even APIs that exactly serve these purposes.

Let's not confuse the command-line client with Subversion-the-platform.
The client takes a lot of liberties to produce output that's useful for
the user, even though it's not theoretically pure.

The pure difference is called 'svn_repos_replay'. The pure state ...
isn't that just 'svn checkout'? And of course there's 'svnadmin dump':
with --incremental --deltas for the difference, without them for the
state; and our dump format is pretty much complete at this point.
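
As a minimal sketch of the arithmetic rules quoted above, here is a toy
model in Python. It is not Subversion code and uses no Subversion API:
the flat path-to-content 'State', the 'Diff' representation, and the
names diff, compose, apply_diff and NIL are all assumptions made purely
for illustration. It ignores properties, copies, moves and mergeinfo,
which are exactly the hard parts under discussion.

```python
from typing import Dict, Optional, Tuple

# Assumption: a "pure state" is just a mapping from path to file content.
State = Dict[str, str]
# Assumption: a "pure difference" maps each changed path to (old, new),
# where None means "path absent on that side".
Diff = Dict[str, Tuple[Optional[str], Optional[str]]]

NIL: Diff = {}  # the empty difference


def diff(a: State, b: State) -> Diff:
    """Difference from state a to state b; unchanged paths are omitted."""
    return {p: (a.get(p), b.get(p))
            for p in a.keys() | b.keys()
            if a.get(p) != b.get(p)}


def compose(d1: Diff, d2: Diff) -> Diff:
    """The (+) operator: d1 followed by d2, with no-op entries dropped."""
    out: Diff = {}
    for p in d1.keys() | d2.keys():
        if p in d1 and p in d2 and d1[p][1] != d2[p][0]:
            raise ValueError(f"diffs do not chain at {p!r}")
        old = d1[p][0] if p in d1 else d2[p][0]
        new = d2[p][1] if p in d2 else d1[p][1]
        if old != new:
            out[p] = (old, new)
    return out


def apply_diff(state: State, d: Diff) -> State:
    """Apply a difference to a state, checking that the 'old' side matches."""
    result = dict(state)
    for p, (old, new) in d.items():
        if result.get(p) != old:
            raise ValueError(f"diff does not apply at {p!r}")
        if new is None:
            result.pop(p, None)
        else:
            result[p] = new
    return result


# Three made-up states of 'related' branches.
A: State = {"a.c": "v1", "b.c": "x"}
B: State = {"a.c": "v2", "b.c": "x", "c.c": "new"}
C: State = {"a.c": "v3"}

assert compose(diff(A, B), diff(B, C)) == diff(A, C)
assert compose(diff(A, B), diff(B, A)) == NIL
assert apply_diff(A, diff(A, C)) == C
```

In this toy model the rules hold "precisely, testably" only because a
state carries nothing but content; nothing here says how copy-from
information or mergeinfo would fit.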

> To really see what's missing, we need to focus on the difficult parts
> which include:
>
> * mergeinfo

I'm not at all sure that mergeinfo is even part of the versioned state,
never mind the current implementation. If it is, we're certainly not
modelling it correctly. But we know that.

> * copies (what to do with 'copy-from' information)
>
> * move tracking
[...]
> The underlying problem, I suggest, is that we haven't clearly defined
> how copying fits into the model of 'versioned state' and 'difference'.

Well, there's an alternative interpretation ... that your model of
'versioned state' and 'difference' is incomplete. :) I can't think of a
different way of recording copies (as opposed to moves!) than saying,
"this is a new node which was created from [predecessor noderev@revision]".

[...]
> Where this gets really interesting is in the definition of a 'move'.
> It's easy to come up with models for move tracking that fail to meet
> this criterion. The model I am working on at the moment does, I
> believe, meet this criterion, and I believe that's really important.

I'd like to see this elaborated a bit more. How, exactly, is it easy to
come up with move-tracking models that fail your criterion? I would
expect the opposite, that it's quite hard, unless you define 'move' to
have strange side effects.

-- Brane