Re: Design query: Ev2 and multiple moves

From: Bill Tutt <bill_at_tutts.org>
Date: Thu, 10 Nov 2011 10:18:20 -0500

On Tue, Nov 8, 2011 at 6:22 AM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
> Hi Bill.
>
> Let me see if I understand.
>
> You describe a procedure for converting a set of parallel renames into an equivalent incremental sequence of renames.  That is, the input is a list of N independent renames, where each "rename[i]" (for 0 <= i < N) describes a move from a path in the initial tree state "S0" to a path in the final tree state "SN".  The output is a sequence of renames, where each "rename[i]" in turn describes a move from a path in the preceding tree state S[i] to the next intermediate tree state S[i+1], the final state S[N] being equal to the desired state "SN".
>
> That transformation can be useful in itself, for example it might be useful for managing renames within the working copy library code.
>

The approach helps, but in certain scenarios there are other ways you
can achieve your goal than using the proposed algorithm. It just so
happens the algorithm can be used anywhere you deal with renames if
you can't determine a more optimal solution based on the command
you're currently executing.

Also you don't need a fixed list of parallel renames to start with, it
was just easier to explain the approach with that as the starting
point. Quite frequently the list of renames needs to be calculated.
Example:

Let's take the example from my last email:
> e.g.: You have this tree:
> A/B/C/c.c
>
> branch A -> A'
> rename A'/B -> A'/D
> rename A/B/C/c.c -> A/B/d.c
> merge A -> A'

The merge operation works something like:
* Calculate merge history range missing in A' from A (thats a big
orthogonal problem, lets ignore it)
* Determine what if anything changed for each item in A in that revision range
* For all detected renames construct a set of parallel renames based
on the entire path of A/B/C/c.c and A/B/d.c on the edges of the
revisions in A we're merging into A'.
* If helpful, determine who owns the conceptual rename that occurred
and not every renamed item in the subtree that only changed tree paths
because a parent directory's name changed. (this might not be
necessary in SVN)
* Apply the approach on the set of parallel renames
* Determine what conflicts to issue and issue them.

> However, the really interesting property of this algorithm is the meaning -- the implied ordering -- that this particular algorithm defines for renames that happen in parallel, based on their depth in the tree.  Presumably, this meaning is believed to be what a user would most reasonably expect in most cases.  It makes sense to me.
>
> And so this would seem to provide a good definition for how we should combine renames when merging changes from one branch to another[1].
>

Yep, that's the idea.

> Now, some speculation on how this could work.
>
> When we look at renames in the history of a single branch in Subversion, it is easy to see that we already have a partially sequential ordering: each revision makes changes based on the state left by the previous revision.  At the same time, within a single revision we currently don't record any sequential information about renames, and I'm not sure if we ever will.  If we infer renames within revision R from the current copy-from information then we have a list of parallel renames each based on the previous revision (R-1) [2].
>

Sequential info about renames isn't necessary for the approach to
work. The necessary component is being able to compute renames
specifically. Applying this approach to Subversion is interesting
because rename is currently copy + delete, IIRC. Allowing a first
class rename in the repository makes all of the gory rename details a
little easier to deal with. However, doing so in Subversion is a
little tricky since the item/node names are owned by the directory in
the FS schema (again, IIRC). If the file/directory owned its own name
mapping this approach to SVN might be a bit easier.

> Assuming we know how to decompose a sequence of renames into a parallel set [3], we can do things like convert the renames on each branch into a parallel set, merge the two parallel sets, and then convert back to a sequence.  Or, depending on what merge we're doing,  we may want to track the renames backward through one branch to the common ancestor and then forward on the other branch, which sounds like it should be just as easy.
>
> We'll have to see how to do conflict resolution while merging the two sets.
>
> I think we'll also have to account for non-rename changes within the same algorithm.  I think that's necessary in order to handle sequences such as:
>
>  rename trunk tmp
>  mkdir trunk
>  rename tmp trunk/subversion
>
> As a set of parallel renames and other operations, this is:
>
>  [NIL]     ->  trunk
>  trunk     ->  trunk/subversion
>
> Does that sound right that we'd want to extend the algorithm to include adds (creation of new nodes) and deletes?
>

Your example is interesting mostly because doing the steps above would
mean that the root of the user's concept of a branch "trunk" isn't
related (via merge history) directly to the roots of its other
branches. If you tried your example with TFS you would need to
recreate the merge history relationship between trunk' (the one from
tmp) and your other branches before a real merge would be allowed
again. I don't think this case (as presented) is worth worrying about.
If you'd like to support this for Subversion, then you should think
long and hard about how your merge history and renames intersect in
this fashion. What would the affect be of merging your example into
one of the previously created branches?

If you did something like:

rename A'/B A'/tmp
mkdir A'/B
rename A'/tmp A'/B/Q
commit

The rename approach would handle it just fine during a merge from A' back to A.
A/B -> A/B/Q

The new directory is simply going to get copied (branched) from A' into A.

> - Julian
>
>
> [1] At least when the branches share a common ancestor; maybe even when they don't but I haven't thought about that.

If the branches aren't related by a common ancestor renames loose
their meaning because strdp' doesn't exist in the destination tree of
the merge.
I didn't even touch on the impact of what happens when strdp' had been
deleted in the destination tree. (let alone never existed because the
branches don't share a relative (let alone common ancestor).
TFS made this a little bit easier because it had the concept of old
contents of a directory still being visible after they'd been deleted.
(They gained an extra bit of data called a DeletionId, and you could
easily undelete them if you chose.)

>
> [2] Or, for added complexity, based on a revision earlier than (R-1).
>
> [3] It seems obvious -- just compare the initial and final states with regard to object identities -- but I don't know whether it's really that simple.

It really is. (for computing the parallel rename set) The
complications arise from merge history, deletes/undeletes (if you
support them), renames of the root of the branch, and presenting a
sensible UI to resolving the rename conflicts.

Bill