Re: Design query: Ev2 and multiple moves

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 8 Nov 2011 11:22:16 +0000 (GMT)

Hi Bill.

Let me see if I understand.

You describe a procedure for converting a set of parallel renames into an equivalent incremental sequence of renames. That is, the input is a list of N independent renames, where each "rename[i]" (for 0 <= i < N) describes a move from a path in the initial tree state "S0" to a path in the final tree state "SN". The output is a sequence of renames, where each "rename[i]" in turn describes a move from a path in the preceding tree state S[i] to the next intermediate tree state S[i+1], the final state S[N] being equal to the desired state "SN".

That transformation can be useful in itself, for example it might be useful for managing renames within the working copy library code.

However, the really interesting property of this algorithm is the meaning -- the implied ordering -- that this particular algorithm defines for renames that happen in parallel, based on their depth in the tree. Presumably, this meaning is believed to be what a user would most reasonably expect in most cases. It makes sense to me.

And so this would seem to provide a good definition for how we should combine renames when merging changes from one branch to another[1].

Now, some speculation on how this could work.

When we look at renames in the history of a single branch in Subversion, it is easy to see that we already have a partially sequential ordering: each revision makes changes based on the state left by the previous revision. At the same time, within a single revision we currently don't record any sequential information about renames, and I'm not sure if we ever will. If we infer renames within revision R from the current copy-from information then we have a list of parallel renames each based on the previous revision (R-1) [2].

Assuming we know how to decompose a sequence of renames into a parallel set [3], we can do things like convert the renames on each branch into a parallel set, merge the two parallel sets, and then convert back to a sequence. Or, depending on what merge we're doing, we may want to track the renames backward through one branch to the common ancestor and then forward on the other branch, which sounds like it should be just as easy.

We'll have to see how to do conflict resolution while merging the two sets.

I think we'll also have to account for non-rename changes within the same algorithm. I think that's necessary in order to handle sequences such as:

  rename trunk tmp
  mkdir trunk
  rename tmp trunk/subversion

As a set of parallel renames and other operations, this is:

[NIL] -> trunk
trunk -> trunk/subversion

Does that sound right that we'd want to extend the algorithm to include adds (creation of new nodes) and deletes?

- Julian

[1] At least when the branches share a common ancestor; maybe even when they don't but I haven't thought about that.

[2] Or, for added complexity, based on a revision earlier than (R-1).

[3] It seems obvious -- just compare the initial and final states with regard to object identities -- but I don't know whether it's really that simple.

Bill Tutt wrote:
[...]
> Storage: You just store the renames as source/destination
> pairs as appropriate for your system
> Application of a series of renames:
>
> For Each Source Part of the Rename (ordered by deepest
> first (DFS I think)):
> Move the source AND its contents to <tempID>/source
>
> For Each Destination Part of the Rename (ordered by
> shallowest first
> (BFS I think)):
> Move the item and its contents from <tempID>/source
> -> the destination of the rename.
>
> This has the advantage of never caring how many items are
> involved in a dependent rename set or what order to apply them
> in because all dependencies are broken.
> The downside is that the process is expensive because it
> costs 2*(# of renames)*(# descendants of source of rename). The
> descendants matter if your system has an O(N) property for
> dealing with renames.
>
> This allows for weird rename sequences like:
>
> Assume you have a directory hierarchy:
> A/B/C/D
>
> Each of these has a file in it named: <dirname>.c
> i.e.:
> A/a.c
> A/B/b.c
> A/B/C/c.c
> A/B/C/D/d.c
>
> Additionally, we'll add some extra files in D:
> A/B/C/D/e.c and A/B/C/D/f.c
>
> Now the list of renames can begin: (Note: relative paths are listed
> here for the files to indicate that their containing parent didn't
> change)
> A/B/C -> A/C
> A/B -> A/C/B
> D/d.c -> D/e.c
> D/e.c -> D/f.c
> D/f.c -> D/g.c
> A/B/C/D -> A/C/E
>
> Now, quick, before I give the answer, what does the
> destination tree look like?
>
> Here we go:
>
> A/a.c
> A/C/c.c
> A/C/B/b.c
> A/C/E/e.c
> A/C/E/f.c
> A/C/E/g.c
>
> These renames show some of the nastier classes of renames:
> Tree Hierarchy changing ones, and simple dependent file renames
> where its container name changed.
[...]
> Additional fun wrinkles to watch out for:
> Partial undoing of renames. (i.e. reverting a pending rename)
> Partial committing of renames. (committing A/C/E/e.c but not
> A/C/E/f.c) (If you allow this you undo non-committing renames on the
> set of renames before applying the submitted changes.)
> Partial merging of renames.
> Calculating the merged rename path where the destination
> tree has had hierarchy rearranging renames of its own.
> Resolving rename merge conflicts in such a way as to make
> sense to the user when you have an arbitrary tree structure you're
> merging into.
>
> A rename that is being merged consists of a source tree rename and a
> destination tree rename that you're trying to construct.
> The important parts of this operation are the source tree rename
> destination, the source tree rename destination parent (strdp), the
> current location of the source item in the destination tree and the
> current location of the strdp in the destination tree. (strdp')
> The suggested destination tree destination of the merged rename will
> be: strdp'\<the item's rename destination name>
>
> e.g.: You have this tree:
> A/B/C/c.c
>
> branch A -> A'
> rename A'/B -> A'/D
> rename A/B/C/c.c -> A/B/d.c
> merge A -> A'
> This merge needs to merge the c.c rename into A'.
> In this case strdp is A/B. strdp's name in branch A' is
> A'/D.
> The suggested merged rename destination is: A'/D/d.c
> Note: Just because I didn't show the two pass steps here
> don't mean it doesn't need to be done in the general case. :)
>
[...]
Received on 2011-11-08 12:22:54 CET

This message: [ Message body ]
Next message: Philip Martin: "Re: svn commit: r1199222 - in /subversion/branches/showing-merge-info/subversion: include/private/svn_client_private.h libsvn_client/mergeinfo.c libsvn_client/ra.c svn/mergeinfo-cmd.c"
Previous message: Stefan Sperling: "Re: Cross-device link problem with 1.7.x"
In reply to: Bill Tutt: "Re: Design query: Ev2 and multiple moves"
Next in thread: Bill Tutt: "Re: Design query: Ev2 and multiple moves"
Reply: Bill Tutt: "Re: Design query: Ev2 and multiple moves"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]