Re: Move Tracking in the Update Editor
From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Mon, 2 Sep 2013 14:06:19 +0100 (BST)
Philip Martin wrote:
This is good as far as local mods are concerned. The decision of how to combine A and B, or raise a conflict, is left to the WC client code as it should be, and sufficient information is provided about the nature of the changes.
Functionally, this would work as you've shown. Regarding updating the base tree, however, note that the plain "add" in Ev1 is a node-by-node add of the whole subtree, which could be arbitrarily large. For performance, we must not use a plain add, as a move should be designed to execute in O(1) time.[1]
So, one of the base trees (at A or B) should be deleted, while the other should be moved over to 'C' and updated. The edit producer must therefore make a decision between A and B, and send either
[move-info...]
or
[move-info...]
(The ideal choice of which base-tree to move would be the one with the smaller delta to C_at_30, but that is hard to determine efficiently. The one with the closer revision number, in this case B, would often be a good choice. However, each of A and B could be a mixed-rev subtree. A good choosing algorithm is far from obvious. That this problem exists at all might be a clue that something deeper is wrong.)
But notice how we're really trying to send a combination of two different messages:
* The incoming logical change, that must be merged with any local mods, is:
- mv(A,C) && modify-tree(diff(A10:C30))
* To produce the new base tree, one reasonably efficient way is:
- rm(A)
What we've been striving for above is a kind of hybrid of those two messages.
But let's try to take a step back and look at how we got into this surprisingly ugly case in the first place. The real problem stems from an "impedance mismatch" between the path-centric semantics of the existing WC and the line-of-history semantics that "moves" introduces.
Why do we start off with the expectation that this update (A_at_10 and B_at_20 -> C_at_30) should resolve smoothly into C_at_30? Because, in our old-fashioned thinking, this was a "normal" mixed-revision WC, and any "normal" WC should update smoothly. But what if we instead started off with this WC (and the same repository):
old_B->A_at_10
Then we would say that 'old_B' is 'switched' and we would not expect an update to bring this WC to contain just a child 'C'. We might expect the child named 'old_B' to be deleted (if we're still clinging to the old copy-and-delete ideas about how a move should be handled), or in a move-aware world we might expect 'old_B' to remain present but end up pointing at '^/C_at_30'.
So, back to the original example, I'm thinking that the child 'A_at_10' should be treated as a switched child. In a move-aware world, that child should not be found in the WC at all at the same time as 'B_at_20' unless somebody did a deliberate kind of move-defeating update.
But the WC doesn't know 'A_at_10' and 'B_at_20' are the same repository node. The WC doesn't know about node-ids or node-history relationships. This is the problem. Maybe the WC needs to know.
If the WC doesn't know about node-id relationships, then it cannot prevent the following silly situations from arising:
(1) Start from:
A_at_10
Then move A to B/A, and try to commit. The commit will fail because
A_at_10
Then modify A, and modify B, and try to commit. The commit will fail
I think the answer is that in order to make a properly move-aware system, the WC needs to know about node-id relationships [2].
It may be advantageous to come up with an implementation plan whereby move-awareness is introduced first to the repository side, with just enough support in the WC to make typical move scenarios work much better than they do today, and then as a second phase extend the WC to know about node-id relationships and complete the WC-side awareness (so that scenarios like the above commit-time failures are instead detected within the WC, and whatever else goes along with that).
- Julian
[1] The current WC DB schema requires O(nodes in subtree) time for a
[2] It could either know the actual node-copy-ids, or just know which pairs of nodes have the same node-copy-id; either would suffice.
|
This is an archived mail posted to the Subversion Dev mailing list.
This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.