[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Move Tracking in the Update Editor

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Mon, 2 Sep 2013 14:06:19 +0100 (BST)

Philip Martin wrote:
>> On 30.08.2013 17:18, Julian Foad wrote:
>>> The client *does* need to know that B is in fact being moved to C, so
>>> that it can offer to transfer my local modifications from B to C.
>
> Allowing multiple moves to the same destination doesn't really fit with
> Ev2 but I can imagine an enhanced Ev1 editor drive that does
>
>    move away A
>    move to C (original A)
>    del A
>    move away B
>    move to C (original B)
>    del B
>    add C
>
> The deletes lead to tree-conflicts on A and B due to local mods.  The
> add creates a pristine C with no local mods.  The user resolves the
> tree-conflicts post-update and gets to choose which local mods are
> merged to C, possibly both but one before the other, which may in turn
> raise text/prop/tree-conflicts on C.

This is good as far as local mods are concerned.  The decision of how to combine A and B, or raise a conflict, is left to the WC client code as it should be, and sufficient information is provided about the nature of the changes.

Functionally, this would work as you've shown.  Regarding updating the base tree, however, note that the plain "add" in Ev1 is a node-by-node add of the whole subtree, which could be arbitrarily large.  For performance, we must not use a plain add, as a move should be designed to execute in O(1) time.[1]

So, one of the base trees (at A or B) should be deleted, while the other should be moved over to 'C' and updated.  The edit producer must therefore make a decision between A and B, and send either

  [move-info...]
  del A   // but WC shall not delete the base tree
  del B
  add C (copy-from A_at_10)  // WC shall move the base tree from A
  modify... C/...  // diff A_at_10:C_at_30

or

  [move-info...]
  del A
  del B   // but WC shall not delete the base tree
  add C (copy-from B_at_20)  // WC shall move the base tree from B
  modify... C/...  // diff B_at_20:C_at_30

(The ideal choice of which base-tree to move would be the one with the smaller delta to C_at_30, but that is hard to determine efficiently.  The one with the closer revision number, in this case B, would often be a good choice.  However, each of A and B could be a mixed-rev subtree.  A good choosing algorithm is far from obvious.  That this problem exists at all might be a clue that something deeper is wrong.)

But notice how we're really trying to send a combination of two different messages:

  * The incoming logical change, that must be merged with any local mods, is:

    - mv(A,C) && modify-tree(diff(A10:C30))
    - mv(B,C) && modify-tree(diff(B20:C30))

  * To produce the new base tree, one reasonably efficient way is:

    - rm(A)
    - mv(B,C)
    - modify-tree(C, diff(B20:C30))

What we've been striving for above is a kind of hybrid of those two messages.

But let's try to take a step back and look at how we got into this surprisingly ugly case in the first place.  The real problem stems from an "impedance mismatch" between the path-centric semantics of the existing WC and the line-of-history semantics that "moves" introduces.

Why do we start off with the expectation that this update (A_at_10 and B_at_20 -> C_at_30) should resolve smoothly into C_at_30?  Because, in our old-fashioned thinking, this was a "normal" mixed-revision WC, and any "normal" WC should update smoothly.  But what if we instead started off with this WC (and the same repository):

  old_B->A_at_10
  B_at_20

Then we would say that 'old_B' is 'switched' and we would not expect an update to bring this WC to contain just a child 'C'.  We might expect the child named 'old_B' to be deleted (if we're still clinging to the old copy-and-delete ideas about how a move should be handled), or in a move-aware world we might expect 'old_B' to remain present but end up pointing at '^/C_at_30'.

So, back to the original example, I'm thinking that the child 'A_at_10' should be treated as a switched child.  In a move-aware world, that child should not be found in the WC at all at the same time as 'B_at_20' unless somebody did a deliberate kind of move-defeating update.

But the WC doesn't know 'A_at_10' and 'B_at_20' are the same repository node.  The WC doesn't know about node-ids or node-history relationships.  This is the problem.  Maybe the WC needs to know.

If the WC doesn't know about node-id relationships, then it cannot prevent the following silly situations from arising:

(1) Start from:

  A_at_10
  B_at_20

  Then move A to B/A, and try to commit.  The commit will fail because
you cannot move the same node into a child of itself.  The detection
occurs only at commit time, where the system knows about node ids,
rather than at the time the invalid operation was performed.
(2) Start from:

  A_at_10
  B_at_20

  Then modify A, and modify B, and try to commit.  The commit will fail
because the two changes to the same repository node conflict.  The WC
contains logic to detect this kind of problem before commit in the
specific case where a switched WC node is pointing to the same URL as
another node, but it does not know how to detect when two WC paths
pointing to *different* URLs are in fact pointing to the same repository node.

I think the answer is that in order to make a properly move-aware system, the WC needs to know about node-id relationships [2].

It may be advantageous to come up with an implementation plan whereby move-awareness is introduced first to the repository side, with just enough support in the WC to make typical move scenarios work much better than they do today, and then as a second phase extend the WC to know about node-id relationships and complete the WC-side awareness (so that scenarios like the above commit-time failures are instead detected within the WC, and whatever else goes along with that).

- Julian

[1] The current WC DB schema requires O(nodes in subtree) time for a
move internally, but that's an implementation detail and is often not
the dominant factor.

[2] It could either know the actual node-copy-ids, or just know which pairs of nodes have the same node-copy-id; either would suffice.
Received on 2013-09-02 15:07:17 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.