Move Tracking in the Update Editor

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 30 Aug 2013 12:03:35 +0100

I'm working on how the update editor can handle moves. It's more complex
than the commit editor, because there can be multiple instances of the
"same" repository node in the WC, so moves are not necessarily unique.

This is a copy of my notes in progress. I would welcome any suggestions or
thoughts.


=============================================

= Summary =
A WC can contain multiple instances of the same repository node, by
mixed-revision and/or switched paths. During an update, multiple instances
can move to the same target path, and one instance can move to multiple
target paths. We can't reasonably avoid WCs getting into such a state, nor
forbid updating such a WC.

An editor that can perform one move per node (that is, per node-copy-id) is
suitable for editing a repository state, and can thus be used as the commit
editor. The existing update code drives an edit over WC paths, not over
repository nodes or URLs. An editor that can perform only one move per node
cannot be used as the update editor.

Therefore, the update editor must somehow handle multiple source and
destination paths for the same move.

= Details =
== WC has Multiple Instances per Node ==
A WC can contain multiple instances of the same repository node.

* A switched path points to the same or a different revision of any node
** WC: (A@10, X=^/A@10)
** WC: (A@10, X=^/A@11); repo: (^/A@10 and ^/A@11 are the same node-rev)
* A non-switched path points to a different revision of a moved node
** WC: (A@10, B@20); repo: (mv ^/A@10 ^/B@20)

The WC does not (currently) know about node-copy-ids, and so does not know
that it has multiple instances of the same node, except in the simple case
where the URL@REV is the same.
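
As a rough illustration of that simple case, the following Python sketch groups
WC paths by the URL@REV they point to. The `wc_nodes` mapping and the function
name are hypothetical and are not part of any Subversion API; the sketch only
shows why the second switched-path example above would go undetected.

```python
from collections import defaultdict


def find_duplicate_instances(wc_nodes):
    """Group WC paths by (url, rev) and return groups with several paths.

    wc_nodes is a hypothetical mapping: WC path -> (repository URL, revision).
    """
    by_location = defaultdict(list)
    for path, (url, rev) in sorted(wc_nodes.items()):
        by_location[(url, rev)].append(path)
    return {loc: paths for loc, paths in by_location.items() if len(paths) > 1}


# WC: (A@10, X=^/A@10): same URL@REV, so the duplicate is visible.
same_rev = find_duplicate_instances({"A": ("^/A", 10), "X": ("^/A", 10)})

# WC: (A@10, X=^/A@11): same node-rev in the repository, but the WC cannot
# tell without node-copy-ids, so nothing is reported.
diff_rev = find_duplicate_instances({"A": ("^/A", 10), "X": ("^/A", 11)})
```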

== Multi-Source and Multi-Target Moves ==
During an update, multiple instances of one repository node can move to the
same target path, and one instance can move to multiple target paths.

=== Multiple Sources ===
* WC: (A@10, B@20); repo: (mv A@10 → B@20 → C@30)
* Update to r30; A moves to C, and also B moves to C.

|                       |
+-- A  mv--\            |
|           \           |
+-- B  mv--\ \          |
            \-\-->      +-- C

If multiple sources have the same tree shape (switches, depth, etc.) and no
local modifications, then it makes sense for the WC to simply accept the
single destination. If there are local modifications to multiple source
instances, then the client might want to merge them or raise a conflict.
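
One way to read the two rules above is as a small decision function. This is
only a sketch: the `Instance` record, its `shape` field and the returned
labels are hypothetical, and cases the notes do not cover are reported as
"unspecified" rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    path: str
    shape: tuple          # hypothetical summary of switches, depth, etc.
    has_local_mods: bool


def resolve_multi_source(sources):
    """Decide how to treat several source instances of one move."""
    shapes = {s.shape for s in sources}
    modified = [s for s in sources if s.has_local_mods]
    if len(modified) > 1:
        # Local mods in several instances: merge them or raise a conflict.
        return "merge-or-conflict"
    if len(shapes) == 1 and not modified:
        # Identical, unmodified instances: just accept the one destination.
        return "accept"
    # Anything else is not covered by the notes.
    return "unspecified"


clean = [Instance("A", ("infinity",), False), Instance("B", ("infinity",), False)]
dirty = [Instance("A", ("infinity",), True), Instance("B", ("infinity",), True)]
```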

=== One Move, One Non-Move ===
This case is similar to Multiple Sources, with one important difference. If
we assume an editor in which each move is labeled by a move-id, the
consumer cannot recognize such a conflict just by examining the move-ids.

* WC: (A@10, B@20); repo: (mv A@10 → B@20)
* Update to r30; A moves to B, and also the existing B is updated.

|                       |
+-- A  mv---\           |
|            \          |
+-- B  mod--> \-->      +-- B
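
Because only one move-id is involved here, a consumer would have to look at
target paths rather than move-ids to notice the collision. A minimal sketch,
assuming a hypothetical flat list of `(kind, target_path, move_id)` tuples
(not an existing editor interface):

```python
from collections import defaultdict


def find_target_collisions(ops):
    """Return target paths that receive more than one operation.

    ops: iterable of (kind, target_path, move_id); move_id is None for
    operations that are not part of a move.
    """
    by_target = defaultdict(list)
    for kind, target, move_id in ops:
        by_target[target].append((kind, move_id))
    return {t: hits for t, hits in by_target.items() if len(hits) > 1}


# A moves to B while the existing B is modified in place.
ops = [("mv-here", "B", "m1"), ("modify", "B", None)]
collisions = find_target_collisions(ops)

# The move-ids alone show nothing unusual: "m1" is used by a single move.
move_ids = [m for _, _, m in ops if m is not None]
```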

=== Multiple Targets ===
* WC: (A@10, X@10, Y=^/X@10); repo: (mv A@10 → X/A@20)
* Update to r20; A moves to X/A and also to Y/A.

|                           |
+-- A  mv--\                |
|         \ \               |
+-- X      \ \              +-- X
|           \ \-->          |   +-- A
|            \              |
+-- Y         \             +-- Y
               \-->             +-- A

With multiple targets, there is no need to prevent multiple instances of
the destination node from being created. However, if there are local
modifications, it could be undesirable to end up with the same
modifications in multiple places, so the client might want to warn the user
or allow the user to choose what happens to the modifications.

=== Many-to-Many Move ===
Many-to-many move, combining multiple sources with multiple target paths:

* WC: r10 (A, B=^/A, X, Y=^/X); repo: (mv A@10 → X/C@20)
* Update to r20; A and B both move to X/C and to Y/C.

With a many-to-many move, there is the possibility that the sources and
destinations can be logically paired according to their pathwise nearness.
For example, starting from WC (trunk1→^/trunk@10, trunk2→^/trunk@20) and repo
(mv trunk/A@10 → trunk/B@20):

|                           |
+-- trunk1                  +-- trunk1
|   |                       |   |
|   +-- A  mv--\            |   |
|               \->         |   +-- B
|                           |
+-- trunk2                  +-- trunk2
    |                           |
    +-- A  mv--\                |
                \->             +-- B

This pairing could be implemented by the edit driver, in which case it
should describe each such move with its own id, or by the consumer on
receiving a set of many-to-many moves.

### What are the rules for this nearness pairing?
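
The rules are still open, so the following is only one candidate rule, shown
to make the question concrete: pair each source with the remaining target that
shares the most leading path components. The function names and the greedy
strategy are assumptions of this sketch, not an agreed design, and a greedy
pass can give a poor overall pairing in less regular layouts.

```python
def common_prefix_len(a, b):
    """Count the leading path components that two '/'-separated paths share."""
    n = 0
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        n += 1
    return n


def pair_by_nearness(sources, targets):
    """Greedily pair each source with its nearest unclaimed target."""
    remaining = list(targets)
    pairs = []
    for src in sources:
        if not remaining:
            break
        best = max(remaining, key=lambda t: common_prefix_len(src, t))
        pairs.append((src, best))
        remaining.remove(best)
    return pairs


pairs = pair_by_nearness(["trunk1/A", "trunk2/A"], ["trunk1/B", "trunk2/B"])
```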

== Avoidance ==
We can't easily avoid WCs getting into such a state. To avoid it, the WC
would probably need to know node-ids and have substantial changes in the
allowed patterns of usage.

When a WC has multiple instances of the same repository node, we can't
reasonably forbid updating it.

== Editor ==
Either the update editor must cope with multiple source and destination
paths for the same move, or the client must request several simple edits,
each with no multiple instances. Options include:

* Traversal over WC paths using non-unique mv-away and mv-here
* Traversal over URLs or repository nodes
** multiple edit operations per node, one for each WC path
* Client requests multiple edits, with no multiple instances in a single
edit

=== Non-Unique Moves ===
Traversal over the WC paths, using non-unique mv-away and mv-here.

* 1 op. per path (excluding replacements)
* mv-away and mv-here not uniquely paired by their id

Problems:

* The consumer (client/WC) may want to know whether a given move is unique
before executing it, so that it can choose to warn or raise a conflict (for
example).
* Each move-away is (logically) accompanied by its own edit. For example,
with WC (A@10, B@20), repo (A@10 → B@20 → C@30), one edit applies to A
(r10:30) and another edit applies to B (r20:30).

|                           |
+-- A  mv-----\             |
|      +r10:30 \            |
|               \           |
+-- B  mv-----\  \          |
       +r20:30 \  \         |
                \--\-->     +-- C

But the WC doesn't necessarily need to receive instructions to edit every
move-away instance. If two instances have the same tree shape (switches,
depth, etc.) then it only needs to move and edit one of them and can
discard the others (after preserving any local mods). Thus:

The edit operation for path A cannot be simply “delete”. It needs to
indicate that A is part of the same move that also affects B, because the
client may want to notify the user appropriately and preserve any local
mods in A (perhaps merging them into C).

So, is something like this the way forward?
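
To make the idea concrete, here is a sketch of how a consumer might classify
non-unique moves once it has collected the mv-away and mv-here operations that
share an id. The tuple layout and the category names are hypothetical. Note
that it assumes all operations of the edit are available before any move is
executed, which the notes above do not guarantee.

```python
from collections import defaultdict


def classify_moves(ops):
    """Group (op, path, move_id) tuples by move_id and classify each group.

    op is "mv-away" or "mv-here". Returns move_id -> (kind, sources, targets).
    """
    groups = defaultdict(lambda: {"mv-away": [], "mv-here": []})
    for op, path, move_id in ops:
        groups[move_id][op].append(path)

    result = {}
    for move_id, g in groups.items():
        n_src, n_dst = len(g["mv-away"]), len(g["mv-here"])
        if n_src == 0 or n_dst == 0:
            kind = "incomplete"
        elif n_src == 1 and n_dst == 1:
            kind = "unique"
        elif n_src > 1 and n_dst > 1:
            kind = "many-to-many"
        elif n_src > 1:
            kind = "multi-source"
        else:
            kind = "multi-target"
        result[move_id] = (kind, g["mv-away"], g["mv-here"])
    return result


# WC (A@10, B@20), repo (A@10 -> B@20 -> C@30): both A and B move to C.
moves = classify_moves([
    ("mv-away", "A", "m1"),
    ("mv-away", "B", "m1"),
    ("mv-here", "C", "m1"),
])
```

With this classification the operation for A is no longer a plain delete: it
is reported as one of two sources of move "m1", so the client can notify the
user and preserve any local mods in A.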

=== Traversal over URLs or repository nodes ===
Would it help if the edit traversed URLs or nodes instead of WC paths? If
such an editor visited each URL once, it would need to be able to
send multiple edit operations for the same URL, one operation per instance
of that node. Or something similar for nodes instead of URLs.

This seems to have no advantages over the approach of describing non-unique
moves.

=== Multiple, Simple Edits ===
Another option was briefly considered. Can the client request multiple
edits, with no multiple instances in a single edit?

It would need to know node ids. Either the client would know the node ids
in the WC and make multiple reports, requesting one edit each, or the
reporter (report handler) would know them and have the ability to issue
multiple edits to the client. Either way, how would it decide how to
partition the changes? And still the client would want to be able to detect
conflicts, and this would seem to be more difficult to achieve if the
conflicting changes are in separate edits.

This seems too complex, and too much of a departure from the existing
system.

= Conclusion =
(No conclusion yet.)

= Appendix: Notes on Semantics =
Desirable semantics include:

* Move is a local operation. For example, we can make pairings in this
many-to-many move scenario:
** WC: (trunk1=^/trunk, trunk2=^/trunk)
** Repo: (mv trunk/foo to trunk/bar)
** Update: mv trunk1/foo → trunk1/bar; mv trunk2/foo → trunk2/bar.

Current WC semantics include:

* A node's name and existence are regarded as properties of that node,
rather than of its parent directory. An update of the path 'A' can add or
delete the node at 'A' in the base tree without having to update its parent.

=============================================
Received on 2013-08-30 13:04:35 CEST
