Re: Moves in FSFS

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Thu, 19 Sep 2013 16:46:10 +0200

Hi all,

I spend most of this week thinking about moves and I
changed my mind as further evidence came up.

What I fully agree with is:

http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics

Then, I found an equivalent expression for the nodeLineID
concept central to the move semantics:

http://svn.haxx.se/dev/archive-2013-09/0112.shtml

and showed there as well as here:

http://svn.haxx.se/dev/archive-2013-09/0128.shtml

that the current and proposed usage of nodeID, copyID is
not equivalent, i.e. cannot be used immediately as a bases
for path matching.

Finding Moves
--------------

This brought me to realization that having move semantics
(and thus identity) defined is necessary and useful but the
implementation will most likely be a history scan. Following
internal node chains to identify common copy roots is very
time consuming and I/O inefficient.

History scans are quick in f7 (10k+ revs/s) and might be
even faster with 1.10 (100k+ revs/s) but you want to do
only one or at most a handful of them for an update editor
run. One way to do it has been proposed here:

http://svn.haxx.se/dev/archive-2013-09/0136.shtml

Another way would be to tell the server the full range (oldest
revision) as part of the reporter initialization. The results of
the scan might then even be applicable to mixed rev working
copies.

Storing Moves
-------------

From the FS POV, moves behave like simple ADD/DEL pairs
and should, therefore, be stored as such. The differences
are:

* deletes due to moves must not be executed if the
  respective path already got replaced
* there must be only one move per revision for any
  source as well as target path (to be checked upon
  transaction commit)
* instead of "add"/"replace", say "move"/"movereplace"
  in the changes list to signify the user's intend

But other than that, existing code should DTRT even in
the presence of moves and backward compat APIs can simply
report moves as ADD/DEL pairs without becoming inconsistent
with the actual data.

Later, revised FS implementations may then impose additional
ID <-> operation dependencies internally.

History Traversal and Moves
---------------------------

This might be opt-out but most users would want to follow
renames / moves even if they don't follow copies. Since
both have the same internal representation, we need to look
into the changed paths list to determine whether the change
has been an actual ADD or a MOVE.

That is virtually for free as we usually to report the
changed paths for all relevant revisions anyway.

Auto-move Option
----------------

It is safe to automatically treat all unique DEL(pathA)
+ ADD(pathA_at_R-1 -> pathB) pairs in revision R as moves,
iff the user did not intend to treat them as separate
node lines. This will not modify recorded history but
only reduce the number of tree conflicts a user may see.

Therefore, log and reporter API should be rev'ed to take
an extra "auto_move" boolean parameter. If set, the
server will attempt to report moves instead of DEL/ADD
pairs with no guarantee that all or even any of the pairs
will actually be detected (future backends may or may not
support this option).

-- Stefan^2.

On Wed, Sep 11, 2013 at 5:21 PM, Julian Foad <julianfoad_at_btopenworld.com>wrote:

> While discussions continue about the "editor" and the WC side of move
> tracking, I'd like to make some progress on the Repository side.
>
> The Wiki page
>
> http://wiki.apache.org/subversion/MoveDev/MoveDev#Move_Semantics
>
> declares the semantic model and
>
> http://wiki.apache.org/subversion/MoveDev/MovesInFSFS
>
> attempts to specify the changes needed. I think the model for move
> semantics in the repository as presented there is pretty solid --
> precise and complete and having a nice set of properties.
>
> Can anyone review this? If anyone can take it forward in any way that
> would be a great help, otherwise I'll tackle the implementation
> myself.
>
> One issue that may be harder than it sounds at first is the concept of
> 'node-line-id' rather than (node-id, copy-id) as the basis of the
> definition. The point is that when we copy (ordinary copy, not move)
> a directory, we lazy-copy the children, which means each child keeps
> its old (node-id, copy-id) unless and until it is modified. That's
> great for achieving the O(1) copy, but for move-tracking purposes each
> child needs a unique "node-line-id" so its life-line can be uniquely
> traced forward and back between this revision and a later revision by
> which time it may have been modified and thus assigned a new copy-id.
>
> Clearly it would defeat the O(1) cost if we were to construct a
> node-line-id explicitly for every node in the tree at copy time. Can
> we instead define node-line-id such that we can compute it as needed,
> from either an unmodified lazy-copied child or after such a child has
> been modified, and get the same answer? Or perhaps re-state the
> problem to avoid this need?
>
> - Julian
>
Received on 2013-09-19 16:46:54 CEST

This message: [ Message body ]
Next message: Branko ÄŒibej: "Re: Moves in FSFS"
Previous message: Julian Foad: "Re: Move Tracking in the Update Editor"
In reply to: Julian Foad: "Moves in FSFS"
Next in thread: Branko ÄŒibej: "Re: Moves in FSFS"
Reply: Branko ÄŒibej: "Re: Moves in FSFS"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]