Re: Moves in FSFS

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Wed, 18 Sep 2013 13:58:51 +0200

On Wed, Sep 18, 2013 at 1:05 PM, Stefan Fuhrmann <
stefan.fuhrmann_at_wandisco.com> wrote:

> On Fri, Sep 13, 2013 at 12:41 PM, Branko Čibej <brane_at_wandisco.com> wrote:
>
>> On 13.09.2013 11:32, Philip Martin wrote:
>> > Branko Čibej <brane_at_wandisco.com> writes:
>> >
>> >> That said, I still do not understand why a different ID would be needed
>> >> before the copy-on-write happens. Is it because the client doesn't have
>> >> the full history available? If that's the case, I suggest we explore
>> >> this on a case-by-case basis, including determining how the initial
>> >> state of a working copy for each case can actually occur.
>> > Is the FS going only able to provide the moves that apply between two
>> > consecutive revisions? Or is it able to provide the combined moves
>> > between arbitrary revisions?
>> >
>> > One way to provide the moves between arbitrary revisions is to iterate
>> > through the intervening revisions accumulating and combining the moves.
>> > That has the potential to be slow.
>> >
>> > Another way to provide the moves between arbitrary revisions is to have
>> > an id to path map per revision which allows the FS to find the path
>> > associated with a given id. However with lazy-copy this map is harder
>> > to implement.
>> >
>> > There is another aspect to the lazy-copy which is when does the new
>> > copy-id get assigned to the lazy children. If we commit
>> >
>> > move A/f A/g
>> >
>> > then move does not allocate a new copy-id and A/f has the same copy-id
>> > as A/g. I think we intend this to be true if the commit combines a move
>> > and a modification to the node. Now commit:
>> >
>> > copy A B
>> >
>> > here B gets a new copy-id and lazily copied children of B still have the
>> > old copy-id. Now what about this commit:
>> >
>> > move B/g B/h
>> >
>> > Does move preserve the copy-id so that B/h is still a reference to A/g?
>>
>> A move through the copied parent has to be interpreted as a write to the
>> subtree, which means that the copy-on-write semantics kick in. The move
>> then breaks down into:
>>
>> make-mutable B/g <-- lazy copy, assigns a new copy-id
>> move B/g B/h <-- move semantics, B/h keeps same copy-id
>>
>> You'll not that "make-mutable" is an implementation detail of the
>> top-down DAG FS model, and it already does what I described above. This
>> is not some new code we'd have to write to implement moves this way; we
>> just have to obey existing rules, i.e., before operating on a path
>> within an FS transaction, the path must first be made mutable. In other
>> words, the FS implementation already works the way I described,
>> regardless of whether the actual operation is "move" or something else.
>>
>> > If the commit was a combined move/modification then B/h would have to
>> > get a new copy-id.
>>
>> It has to get one in any case, as explained above. Operations that
>> affect the state of a node, when performed on a path that contains a
>> copied parent, must produce the same result regardless of whether the
>> filesystem implements lazy copying or not. For the purpose of this
>> model, the node's path is part of its state; although of course the path
>> is not in fact a unique property of the node.
>>
>> But it's not necessary to assign new IDs when the node's state is
>> /read/; i.e., we do not need copy-on-read in order for this model to
>> work. The only consideration is that all operations must work correctly
>> with or without lazy copying. It's OK IMO to let the lazy-copy detail
>> leak outside the FS implementation, as long as we do not make it a
>> required feature.
>>
>
> Hi Brane,
>
> As described above, the ID assignment works for FSFS
> internals but is insufficient to answer the following question:
>
> Are pathA_at_rA and pathB_at_rB points on the same line of history?
>
> The more theoretical background can be found in my other post,
> but the following example should illustrate the problem:
>
> rN: cp A B, add C (A has sub-nodes A/1, A/1/a, A/1/b, A/1/c)
> rN+1: mv B/1 B/3, modify A/1/c, modify B/3/a
> rN+2: mv A/1/c B/2, mv A/1/b C/2
> rN+3: modify B/2, modify C/2
>
> Now, the following statements are true ("related" here means
> "on the same line of history"):
>
> (1) A/1/b_at_N and A/1/b_at_N+1 have the same IDs and are related
> (2) A/1/a_at_N and B/1/a_at_N+1 have the same IDs and are unrelated
>
> (3) A/1/c_at_N and A/1/c_at_N+1 have same (node,copy) and are related
> (4) B/1/c_at_N and A/1/c_at_N+1 have same (node,copy) and are unreleated
> (5) B/1/a_at_N and B/3/a_at_N+1 have different copyIDs and are related
> (6) A/1/a_at_N and B/3/a_at_N+1 have different copyIDs and are unrelated
>
> (7) A_at_N+1 and B_at_N+2 have different copyIDs and are unrelated,
> A/1/c_at_N+1 and B/2_at_N+2 have the same IDs and are related
> (8) A_at_N+1 and B_at_N+3 have different copyIDs and are unrelated,
> A/1/c_at_N+1 and B/2_at_N+3 have the different copyIDs and are related
> (9) A_at_N+1 and C_at_N+2 have different IDs and are unrelated,
> A/1/c_at_N+1 and B/2_at_N+2 have the same IDs and are related
> (10) A_at_N+1 and C_at_N+3 have different IDs and are unrelated,
> A/1/b_at_N+1 and C/2_at_N+3 have the different copyIDs and are related
>
> (11) A_at_N and C_at_N+1 have different nodeIDs and are unrelated
>
> I.e. only a different nodeID can guarantee a different line of history
> but this is a relatively rare case with node and copy IDs matching
> being the vast majority.
>
> For the same line of history as well as different lines of history
>
> * (nodeID, copyID) may be the same and revIDs *differ or not*
> * nodeID may be the same and copyID + revID may *differ or not*
> * parents may be *related or not*
>
> The latter is the most devastating part since it makes a top-down
> approach impossible where matching IDs could at least guarantee
> relatedness (mismatches would require further investigation).
>
> My feeling is that the fundamental problem is the combination of
> * #DAG nodes total < #paths @ rev
> * moves are not local, i.e. obliterate any parent context
>
> So, do are you still convinced that nodeID and copyID are useful
> for the detection of moves? If so, how do they apply to the cases
> listed above?
>

A more concrete example:

(/trunk/A/ contains *.c and *.h files)
rN: cp /trunk/A /trunk/B,
      mv /trunk/B/* /trunk/B/b_*
      modify /trunk/A/*, /trunk/B/b_*
rN+1: cp /trunk /branch
rN+2: add /branch/include,
      mv /branch/A/*.h /branch/include/*.h,
      mv /branch/B/*.h /branch/include/*.h
rN+3: modify /branch/A/*.c, /branch/B/*.c, /branch/include/*.h
rN+4: add /branch/src,
      mv /branch/A/*.c /branch/src/*.c,
      mv /branch/B/*.c /branch/src/*.c,

The user wants to update her working copy from /branch_at_rN+1
to /branch_at_rN+3. How do you identify the moves?

-- Stefan^2.
Received on 2013-09-18 13:59:26 CEST

This message: [ Message body ]
Next message: Patrick Sawyer: "Re: svn log --ignore-properties option"
Previous message: Stefan Fuhrmann: "Re: Moves in FSFS"
In reply to: Stefan Fuhrmann: "Re: Moves in FSFS"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]