[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Moves in FSFS

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Wed, 18 Sep 2013 13:05:15 +0200

On Fri, Sep 13, 2013 at 12:41 PM, Branko Čibej <brane_at_wandisco.com> wrote:

> On 13.09.2013 11:32, Philip Martin wrote:
> > Branko Čibej <brane_at_wandisco.com> writes:
> >
> >> That said, I still do not understand why a different ID would be needed
> >> before the copy-on-write happens. Is it because the client doesn't have
> >> the full history available? If that's the case, I suggest we explore
> >> this on a case-by-case basis, including determining how the initial
> >> state of a working copy for each case can actually occur.
> > Is the FS going only able to provide the moves that apply between two
> > consecutive revisions? Or is it able to provide the combined moves
> > between arbitrary revisions?
> >
> > One way to provide the moves between arbitrary revisions is to iterate
> > through the intervening revisions accumulating and combining the moves.
> > That has the potential to be slow.
> >
> > Another way to provide the moves between arbitrary revisions is to have
> > an id to path map per revision which allows the FS to find the path
> > associated with a given id. However with lazy-copy this map is harder
> > to implement.
> >
> > There is another aspect to the lazy-copy which is when does the new
> > copy-id get assigned to the lazy children. If we commit
> >
> > move A/f A/g
> >
> > then move does not allocate a new copy-id and A/f has the same copy-id
> > as A/g. I think we intend this to be true if the commit combines a move
> > and a modification to the node. Now commit:
> >
> > copy A B
> >
> > here B gets a new copy-id and lazily copied children of B still have the
> > old copy-id. Now what about this commit:
> >
> > move B/g B/h
> >
> > Does move preserve the copy-id so that B/h is still a reference to A/g?
>
> A move through the copied parent has to be interpreted as a write to the
> subtree, which means that the copy-on-write semantics kick in. The move
> then breaks down into:
>
> make-mutable B/g <-- lazy copy, assigns a new copy-id
> move B/g B/h <-- move semantics, B/h keeps same copy-id
>
> You'll not that "make-mutable" is an implementation detail of the
> top-down DAG FS model, and it already does what I described above. This
> is not some new code we'd have to write to implement moves this way; we
> just have to obey existing rules, i.e., before operating on a path
> within an FS transaction, the path must first be made mutable. In other
> words, the FS implementation already works the way I described,
> regardless of whether the actual operation is "move" or something else.
>
> > If the commit was a combined move/modification then B/h would have to
> > get a new copy-id.
>
> It has to get one in any case, as explained above. Operations that
> affect the state of a node, when performed on a path that contains a
> copied parent, must produce the same result regardless of whether the
> filesystem implements lazy copying or not. For the purpose of this
> model, the node's path is part of its state; although of course the path
> is not in fact a unique property of the node.
>
> But it's not necessary to assign new IDs when the node's state is
> /read/; i.e., we do not need copy-on-read in order for this model to
> work. The only consideration is that all operations must work correctly
> with or without lazy copying. It's OK IMO to let the lazy-copy detail
> leak outside the FS implementation, as long as we do not make it a
> required feature.
>

Hi Brane,

As described above, the ID assignment works for FSFS
internals but is insufficient to answer the following question:

Are pathA_at_rA and pathB_at_rB points on the same line of history?

The more theoretical background can be found in my other post,
but the following example should illustrate the problem:

rN: cp A B, add C (A has sub-nodes A/1, A/1/a, A/1/b, A/1/c)
rN+1: mv B/1 B/3, modify A/1/c, modify B/3/a
rN+2: mv A/1/c B/2, mv A/1/b C/2
rN+3: modify B/2, modify C/2

Now, the following statements are true ("related" here means
"on the same line of history"):

(1) A/1/b_at_N and A/1/b_at_N+1 have the same IDs and are related
(2) A/1/a_at_N and B/1/a_at_N+1 have the same IDs and are unrelated

(3) A/1/c_at_N and A/1/c_at_N+1 have same (node,copy) and are related
(4) B/1/c_at_N and A/1/c_at_N+1 have same (node,copy) and are unreleated
(5) B/1/a_at_N and B/3/a_at_N+1 have different copyIDs and are related
(6) A/1/a_at_N and B/3/a_at_N+1 have different copyIDs and are unrelated

(7) A_at_N+1 and B_at_N+2 have different copyIDs and are unrelated,
  A/1/c_at_N+1 and B/2_at_N+2 have the same IDs and are related
(8) A_at_N+1 and B_at_N+3 have different copyIDs and are unrelated,
  A/1/c_at_N+1 and B/2_at_N+3 have the different copyIDs and are related
(9) A_at_N+1 and C_at_N+2 have different IDs and are unrelated,
  A/1/c_at_N+1 and B/2_at_N+2 have the same IDs and are related
(10) A_at_N+1 and C_at_N+3 have different IDs and are unrelated,
  A/1/b_at_N+1 and C/2_at_N+3 have the different copyIDs and are related

(11) A_at_N and C_at_N+1 have different nodeIDs and are unrelated

I.e. only a different nodeID can guarantee a different line of history
but this is a relatively rare case with node and copy IDs matching
being the vast majority.

For the same line of history as well as different lines of history

* (nodeID, copyID) may be the same and revIDs *differ or not*
* nodeID may be the same and copyID + revID may *differ or not*
* parents may be *related or not*

The latter is the most devastating part since it makes a top-down
approach impossible where matching IDs could at least guarantee
relatedness (mismatches would require further investigation).

My feeling is that the fundamental problem is the combination of
* #DAG nodes total < #paths @ rev
* moves are not local, i.e. obliterate any parent context

So, do are you still convinced that nodeID and copyID are useful
for the detection of moves? If so, how do they apply to the cases
listed above?

-- Stefan^2.
Received on 2013-09-18 13:05:54 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.