
Re: Moves in FSFS

From: Branko Čibej <brane_at_wandisco.com>
Date: Sat, 28 Sep 2013 12:36:55 +0200

On 28.09.2013 12:06, Bill Tutt wrote:
> Move tracking can clearly make one's head spin. :)
>
> Brane: Could Julian and Stephan be asking about how to map between
> arbitrary revisions, thinking of tree-merge scenarios?

There are two, somewhat related issues. The first involves understanding
what actually happened to the tree, e.g., detecting that a certain
change in the tree was a move, not something else. We realized more than
10 years ago that in our current top-down DAG model, getting this info
just by analyzing the tree would be very expensive, and in some cases
infeasible; so it doesn't really come as a surprise that looking just at
the (node-id, copy-id) of a node revision is not enough.

That's why the changes list was introduced (as the "changes" table in
BDB, and an equivalent structure in FSFS); and I think we've come to the
conclusion that it will have to be used to detect moves as well.
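To make the idea concrete, here is a minimal Python sketch of move detection driven by a changes list rather than by tree analysis. The `Change` record, its fields, and the pairing rule (an add copied from the previous revision whose source is deleted in the same revision) are assumptions for illustration. They are not the actual BDB or FSFS changes format.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Change:
    """One entry in a hypothetical per-revision changes list."""
    path: str
    action: str                          # "add", "delete" or "modify"
    copyfrom_path: Optional[str] = None
    copyfrom_rev: Optional[int] = None

def detect_moves(rev: int, changes: List[Change]) -> List[Tuple[str, str]]:
    """Heuristic: an add copied from rev-1 whose source path is deleted
    in this same revision is reported as a move (source, destination)."""
    deleted = {c.path for c in changes if c.action == "delete"}
    moves = []
    for c in changes:
        if (c.action == "add"
                and c.copyfrom_path in deleted
                and c.copyfrom_rev == rev - 1):
            moves.append((c.copyfrom_path, c.path))
    return moves

changes_r42 = [
    Change("/trunk/A", "delete"),
    Change("/trunk/B", "add", "/trunk/A", 41),
    Change("/trunk/C", "add", "/trunk/X", 41),  # plain copy: its source survives
]
print(detect_moves(42, changes_r42))  # [('/trunk/A', '/trunk/B')]
```

Note that the pairing comes entirely from the recorded changes; nothing in the sketch inspects node ids.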

> That's what always gave me heartburn with the current Noderev
> behavior and envisioning real moves in Subversion.

Yup.

> In thinking deep thoughts about this in the deep dark dark distant
> past I came up with several interesting thoughts: (sadly none of these
> are completely thought out answers, but they did seem like thoughts
> worth pondering wrt Subversion if you haven't before)
> * Have you considered having the noderev own the node's name instead
> of how the parent directory currently owns it? This way it would be
> natural to make the destination of the move a mutated noderev.

Yes. In fact I've made a stab at defining a different, IMO better model
for storing versioned tree metadata; see:

https://wiki.apache.org/subversion/Berlin2013?action=AttachFile&do=view&target=metadata-tng.pdf

In this model, file names are represented as properties of the node, not
its parent. It also does away with the necessity of having a separate
changes list. But implementing it will be part of a larger effort which
includes inventing a new API for the FS layer (in hindsight, we should
never have exposed the top-down DAG as a public API).
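A toy Python sketch of the "name belongs to the node" idea follows. The `Node` class, its `parent_id` link and the `listing`/`rename` helpers are hypothetical and are not taken from the metadata-tng PDF. The point is only that a rename touches the renamed node and leaves its parent untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    """Hypothetical node record: the name is one of the node's own properties."""
    node_id: str
    parent_id: Optional[str]
    props: Dict[str, str] = field(default_factory=dict)

def listing(nodes: Dict[str, Node], dir_id: str) -> Dict[str, str]:
    """Derive a directory's entries (name -> node_id) from its children's names."""
    return {n.props["name"]: n.node_id
            for n in nodes.values() if n.parent_id == dir_id}

def rename(nodes: Dict[str, Node], node_id: str, new_name: str) -> None:
    """A rename edits one property of the renamed node; the parent is not rewritten."""
    nodes[node_id].props["name"] = new_name

nodes = {
    "root": Node("root", None, {"name": ""}),
    "trunk": Node("trunk", "root", {"name": "trunk"}),
    "a": Node("a", "trunk", {"name": "A"}),
}
rename(nodes, "a", "B")
print(listing(nodes, "trunk"))  # {'B': 'a'}
```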

> i.e. mv /trunk/A -> /trunk/B COULD do this in the DAG:
> * make /trunk/A mutable by ONLY updating the TxnID of the Noderev.
> * Edit the name field of the noderev from A ->B
> More complicated example:
> mv /trunk/A/B/C -> /trunk/A/D COULD do this to the DAG:
> * make /trunk/A/B/C mutable by ONLY updating the TxnID of the
> Noderev.
> * remove it as an entry from /trunk/A/B and add it as an entry
> to /trunk/A
> However... what do you do for this case: (C could have a zero copy_id
> here I think)
> mv /trunk/A/B/C -> /blah could do this to the DAG:
> ????
> or this one: (presuming C has a non-zero copy_id before move)
> mv /branch/A/B/C -> /blah2
> ????
> or (same as the last one, but C was the destination of the "soft-copy")
> ???
> The enumeration of the rest of the cases is left as an exercise for
> the reader. :)

That PDF mentions many of these. :)
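Bill's second example (mv /trunk/A/B/C -> /trunk/A/D) can be sketched in Python as below, assuming a hypothetical `NodeRev` that owns its name, as he proposes. Only the txn id changes when the node is made mutable, so its node-id and copy-id survive the move. The parent directories are assumed to already be mutable clones in the transaction, and the cases marked ???? above are not handled.

```python
from dataclasses import dataclass, field, replace
from typing import List

@dataclass
class NodeRev:
    """Hypothetical node revision that owns its own name (Bill's proposal)."""
    node_id: str
    copy_id: str
    txn_id: str
    name: str
    entries: List["NodeRev"] = field(default_factory=list)  # children, if a directory

def make_mutable(nr: NodeRev, txn_id: str) -> NodeRev:
    """Clone into the transaction, changing ONLY the txn id.
    Shallow: the children list is shared with the original, not copied."""
    return replace(nr, txn_id=txn_id)

def move(src_dir: NodeRev, dst_dir: NodeRev, name: str,
         new_name: str, txn_id: str) -> NodeRev:
    """Re-parent and rename one entry; both directories are assumed
    to be mutable clones already living in transaction txn_id."""
    child = next(c for c in src_dir.entries if c.name == name)
    src_dir.entries.remove(child)        # drop the entry from the old parent
    moved = make_mutable(child, txn_id)  # same node-id and copy-id, new txn
    moved.name = new_name                # edit the noderev's own name field
    dst_dir.entries.append(moved)        # add it under the new parent
    return moved

c = NodeRev("c", "0", "r10", "C")
b = NodeRev("b", "0", "t1", "B", [c])
a = NodeRev("a", "0", "t1", "A", [b])
d = move(b, a, "C", "D", "t1")
print(d.node_id, d.copy_id, d.txn_id, d.name)  # c 0 t1 D
print([e.name for e in a.entries])             # ['B', 'D']
```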

> * The implementation of rename tracking should ensure you enable all
> scenarios you desire for dealing with merging renamed items from
> branch A to branch B. (the tree shape part of the merge of course)
> That is, if you care, some systems don't/won't/did and then decided
> not to/whatever. Merging renames leads to lousy complex edge case tree
> conflicts that need usable (in the easy to use usability sense)
> resolution UI to solve.

Exactly. And this is the second issue of the two I mentioned: in order
to make merge aware of moves, as opposed to copies, you have to be able
to detect that two distinct path@revision locations in fact refer to the same
branch of a node -- which is exactly the (node-id, copy-id) pair, and
that /is/ sufficient for the purpose. So again, there's no need to
invent another branch-tracking identifier.

(This is almost the opposite of detecting moves -- it's closer to
detecting sameness, for some definition of "same".)
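A minimal Python sketch of that sameness test follows. `NodeRevId` is a hypothetical decoded noderev id. The example ids assume a move that keeps both node-id and copy-id while a copy allocates a new copy-id. The real FSFS id encoding is not modelled.

```python
from typing import NamedTuple

class NodeRevId(NamedTuple):
    """Hypothetical decoded noderev id: (node-id, copy-id) plus a per-revision part."""
    node_id: str
    copy_id: str
    rev: int

def same_branch_of_node(a: NodeRevId, b: NodeRevId) -> bool:
    """Same line of history on the same branch: the (node-id, copy-id) pair matches."""
    return (a.node_id, a.copy_id) == (b.node_id, b.copy_id)

def related(a: NodeRevId, b: NodeRevId) -> bool:
    """Same node-id only: shared ancestry, possibly on different branches."""
    return a.node_id == b.node_id

# Hypothetical results of looking up path@revision.
trunk_a_10 = NodeRevId("7", "0", 10)   # /trunk/A@10
trunk_b_12 = NodeRevId("7", "0", 12)   # /trunk/B@12, after moving A to B
branch_a_11 = NodeRevId("7", "3", 11)  # /branch/A@11, a copy of /trunk/A

print(same_branch_of_node(trunk_a_10, trunk_b_12))   # True
print(same_branch_of_node(trunk_a_10, branch_a_11))  # False
print(related(trunk_a_10, branch_a_11))              # True
```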

> * If moves are lazy, and you need to track moves differently from
> copies then consider: What if you expanded a noderev with a new
> move_id part? :)

Here's the thing -- moves are never lazy. For all intents and purposes,
they behave exactly as content or prop modifications. In fact, you can
think of a node's name as one of its properties, instead of a property
of its parent; although of course this is not what the current model does.

> Some systems solve this problem by recording in the merge history the
> (noderev, copyid) equivalents the merge history was recorded from.
> (Not just when merge was invoked, but also initial merge history for
> the initial copy.) That could also be a way of determining where
> /trunk/A/B/C/q.c is in /branch after an arbitrary number of
> directory hierarchy changes and file name alterations somewhere in
> /branch. These systems don't easily support merge history elision
> concepts though. (As well as requiring specialized merge history
> storage at the FS level for performance reasons.) Of course, I would
> think merge history could benefit from specialized storage anyway. :)
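The lookup Bill describes might look like the following Python sketch, which rests on strong simplifying assumptions. First, `merge_history` records, for each trunk (node-id, copy-id) pair, the copy-id the node received on the branch. Second, every node on the branch carries that copy-id, which ignores how FSFS actually assigns copy-ids to unmodified descendants of a copy. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Ident = Tuple[str, str]  # hypothetical (node_id, copy_id) pair

def locate_in_branch(trunk_id: Ident,
                     merge_history: Dict[Ident, str],
                     branch_tree: Dict[str, Ident]) -> Optional[str]:
    """Find where a trunk node lives in a branch, however the branch renamed it."""
    branch_copy_id = merge_history.get(trunk_id)
    if branch_copy_id is None:
        return None                           # nothing recorded for this node
    wanted = (trunk_id[0], branch_copy_id)
    for path, ident in branch_tree.items():  # linear scan, for clarity only
        if ident == wanted:
            return path
    return None                               # node no longer exists on the branch

merge_history = {("q", "0"): "5"}  # recorded when /branch was created
branch_tree = {
    "/branch/X/Y/q_renamed.c": ("q", "5"),  # /trunk/A/B/C/q.c after renames
    "/branch/X/other.c": ("o", "5"),
}
print(locate_in_branch(("q", "0"), merge_history, branch_tree))
# /branch/X/Y/q_renamed.c
```

The linear scan is where the specialized merge-history storage Bill mentions would come in. A real system would index branch nodes by identity instead.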

Indeed, this is another thing we've been talking about, and another goal
for the next generation versioned FS. We're well aware that the way we
currently store mergeinfo is ... less than ideal.

> If you followed this approach the exact way the Noderev
> node_id/copy_id changes might be less relevant.
>
> Sorry for making your head hurt some more with these ancient nutty
> musings,

Actually, I'd prefer to see more of your musings on this list ...

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. brane_at_wandisco.com
Received on 2013-09-28 12:37:55 CEST

This is an archived mail posted to the Subversion Dev mailing list.
