Re: Defining atomic "replace"

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Wed, 12 Aug 2009 00:27:00 +0100

On Tue, 2009-08-11 at 20:51 +0200, Neels J Hofmeyr wrote:
> Hi Julian and everyone else,
>
> I'd like to summarize a little about atomic replace operations in the new
> editor interface (subversion/include/svn_editor.h,
> subversion/libsvn_delta/editor.c) and have a conclusion.
>
> My intent is to further tweak svn_editor.h shortly, so feedback is welcome.
>
>
>
> The argument started off like this:
>
> If a file or folder at a certain path is deleted, and some other file or
> folder is added at the same path, that is a replace.
>
> Sometimes it is nice and/or necessary to be able to relate the delete() and
> the add(), for example to print
> R foo
> instead of
> D foo
> M other_stuff
> A foo
> , and for detecting tree-conflicts.
>
> Currently, wherever that happens, the *receiving* side of the editor
> callbacks has to figure out whether a delete coincides with an add at the
> same path. That involves caching deleted nodes == bloat and complication.
>
> Fact is, there is at least one *driver* of the editor callbacks that has
> already made the correlation of a delete followed by an add, but splits it
> up into a delete followed by an add, following callback design.
>
> My crusade is to move the work for combining delete() + add() into a single
> "replace" into the *driver* of the new callbacks, considering that that
> information may already be lying around for free in most cases.
>
> Even if it isn't, I think it is better to make the *driver* do the
> combination work "once", and not burden "all those receivers" with the work
> of correlating deletes with adds.
>
> The problem I ran into is this:
>
> To effectively take the burden off the receivers, we need to *enforce* that
> the driver actually combines delete + add = replace. If the driver had a
> choice to combine or not combine "replace" calls, all those receivers have
> to be able to handle *both* cases, and the whole plan backfires, adding more
> code instead of simplifying.
>
> I will now try to theoretically home in on a Good API definition around
> atomic replace.
>
>
> Aspects:
>
>
> (*) Receivers should not have to combine "replace" operations.
>
> --> We need to *disallow* issuing a delete() followed by an add() operation
> on the same path. The drivers all need to use the atomic-replace channel
> instead, which is to issue an add() operation with a REPLACES_REV arg.
>
>
> (*) Two-sided vs. detailed history
> In current merge (and all other operations AFAIK), if a certain path was
> deleted and added multiple times in a given range, all the intermediate
> steps are dropped and the only things that matter are
> - the initial state and
> - the final state.
> e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
> However, there are rumors that at some point all the intermediate steps will
> be necessary (keyword "operational merge").

(Greg: Neels is talking about a hypothetical new "operation-based" merge
editor that is a consumer of the editor API. It would want to be driven
not simply with the difference between merge-left and merge-right
sources, but rather with the full sequence of changes that were
originally made to get from merge-left to merge-right.)

> --> We need to allow multiple atomic replaces in a given range, so that each
> sub-sequence of (D, A) becomes a (R):
> (D, A, M, D, A) becomes (R, M, R) in an operational merge.

Even if we decide to go towards "operation-based" merging, I think we
have no idea yet what it would look like, and we don't know whether we
would want to represent that (R, M, R) sequence. So I don't think there
is any point in considering that now.

> (*) Atomic move
> In the new editor callbacks, there is a separate move() call to represent
> atomic moves. It is thus possible to issue a delete() followed by a move()
> to the same path that was deleted. Which is, essentially, a replace!
>
> --> We need to also add a REPLACES_REV argument to the move() callback, and
> we need to disallow issuing a delete followed by a move to the same path.
> Actually, has this case ever been considered? Currently, a move() becomes an
> "add-with-history" plus a delete(). Our API doesn't indicate a move
> replacing a path.

I think the current representation of this would be two deletes (delete
the move-source and delete the node being replaced) and then an
add-with-history.

> We could need another update/status letter for this.

For now, it can recreate the existing indication(s) on multiple lines.

D move-source-path
D replaced-path
A+ replaced-path (copied from move-source-path)

> (*) Copy
> There also is a new copy() call instead of the "add(copyfrom_revision)". I
> guess it's a bikeshed whether the copy() callback should be separate instead
> of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.
> Like with moves, it is possible to send a delete() followed by a copy() to
> the same path, which amounts to yet another "replace" variant.
>
> --> We need to also add a REPLACES_REV argument to the copy() callback, and
> disallow issuing a delete followed by a copy to the same path.

Sounds good.

> (*) Different meanings
> It is conceivable that one single "replace" call may have a different
> meaning from separate delete() and add() calls. That would be in the way
> that the one type represents a closer relation than the other.
>
> --> My conclusion is that this is not the case / does not matter. An atomic
> replace is simply a new node taking the path of another node, the two of
> them being unrelated. There *are* no less-related cases than a "replace"
> operation (on the same path).

This all sounds good to me.

> There are more-related cases than a "replace",

You talk about copies and moves that replace an existing node. I don't
think these operations involve a "more related" kind of replacement,
they just involve the same kind of replacement within a different
operation.

I can't think of any more or less related kinds of replacement. In other
words, I can't think of any reason why a user might consider the
"replace" operation to mean something different from the combination of
"delete" and "add".

> which then become a move() or a copy() (a.k.a. add-with-history). (Thus I
> suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
> the added node is related to the node that is being replaced, heavens

"related to"? Do you mean "a copy of"?

> forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
> and REPLACES_REV == SRC_REVISION.)

So the result is: file_at_10 is copied from file_at_9. I don't know if a
Subversion repository can tell the difference between that and a no-op
(or a simple content modification). I would be interested to know
whether it can. Either way, I don't think it affects this proposal.

So: +1 to the plan.

- Julian

> Thanks for feedback,
>
> ~Neels
>
>
> Julian Foad wrote:
> > On Tue, 2009-07-21, Neels J Hofmeyr wrote:
> >> Julian Foad wrote:
> >>>> The driver is required to use add() with REPLACES_REV instead of a separate
> >>>> delete() call before this add(). Receivers of this callback in general are
> >>>> not required to support a separate delete() call for a 'replace'.
> >>> I think it would be clearer to state the editor requirements separately
> >>> from the usage recommendations. (This last paragraph and the following
> >>> one seem to mix the two.)
> >>>
> >>>> If you intend to implement a separate delete() call when driving this
> >>>> editor, make sure you really have to and put up "big red signs" in the
> >>>> documentation of your driver. Be aware that, in general, the driver should
> >>>> do all of the (possibly hard) work of combining a delete() with an add() to
> >>>> form an atomic replace, taking that burden off the various receivers. So if
> >>>> you write a driver that doesn't do that, you have to make sure that all your
> >>>> receivers (preferably very few) can handle a non-atomic replace.
> >>>> Furthermore, a delete() call should occur always before, never after an
> >>>> add() call for the same node.
> >>>> ]]]
> >> Yes, you're right.
> >> Where is the line between editor requirements and usage recommendations?
> >>
> >> Maybe you can help to choose a side... Or are these sides chosen by
> >> individual implementations anyway?
> >
> > It is important to decide and clearly draw the line somewhere. The whole
> > point of having a standard "interface" is so that multiple producers and
> > consumers can be written independently and someone else can plug them
> > together and know that they will work together.
> >
> >> I am more for forcing an atomic replace. I would rather remove the loophole
> >> that says "well, but if you must, just issue separate calls". I'd personally
> >> not mind if some parts of the code send separate calls, but I want e.g. the
> >> merge callback receivers to be certain that they will *not* receive separate
> >> calls *ever*.
> >
> > OK, we can make that a requirement. We must ensure that all of the
> > current producers obey it.
> >
> >
> >> How about we choose the stricter comment, but we let people use it
> >> differently if they must without saying so now?
> >
> > No, let's not invite trouble.
> >
> >
> > [...]
> >> - is it ok to disallow multiple add() and delete() calls on the same path,
> >> in principle?
> >> e.g. during merge, there are only the two sides, merge-left
> >> and merge-right. A replace means that both exist but are "unrelated". So
> >> a merge will never need to send more than one (delete, add) tuple.
> >> Usually it's either *one* delete or *one* add. The (add, delete) tuple
> >> simply cancels out, nothing being sent.
> >
> > Yes, it's absolutely fine to specify that the editor must be driven with
> > a single change for each path.
> >
> > Let's imagine that the user does "svn rm X; svn add X; svn rm X; svn add
> > X" in his/her WC. Let's imagine that the WC code has an "undo history"
> > that recorded all the separate actions in this sequence. When the user
> > makes a commit (or a diff or an update or whatever), then it's the
> > calling code's job to ensure that the editor is driven with a single
> > change (replace X(old) with X(new)).
> >
> >> But are there cases where we also send all the steps in-between? Then we
> >> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
> >> But *are there*?
> >
> > I don't know whether there are currently cases where we do this, other
> > than a single delete followed by a single add. But we are defining the
> > interface. We must define what we think is best and then the callers
> > must do what we say.
> >
> >> - is it necessary to disallow multiple add() and delete() calls on the same
> >> path, to make the atomic replace useful?
> >> e.g. during merge, the receiver needs to cache all delete()s and match
> >> with add()s, carrying out the leftovers during dir_close(). And merge
> >> is not the only one in need of this quirky code construct.
> >> I want to eradicate this madness -- I'd love to force atomic replaces
> >> on everyone, so the receivers are entirely relieved of this quirkiness.
> >> Will allowing both separate and atomic replace calls defy my plan by
> >> actually adding more code to the receivers?
> >
> > Basically, yes, I think it is necessary.
> >
> > If a receiver wants to handle a replacement as a single operation and
> > the editor API does not guarantee that it will get a single call, then
> > it can achieve the same end result by batching up the delete requests
> > and executing them just before closing the directory. But if it
> > implements that, then there would be no benefit in having the single
> > "replace" call available.
> >
> >> - is there a case where separate calls have a different meaning than an
> >> atomic replace?
> >> e.g. for merge, it doesn't matter whether the nodes are "related".
> >> I can't think of any case where we'd want to highlight that the node was
> >> deleted and later re-added with the same history, or something. Right?
> >
> > This is the question of critical importance: Is there any semantic
> > difference (difference in meaning) between
> >
> > replace X with (something)
> >
> > and
> >
> > delete X; add (something) at path "X"
> >
> > ?
> >
> > Before I try to answer that, first let's review the history of the
> > semantics of "svn move" (also know as "svn rename"). Initially we
> > decided there was no semantic difference between
> >
> > move X to Y
> >
> > and
> >
> > delete X; add Y as a copy of X
> >
> > That is true as far as it goes, but unfortunately in the implementation
> > we lost the implicit requirement that the "X" being deleted in the first
> > half is the very same "X" that's copied in the second half. It got
> > translated to
> >
> > delete X
> > add Y as a copy of (X's absolute path at a specific recent-ish
> > revision)
> >
> > That works just about well enough within a single branch, but when
> > merging such a change to another branch the connection between the two
> > X's is completely broken and the result is not what we meant at all.
> >
> > We usually want a move in the merge source branch, such as
> >
> > delete source/X
> > add source/Y as a copy of (source/X at its latest revision)
> >
> > to be merged into the target branch as
> >
> > delete target/X
> > add target/Y as a copy of (target/X at its latest revision)
> >
> > So, working back to how we could better have defined the semantics of
> > "move", we could say "move X to Y" means
> >
> > remove entry X from X's directory
> > (but don't destroy its content, because it's just being moved)
> > add entry Y into Y's directory,
> > taking its content from the X that's being removed
> >
> > Note that the two parts are intimately connected together, and that the
> > "delete" half is semantically different from a plain "delete".
> >
> > I think that's where we need to be going with the "true renames" issue
> > <http://subversion.tigris.org/issues/show_bug.cgi?id=898>.
> >
> > That's not quite the end of the story with "move". One side issue is
> > that I think the semantics of "copy" should be updated in the same way,
> > so when "copy source/X to source/Y" is merged it should add target/Y as
> > a copy of (target/X at latest revision), not of (source/X at some
> > specific old revision). Another is that I've hand-waved on how the path
> > "source/X" is mapped to the path "target/X"; I think it involves
> > treating them both as relative to the respective merge root paths, and
> > adjusting for any directories moved within the merge. The merge should
> > throw an error if the move source is outside the merged tree.
> >
> > Now, what does that teach us about how we want "replace" to behave? Are
> > we making some assumption when we "replace X with a copy of Y" that is
> > subtly stronger than "delete X; add a new X that's a copy of Y"? Does it
> > make a difference if instead of "Y" we say "a previous revision of X"?
> >
> > - Julian
> >
> > ------------------------------------------------------
> > http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2376254
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2382730
Received on 2009-08-12 01:27:40 CEST

This message: [ Message body ]
Next message: Julian Foad: "Re: Defining atomic "replace""
Previous message: Greg Stein: "Re: svn commit: r38683 - in trunk/subversion: include libsvn_client"
In reply to: Neels Janosch Hofmeyr: "Re: Defining atomic "replace""
Next in thread: Julian Foad: "Re: Defining atomic "replace""
Reply: Julian Foad: "Re: Defining atomic "replace""
Reply: Neels Janosch Hofmeyr: "Re: Defining atomic "replace""
Reply: Neels Janosch Hofmeyr: "no-op-copy -- was: Re: Defining atomic "replace""

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]