Re: Defining atomic "replace"

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Tue, 11 Aug 2009 20:51:15 +0200

Hi Julian and everyone else,

I'd like to summarize a little about atomic replace operations in the new
editor interface (subversion/include/svn_editor.h,
subversion/libsvn_delta/editor.c) and have a conclusion.

My intent is to further tweak svn_editor.h shortly, so feedback is welcome.

The argument started off like this:

If a file or folder at a certain path is deleted, and some other file or
folder is added at the same path, that is a replace.

Sometimes it is nice and/or necessary to be able to relate the delete() and
the add(), for example to print
R foo
instead of
D foo
M other_stuff
A foo
, and for detecting tree-conflicts.

Currently, wherever that happens, the *receiving* side of the editor
callbacks has to figure out whether a delete coincides with an add at the
same path. That involves caching deleted nodes == bloat and complication.

Fact is, there is at least one *driver* of the editor callbacks that has
already made the correlation of a delete followed by an add, but splits it
up into a delete followed by an add, following callback design.

My crusade is to move the work for combining delete() + add() into a single
"replace" into the *driver* of the new callbacks, considering that that
information may already be lying around for free in most cases.

Even if it isn't, I think it is better to make the *driver* do the
combination work "once", and not burden "all those receivers" with the work
of correlating deletes with adds.

The problem I ran into is this:

To effectively take the burden off the receivers, we need to *enforce* that
the driver actually combines delete + add = replace. If the driver had a
choice to combine or not combine "replace" calls, all those receivers have
to be able to handle *both* cases, and the whole plan backfires, adding more
code instead of simplifying.

I will now try to theoretically home in on a Good API definition around
atomic replace.

Aspects:

(*) Receivers should not have to combine "replace" operations.

--> We need to *disallow* issuing a delete() followed by an add() operation
on the same path. The drivers all need to use the atomic-replace channel
instead, which is to issue an add() operation with a REPLACES_REV arg.

(*) Two-sided vs. detailed history
In current merge (and all other operations AFAIK), if a certain path was
deleted and added multiple times in a given range, all the intermediate
steps are dropped and the only things that matter are
- the initial state and
- the final state.
e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
However, there are rumors that at some point all the intermediate steps will
be necessary (keyword "operational merge").

--> We need to allow multiple atomic replaces in a given range, so that each
sub-sequence of (D, A) becomes a (R):
(D, A, M, D, A) becomes (R, M, R) in an operational merge.

(*) Atomic move
In the new editor callbacks, there is a separate move() call to represent
atomic moves. It is thus possible to issue a delete() followed by a move()
to the same path that was deleted. Which is, essentially, a replace!

--> We need to also add a REPLACES_REV argument to the move() callback, and
we need to disallow issuing a delete followed by a move to the same path.
Actually, has this case ever been considered? Currently, a move() becomes an
"add-with-history" plus a delete(). Our API doesn't indicate a move
replacing a path. We could need another update/status letter for this.

(*) Copy
There also is a new copy() call instead of the "add(copyfrom_revision)". I
guess it's a bikeshed whether the copy() callback should be separate instead
of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.
Like with moves, it is possible to send a delete() followed by a copy() to
the same path, which amounts to yet another "replace" variant.

--> We need to also add a REPLACES_REV argument to the copy() callback, and
disallow issuing a delete followed by a copy to the same path.

(*) Different meanings
It is conceivable that one single "replace" call may have a different
meaning from separate delete() and add() calls. That would be in the way
that the one type represents a closer relation than the other.

--> My conclusion is that this is not the case / does not matter. An atomic
replace is simply a new node taking the path of another node, the two of
them being unrelated. There *are* no less-related cases than a "replace"
operation (on the same path). There are more-related cases than a "replace",
which then become a move() or a copy() (a.k.a. add-with-history). (Thus I
suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
the added node is related to the node that is being replaced, heavens
forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
and REPLACES_REV == SRC_REVISION.)

Thanks for feedback,

~Neels

Julian Foad wrote:
> On Tue, 2009-07-21, Neels J Hofmeyr wrote:
>> Julian Foad wrote:
>>>> The driver is required to use add() with REPLACES_REV instead of a separate
>>>> delete() call before this add(). Receivers of this callback in general are
>>>> not required to support a separate delete() call for a 'replace'.
>>> I think it would be clearer to state the editor requirements separately
>>> from the usage recommendations. (This last paragraph and the following
>>> one seem to mix the two.)
>>>
>>>> If you intend to implement a separate delete() call when driving this
>>>> editor, make sure you really have to and put up "big red signs" in the
>>>> documentation of your driver. Be aware that, in general, the driver should
>>>> do all of the (possibly hard) work of combining a delete() with an add() to
>>>> form an atomic replace, taking that burden off the various receivers. So if
>>>> you write a driver that doesn't do that, you have to make sure that all your
>>>> receivers (preferably very few) can handle a non-atomic replace.
>>>> Furthermore, a delete() call should occur always before, never after an
>>>> add() call for the same node.
>>>> ]]]
>> Yes, you're right.
>> Where is the line between editor requirements and usage recommendations?
>>
>> Maybe you can help to choose a side... Or are these sides chosen by
>> individual implementations anyway?
>
> It is important to decide and clearly draw the line somewhere. The whole
> point of having a standard "interface" is so that multiple producers and
> consumers can be written independently and someone else can plug them
> together and know that they will work together.
>
>> I am more for forcing an atomic replace. I would rather remove the loophole
>> that says "well, but if you must, just issue separate calls". I'd personally
>> not mind if some parts of the code send separate calls, but I want e.g. the
>> merge callback receivers to be certain that they will *not* receive separate
>> calls *ever*.
>
> OK, we can make that a requirement. We must ensure that all of the
> current producers obey it.
>
>
>> How about we choose the stricter comment, but we let people use it
>> differently if they must without saying so now?
>
> No, let's not invite trouble.
>
>
> [...]
>> - is it ok to disallow multiple add() and delete() calls on the same path,
>> in principle?
>> e.g. during merge, there are only the two sides, merge-left
>> and merge-right. A replace means that both exist but are "unrelated". So
>> a merge will never need to send more than one (delete, add) tuple.
>> Usually it's either *one* delete or *one* add. The (add, delete) tuple
>> simply cancels out, nothing being sent.
>
> Yes, it's absolutely fine to specify that the editor must be driven with
> a single change for each path.
>
> Let's imagine that the user does "svn rm X; svn add X; svn rm X; svn add
> X" in his/her WC. Let's imagine that the WC code has an "undo history"
> that recorded all the separate actions in this sequence. When the user
> makes a commit (or a diff or an update or whatever), then it's the
> calling code's job to ensure that the editor is driven with a single
> change (replace X(old) with X(new)).
>
>> But are there cases where we also send all the steps in-between? Then we
>> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
>> But *are there*?
>
> I don't know whether there are currently cases where we do this, other
> than a single delete followed by a single add. But we are defining the
> interface. We must define what we think is best and then the callers
> must do what we say.
>
>> - is it necessary to disallow multiple add() and delete() calls on the same
>> path, to make the atomic replace useful?
>> e.g. during merge, the receiver needs to cache all delete()s and match
>> with add()s, carrying out the leftovers during dir_close(). And merge
>> is not the only one in need of this quirky code construct.
>> I want to eradicate this madness -- I'd love to force atomic replaces
>> on everyone, so the receivers are entirely relieved of this quirkiness.
>> Will allowing both separate and atomic replace calls defy my plan by
>> actually adding more code to the receivers?
>
> Basically, yes, I think it is necessary.
>
> If a receiver wants to handle a replacement as a single operation and
> the editor API does not guarantee that it will get a single call, then
> it can achieve the same end result by batching up the delete requests
> and executing them just before closing the directory. But if it
> implements that, then there would be no benefit in having the single
> "replace" call available.
>
>> - is there a case where separate calls have a different meaning than an
>> atomic replace?
>> e.g. for merge, it doesn't matter whether the nodes are "related".
>> I can't think of any case where we'd want to highlight that the node was
>> deleted and later re-added with the same history, or something. Right?
>
> This is the question of critical importance: Is there any semantic
> difference (difference in meaning) between
>
> replace X with (something)
>
> and
>
> delete X; add (something) at path "X"
>
> ?
>
> Before I try to answer that, first let's review the history of the
> semantics of "svn move" (also know as "svn rename"). Initially we
> decided there was no semantic difference between
>
> move X to Y
>
> and
>
> delete X; add Y as a copy of X
>
> That is true as far as it goes, but unfortunately in the implementation
> we lost the implicit requirement that the "X" being deleted in the first
> half is the very same "X" that's copied in the second half. It got
> translated to
>
> delete X
> add Y as a copy of (X's absolute path at a specific recent-ish
> revision)
>
> That works just about well enough within a single branch, but when
> merging such a change to another branch the connection between the two
> X's is completely broken and the result is not what we meant at all.
>
> We usually want a move in the merge source branch, such as
>
> delete source/X
> add source/Y as a copy of (source/X at its latest revision)
>
> to be merged into the target branch as
>
> delete target/X
> add target/Y as a copy of (target/X at its latest revision)
>
> So, working back to how we could better have defined the semantics of
> "move", we could say "move X to Y" means
>
> remove entry X from X's directory
> (but don't destroy its content, because it's just being moved)
> add entry Y into Y's directory,
> taking its content from the X that's being removed
>
> Note that the two parts are intimately connected together, and that the
> "delete" half is semantically different from a plain "delete".
>
> I think that's where we need to be going with the "true renames" issue
> <http://subversion.tigris.org/issues/show_bug.cgi?id=898>.
>
> That's not quite the end of the story with "move". One side issue is
> that I think the semantics of "copy" should be updated in the same way,
> so when "copy source/X to source/Y" is merged it should add target/Y as
> a copy of (target/X at latest revision), not of (source/X at some
> specific old revision). Another is that I've hand-waved on how the path
> "source/X" is mapped to the path "target/X"; I think it involves
> treating them both as relative to the respective merge root paths, and
> adjusting for any directories moved within the merge. The merge should
> throw an error if the move source is outside the merged tree.
>
> Now, what does that teach us about how we want "replace" to behave? Are
> we making some assumption when we "replace X with a copy of Y" that is
> subtly stronger than "delete X; add a new X that's a copy of Y"? Does it
> make a difference if instead of "Y" we say "a previous revision of X"?
>
> - Julian
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2376254

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2382628

application/pgp-signature attachment: OpenPGP digital signature

Received on 2009-08-11 20:53:54 CEST

This message: [ Message body ]
Next message: Greg Stein: "Re: Defining atomic "replace""
Previous message: Hyrum K. Wright: "Re: svn commit: r38671 - trunk/subversion/libsvn_wc"
Next in thread: Greg Stein: "Re: Defining atomic "replace""
Reply: Greg Stein: "Re: Defining atomic "replace""
Reply: Julian Foad: "Re: Defining atomic "replace""

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]