Defining atomic "replace" [was: svn commit: r38394 - in trunk/subversion: include libsvn_delta]

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 28 Jul 2009 11:28:30 +0100

On Tue, 2009-07-21, Neels J Hofmeyr wrote:
>
> Julian Foad wrote:
> >> The driver is required to use add() with REPLACES_REV instead of a separate
> >> delete() call before this add(). Receivers of this callback in general are
> >> not required to support a separate delete() call for a 'replace'.
> >
> > I think it would be clearer to state the editor requirements separately
> > from the usage recommendations. (This last paragraph and the following
> > one seem to mix the two.)
> >
> >> If you intend to implement a separate delete() call when driving this
> >> editor, make sure you really have to and put up "big red signs" in the
> >> documentation of your driver. Be aware that, in general, the driver should
> >> do all of the (possibly hard) work of combining a delete() with an add() to
> >> form an atomic replace, taking that burden off the various receivers. So if
> >> you write a driver that doesn't do that, you have to make sure that all your
> >> receivers (preferably very few) can handle a non-atomic replace.
> >> Furthermore, a delete() call should occur always before, never after an
> >> add() call for the same node.
> >> ]]]
>
> Yes, you're right.
> Where is the line between editor requirements and usage recommendations?
>
> Maybe you can help to choose a side... Or are these sides chosen by
> individual implementations anyway?

It is important to decide and clearly draw the line somewhere. The whole
point of having a standard "interface" is so that multiple producers and
consumers can be written independently and someone else can plug them
together and know that they will work together.

> I am more for forcing an atomic replace. I would rather remove the loophole
> that says "well, but if you must, just issue separate calls". I'd personally
> not mind if some parts of the code send separate calls, but I want e.g. the
> merge callback receivers to be certain that they will *not* receive separate
> calls *ever*.

OK, we can make that a requirement. We must ensure that all of the
current producers obey it.

> How about we choose the stricter comment, but we let people use it
> differently if they must without saying so now?

No, let's not invite trouble.

[...]
> - is it ok to disallow multiple add() and delete() calls on the same path,
> in principle?
> e.g. during merge, there are only the two sides, merge-left
> and merge-right. A replace means that both exist but are "unrelated". So
> a merge will never need to send more than one (delete, add) tuple.
> Usually it's either *one* delete or *one* add. The (add, delete) tuple
> simply cancels out, nothing being sent.

Yes, it's absolutely fine to specify that the editor must be driven with
a single change for each path.

Let's imagine that the user does "svn rm X; svn add X; svn rm X; svn add
X" in his/her WC. Let's imagine that the WC code has an "undo history"
that recorded all the separate actions in this sequence. When the user
makes a commit (or a diff or an update or whatever), then it's the
calling code's job to ensure that the editor is driven with a single
change (replace X(old) with X(new)).

> But are there cases where we also send all the steps in-between? Then we
> could allow this: add(), add(REPLACES_REV), add(REPLACES_REV), delete().
> But *are there*?

I don't know whether there are currently cases where we do this, other
than a single delete followed by a single add. But we are defining the
interface. We must define what we think is best and then the callers
must do what we say.

> - is it necessary to disallow multiple add() and delete() calls on the same
> path, to make the atomic replace useful?
> e.g. during merge, the receiver needs to cache all delete()s and match
> with add()s, carrying out the leftovers during dir_close(). And merge
> is not the only one in need of this quirky code construct.
> I want to eradicate this madness -- I'd love to force atomic replaces
> on everyone, so the receivers are entirely relieved of this quirkiness.
> Will allowing both separate and atomic replace calls defy my plan by
> actually adding more code to the receivers?

Basically, yes, I think it is necessary.

If a receiver wants to handle a replacement as a single operation and
the editor API does not guarantee that it will get a single call, then
it can achieve the same end result by batching up the delete requests
and executing them just before closing the directory. But if it
implements that, then there would be no benefit in having the single
"replace" call available.

> - is there a case where separate calls have a different meaning than an
> atomic replace?
> e.g. for merge, it doesn't matter whether the nodes are "related".
> I can't think of any case where we'd want to highlight that the node was
> deleted and later re-added with the same history, or something. Right?

This is the question of critical importance: Is there any semantic
difference (difference in meaning) between

replace X with (something)

and

delete X; add (something) at path "X"

Before I try to answer that, first let's review the history of the
semantics of "svn move" (also know as "svn rename"). Initially we
decided there was no semantic difference between

move X to Y

and

delete X; add Y as a copy of X

That is true as far as it goes, but unfortunately in the implementation
we lost the implicit requirement that the "X" being deleted in the first
half is the very same "X" that's copied in the second half. It got
translated to

delete X
add Y as a copy of (X's absolute path at a specific recent-ish
revision)

That works just about well enough within a single branch, but when
merging such a change to another branch the connection between the two
X's is completely broken and the result is not what we meant at all.

We usually want a move in the merge source branch, such as

delete source/X
add source/Y as a copy of (source/X at its latest revision)

to be merged into the target branch as

delete target/X
add target/Y as a copy of (target/X at its latest revision)

So, working back to how we could better have defined the semantics of
"move", we could say "move X to Y" means

  remove entry X from X's directory
    (but don't destroy its content, because it's just being moved)
  add entry Y into Y's directory,
    taking its content from the X that's being removed

Note that the two parts are intimately connected together, and that the
"delete" half is semantically different from a plain "delete".

I think that's where we need to be going with the "true renames" issue
<http://subversion.tigris.org/issues/show_bug.cgi?id=898>.

That's not quite the end of the story with "move". One side issue is
that I think the semantics of "copy" should be updated in the same way,
so when "copy source/X to source/Y" is merged it should add target/Y as
a copy of (target/X at latest revision), not of (source/X at some
specific old revision). Another is that I've hand-waved on how the path
"source/X" is mapped to the path "target/X"; I think it involves
treating them both as relative to the respective merge root paths, and
adjusting for any directories moved within the merge. The merge should
throw an error if the move source is outside the merged tree.

Now, what does that teach us about how we want "replace" to behave? Are
we making some assumption when we "replace X with a copy of Y" that is
subtly stronger than "delete X; add a new X that's a copy of Y"? Does it
make a difference if instead of "Y" we say "a previous revision of X"?

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2376254
Received on 2009-07-28 12:28:59 CEST

This message: [ Message body ]
Next message: Julian Foad: "Re: [PATCH v3] Change test suite to read file properties in XML format"
Previous message: Michael Haggerty: "[PATCH v3] Change test suite to read file properties in XML format"
In reply to: Neels Janosch Hofmeyr: "Re: svn commit: r38394 - in trunk/subversion: include libsvn_delta"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]