Re: Defining atomic "replace"

From: Neels Janosch Hofmeyr <neels_at_elego.de>
Date: Wed, 12 Aug 2009 19:49:37 +0200

Julian Foad wrote:
> On Tue, 2009-08-11 at 20:51 +0200, Neels J Hofmeyr wrote:
>> Hi Julian and everyone else,
>>
>> I'd like to summarize a little about atomic replace operations in the new
>> editor interface (subversion/include/svn_editor.h,
>> subversion/libsvn_delta/editor.c) and have a conclusion.
>>
>> My intent is to further tweak svn_editor.h shortly, so feedback is welcome.
>>
>>
>>
>> The argument started off like this:
>>
>> If a file or folder at a certain path is deleted, and some other file or
>> folder is added at the same path, that is a replace.
>>
>> Sometimes it is nice and/or necessary to be able to relate the delete() and
>> the add(), for example to print
>> R foo
>> instead of
>> D foo
>> M other_stuff
>> A foo
>> , and for detecting tree-conflicts.
>>
>> Currently, wherever that happens, the *receiving* side of the editor
>> callbacks has to figure out whether a delete coincides with an add at the
>> same path. That involves caching deleted nodes == bloat and complication.
>>
>> Fact is, there is at least one *driver* of the editor callbacks that has
>> already made the correlation of a delete followed by an add, but splits it
>> up into a delete followed by an add, following callback design.
>>
>> My crusade is to move the work for combining delete() + add() into a single
>> "replace" into the *driver* of the new callbacks, considering that that
>> information may already be lying around for free in most cases.
>>
>> Even if it isn't, I think it is better to make the *driver* do the
>> combination work "once", and not burden "all those receivers" with the work
>> of correlating deletes with adds.
>>
>> The problem I ran into is this:
>>
>> To effectively take the burden off the receivers, we need to *enforce* that
>> the driver actually combines delete + add = replace. If the driver had a
>> choice to combine or not combine "replace" calls, all those receivers have
>> to be able to handle *both* cases, and the whole plan backfires, adding more
>> code instead of simplifying.
>>
>> I will now try to theoretically home in on a Good API definition around
>> atomic replace.
>>
>>
>> Aspects:
>>
>>
>> (*) Receivers should not have to combine "replace" operations.
>>
>> --> We need to *disallow* issuing a delete() followed by an add() operation
>> on the same path. The drivers all need to use the atomic-replace channel
>> instead, which is to issue an add() operation with a REPLACES_REV arg.
>>
>>
>> (*) Two-sided vs. detailed history
>> In current merge (and all other operations AFAIK), if a certain path was
>> deleted and added multiple times in a given range, all the intermediate
>> steps are dropped and the only things that matter are
>> - the initial state and
>> - the final state.
>> e.g. (D, A, M, D, A) becomes (D, A), the three in the center are dropped.
>> However, there are rumors that at some point all the intermediate steps will
>> be necessary (keyword "operational merge").
>
> (Greg: Neels is talking about a hypothetical new "operation-based" merge
> editor that is a consumer of the editor API. It would want to be driven
> not simply with the difference between merge-left and merge-right
> sources, but rather with the full sequence of changes that were
> originally made to get from merge-left to merge-right.)
(yep.)

>> --> We need to allow multiple atomic replaces in a given range, so that each
>> sub-sequence of (D, A) becomes a (R):
>> (D, A, M, D, A) becomes (R, M, R) in an operational merge.
>
> Even if we decide to go towards "operation-based" merging, I think we
> have no idea yet what it would look like, and we don't know whether we
> would want to represent that (R, M, R) sequence. So I don't think there
> is any point in considering that now.

Well, I was just hypothetically testing the constraint of "must combine" for
the "operational merge" concept. I didn't want to introduce a constraint
that would cause design pains soon after. So, in the docs, I'll say
something like "you must combine D,A to R, but you could have multiple Rs in
a row to represent multiple steps". Legal: A, R, R, M, R, D

> For now, it can recreate the existing indication(s) on multiple lines.
> Or rather
>
> D move-source-path
> R+ replaced-path (copied from move-source-path)

Yea, I was thinking the same.

>
> D move-source-path
> D replaced-path
> A+ replaced-path (copied from move-source-path)
>
>
>> (*) Copy
>> There also is a new copy() call instead of the "add(copyfrom_revision)". I
>> guess it's a bikeshed whether the copy() callback should be separate instead
>> of adding a "copyfrom_rev" to the add() callback. In short, copy == add+.
>> Like with moves, it is possible to send a delete() followed by a copy() to
>> the same path, which amounts to yet another "replace" variant.
>>
>> --> We need to also add a REPLACES_REV argument to the copy() callback, and
>> disallow issuing a delete followed by a copy to the same path.
>
> Sounds good.
>
>
>> (*) Different meanings
>> It is conceivable that one single "replace" call may have a different
>> meaning from separate delete() and add() calls. That would be in the way
>> that the one type represents a closer relation than the other.
>>
>> --> My conclusion is that this is not the case / does not matter. An atomic
>> replace is simply a new node taking the path of another node, the two of
>> them being unrelated. There *are* no less-related cases than a "replace"
>> operation (on the same path).
>
> This all sounds good to me.
>
>> There are more-related cases than a "replace",
>
> You talk about copies and moves that replace an existing node. I don't
> think these operations involve a "more related" kind of replacement,
> they just involve the same kind of replacement within a different
> operation.
>
> I can't think of any more or less related kinds of replacement. In other
> words, I can't think of any reason why a user might consider the
> "replace" operation to mean something different from the combination of
> "delete" and "add".
>
>
>> which then become a move() or a copy() (a.k.a. add-with-history). (Thus I
>> suggest adding the REPLACES_REV arg to move() and copy() as well, above. If
>> the added node is related to the node that is being replaced, heavens
>
> "related to"? Do you mean "a copy of"?

I was thinking specifically of a resurrection:

rev op
1 A foo.c
2 M foo.c
3 D foo.c
4 A foo.c <-- an unrelated node
5 R+ foo.c (from foo.c_at_2)

hm, what was I thinking again?

>> forbid, it should become a copy() operation with SRC_RELPATH == DST_RELPATH
>> and REPLACES_REV == SRC_REVISION.)

Ah, ok, that thinking was flawed, actually REPLACES_REV > SRC_REVISION (4>2)
-- sorry.

> So the result is: file_at_10 is copied from file_at_9. I don't know if a
> Subversion repository can tell the difference between that and a no-op
> (or a simple content modification).

But that case wouldn't modify any content. It could result in a log message
with maybe props modified, but no code modified at all. Heh, we should
probably forbid/skip that.

> So: +1 to the plan.

Greg, Julian, t'was a great pleasure to read your responses. Nice to see I'm
on the right track.

~Neels

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2382964

application/pgp-signature attachment: OpenPGP digital signature

Received on 2009-08-12 19:52:06 CEST

This message: [ Message body ]
Next message: Neels Janosch Hofmeyr: "no-op-copy -- was: Re: Defining atomic "replace""
Previous message: Julian Foad: "Re: [PATCH]remove adm_access batons in svn_wc_translated_file3()"
In reply to: Julian Foad: "Re: Defining atomic "replace""
Next in thread: Neels Janosch Hofmeyr: "no-op-copy -- was: Re: Defining atomic "replace""

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]