Re: Problems Reintegrating the fsfs-format7 branch

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 5 Mar 2013 16:37:39 +0000 (GMT)

Paul Burba wrote:
> On Wed, Feb 27, 2013 at 5:04 PM, Julian Foad wrote:
>> Paul Burba wrote:
>>> I found the cause of the conflict filled reintegrate merge. The
>>> automatic merge code seems to be doing the right thing re Mark's
>>> automatic merge above, the problem arises earlier.
>>
>> Paul, thanks for investigating. While I'm glad to hear the automatic
>> merge was not the main root cause of the problem, it would make sense
>> for it to detect that merges in the opposite direction have been done
>> and at least warn the user.
>
> Hi Julian,
>
> What are we going to warn them to do exactly?

What I was thinking of here is to detect the most general condition that isn't handled properly. Let me explain this part, although I'm sure you're familiar with it.

Whenever the direction of merge is from A to B and the automatic merge code doesn't detect that a reintegrate merge is required, and yet one or more cherry-picked revision ranges and/or subtree merges have been merged from B to A, Subversion currently doesn't notice those merges from B to A, and proceeds to merge "everything" from A to B including those changes that were the result of B-to-A merges. That is what the plain "merge" from A to B has always done -- it has always ignored any mergeinfo describing merges from B to A.

In the example at hand, there has been a B-to-A merge of everything from B (except some non-operative revisions), and so we get loads of conflicts by ignoring this when we merge the other way. The intelligent user thinks, "This is clearly a reintegrate scenario."

On the other hand, if there has been a B-to-A merge of not quite everything from B (leaving an operative unmerged part at the beginning or somewhere in the middle), then an old "reintegrate" merge from A to B would error out, whereas an old non-reintegrate merge would just do the merge and wrongly re-merge all the B-to-A changes back to B. The new automatic merge, if it determines (based on the merging of the branch root node) to do a non-reintegrate, currently does the latter.

At some point, maybe not for 1.8, if we can detect any such condition, it would be a good to warn the user:

A clean merge from A to B is not possible because some but not all
changes have been merged from B to A. You're going to see conflicts
(unless all your potential conflicts are trivially auto-resolved).

We could add some suggestions to the warning:

If you are expecting this to be a reintegrate-like merge, check that
all changes on B including non-operative revision ranges have been
merged to A and recorded in the mergeinfo; try an automatic sync merge
from B to A to fill in any gaps.

If you are expecting a sync merge, note that there have been some
cherry-pick and/or subtree merges from B to A, and this merge will
attempt to merge these same changes back to B, which will likely
produce conflicts.

We could add more specific details of what's been found, and more specific suggestions related to the details, such as the case at hand where all operative revs have been merged B-to-A, if we think it's worth the effort.

A basic implementation would not be cheap if there is a significant amout of subtree mergeinfo to check. And I have carefully avoided the issue of whether we should error out or proceed with the merge, as there's no point bike-shedding about that unless we really decide to do this.

So let's take a closer look at what we could do for the specific case at hand.

> To explicitly use
> --reintegrate option? Because that would work in Stefan's case, so we
> should just do it rather than warn...unless you mean something else.

So, you're thinking more about detecting the very specific kind of scenario that Stefan encountered -- where everything except some leading non-operative revisions had been merged B-to-A, and a reintegrate would give the "correct" result.

In an immediate sense, certainly we could "just do it", special-case this particular scenario (have you thought about what the exact condition should be?) and automatically choose the reintegrate code path in that case.

The trouble is, that doesn't resolve the situation completely. It gives the correct result in terms of the changes merged, but it doesn't straighten out the mergeinfo. That means we have to make sure the rest of the merge subsystem consistently handles this case too: the "sync"-style merge, merge graphing tools that determine what kind of merge was done and what kinds of merge are eligible, and so on. See below.

> I filed an issue for this:
> http://subversion.tigris.org/issues/show_bug.cgi?id=4329 and added a
> test which demonstrates a very simple version of the problem. I set
> issue #4329 a 1.8 blocker. While I think what Stefan is doing is
> somewhat atypical[1], if we are deprecating --reintegrate, then
> automatic merges should handle cases like this. After all, if we
> explicitly use the --reintegrate option the merge works fine. What do
> others think?

I agree it's atypical (or rather unintended) usage, and I also acknowledge that if Stefan used merge like this, then sometimes other people will do too.

However, I don't want us to put any more special cases in the code, especially without putting them into the *design*. If the old "--reintegrate" accepts a missing leading no-op revision range in considering whether "everything" has been merged, that should now be considered a bug. The new rule is that if every revision on B is *recorded* as merged to A, then we can reintegrate from A to B, else we can't.

The main alternative would be to adopt some principle such as "When determining what's been merged and what needs to be merged, we only notice operative revisions, aka non-null changes". I considered that during the design of automatic merge, and was strongly drawn to the idea and started prototyping around it, but that turned out to be a rat-hole involving potentially exhaustive traversal of the whole mergeinfo graph (all branches) to determine whether a given bit of mergeinfo represented an operative or an inoperative change. And the benefits are near zero, because both sync and reintegrate merges were already filling in any gaps in the mergeinfo in the direction of the merge so the situation can only arise if you use manual merges.

The sensible choice in the end was to require that revision ranges are explicitly recorded whether operative or not.

Having found a case that we suspect may occur frequently enough to cause annoyance, I believe the best thing to do is to admit that we don't handle it automatically. Just because the old reintegrate handles this particular case doesn't mean we're making a mistake if the new automatic merge doesn't. This behaviour change is a small adjustment towards overall more consistent merge behaviour, and consistency is key to large-scale usability, much more so than special casing is.

So, what to do exactly? Options seem to be:

0) leave it as it is

1) detect this specific case and warn or error out

2) detect this specific case and do a reintegrate

3) detect the general case and warn or error out

4) detect the general case and merge properly (hah! that's for 1.9)

I would say 0/1/2 are achievable and 3/4 are too difficult, and 2 is a bad idea.

Thoughts?

- Julian
Received on 2013-03-05 17:38:16 CET

This message: [ Message body ]
Next message: Julian Foad: "Re: FSFS format7 and compressed XML bundles"
Previous message: Philip Martin: "Update/switch with an existing tree-conflict for move-update"
In reply to: Paul Burba: "Re: Problems Reintegrating the fsfs-format7 branch (Was: FSFS format7 status and first results)"
Next in thread: Paul Burba: "Re: Problems Reintegrating the fsfs-format7 branch"
Reply: Paul Burba: "Re: Problems Reintegrating the fsfs-format7 branch"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]