Re: Problems Reintegrating the fsfs-format7 branch

From: Paul Burba <ptburba_at_gmail.com>
Date: Thu, 21 Mar 2013 11:25:23 -0400

On Fri, Mar 8, 2013 at 4:17 PM, Paul Burba <ptburba_at_gmail.com> wrote:
> On Tue, Mar 5, 2013 at 11:37 AM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
>> Paul Burba wrote:
>>> On Wed, Feb 27, 2013 at 5:04 PM, Julian Foad wrote:
>>>> Paul Burba wrote:
>>>>> I found the cause of the conflict filled reintegrate merge. The
>>>>> automatic merge code seems to be doing the right thing re Mark's
>>>>> automatic merge above, the problem arises earlier.
>>>>
>>>> Paul, thanks for investigating. While I'm glad to hear the automatic
>>>> merge was not the main root cause of the problem, it would make sense
>>>> for it to detect that merges in the opposite direction have been done
>>>> and at least warn the user.
>>>
>>> Hi Julian,
>>>
>>> What are we going to warn them to do exactly?
>>
>> What I was thinking of here is to detect the most general condition that isn't handled properly. Let me explain this part, although I'm sure you're familiar with it.
>>
>> Whenever the direction of merge is from A to B and the automatic merge code doesn't detect that a reintegrate merge is required, and yet one or more cherry-picked revision ranges and/or subtree merges have been merged from B to A, Subversion currently doesn't notice those merges from B to A, and proceeds to merge "everything" from A to B including those changes that were the result of B-to-A merges. That is what the plain "merge" from A to B has always done -- it has always ignored any mergeinfo describing merges from B to A.
>
> Yes, http://subversion.tigris.org/issues/show_bug.cgi?id=4255 for the curious.
>
>> In the example at hand, there has been a B-to-A merge of everything from B (except some non-operative revisions), and so we get loads of conflicts by ignoring this when we merge the other way. The intelligent user thinks, "This is clearly a reintegrate scenario."
>
> Do they? I'm not very confident that most users will immediately
> think that, especially since we've deprecated the --reintegrate
> option. Even if they do think "I need a reintegrate style merge!" I
> suspect they are more likely to do just that, explicitly using the
> --reintegrate option, rather than re-syncing B->A and then repeating
> the automatic "reintegrate style" merge.
>
>> On the other hand, if there has been a B-to-A merge of not quite everything from B (leaving an operative unmerged part at the beginning or somewhere in the middle), then an old "reintegrate" merge from A to B would error out, whereas an old non-reintegrate merge would just do the merge and wrongly re-merge all the B-to-A changes back to B. The new automatic merge, if it determines (based on the merging of the branch root node) to do a non-reintegrate, currently does the latter.
>>
>> At some point, maybe not for 1.8, if we can detect any such condition, it would be a good to warn the user:
>>
>> A clean merge from A to B is not possible because some but not all
>> changes have been merged from B to A. You're going to see conflicts
>> (unless all your potential conflicts are trivially auto-resolved).
>
> But a reintegrate style merge already does this, if the automatic
> merge could only determine up front that that is the correct type of
> merge.
>
>> We could add some suggestions to the warning:
>>
>> If you are expecting this to be a reintegrate-like merge, check that
>> all changes on B including non-operative revision ranges have been
>> merged to A and recorded in the mergeinfo; try an automatic sync merge
>> from B to A to fill in any gaps.
>>
>> If you are expecting a sync merge, note that there have been some
>> cherry-pick and/or subtree merges from B to A, and this merge will
>> attempt to merge these same changes back to B, which will likely
>> produce conflicts.
>>
>> We could add more specific details of what's been found, and more specific suggestions related to the details, such as the case at hand where all operative revs have been merged B-to-A, if we think it's worth the effort.
>>
>> A basic implementation would not be cheap if there is a significant amout of subtree mergeinfo to check. And I have carefully avoided the issue of whether we should error out or proceed with the merge, as there's no point bike-shedding about that unless we really decide to do this.
>>
>> So let's take a closer look at what we could do for the specific case at hand.
>>
>>
>>> To explicitly use
>>> --reintegrate option? Because that would work in Stefan's case, so we
>>> should just do it rather than warn...unless you mean something else.
>>
>> So, you're thinking more about detecting the very specific kind of scenario that Stefan encountered -- where everything except some leading non-operative revisions had been merged B-to-A, and a reintegrate would give the "correct" result.
>
> Why is "correct" in quotes? How is a reintegrate style merge ever the
> wrong choice in that scenario?
>
> Actually, don't answer that, because I'm thinking of a more general
> case: The missing non-operative revisions may be leading, as in
> Stefan's case, or they may be interspersed with operative revisions
> that have already been merged (including subtree merges where the
> change is only operative in the given subtree -- i.e. would be a no-op
> even if repeated at the branch root). Right now the automatic merge
> code (i.e. merge.c:find_last_merged_location()) obviously doesn't
> handle this, but we already have code that can answer this question:
> svn_client_mergeinfo_log2[1].
>
> Here's an example from our test suite of what I mean:
>
> Run merge_reintegrate_tests.py 13 then revert the local changes the
> test leaves behind. We want to merge A_COPY to A, but there have been
> prior cherrypicks and subtree merges from A to A_COPY:
>
>>svn pl -vR A_COPY
> Properties on 'A_COPY':
> svn:mergeinfo
> /A:6
> Properties on 'A_COPY\B':
> svn:mergeinfo
> /A/B:5
> Properties on 'A_COPY\D\G\rho':
> svn:mergeinfo
> /A/D/G/rho:4
> Properties on 'A_COPY\D\H':
> svn:mergeinfo
> /A/D/H:3,6
>
> However, we can see that these prior cherrypicks and subtree merges
> actually constitute all operative revisions up to r8:
>
>>svn mergeinfo --show-revs eligible -R ^^/A A_COPY
> r9
>
> So an explicit reintegrate merge works fine:
>
>>svn merge ^^/A_COPY A --reintegrate
> --- Merging differences between repository URLs into 'A':
> U A\mu
> --- Recording mergeinfo for merge between repository URLs into 'A':
> U A
>
> But an automatic merge doesn't notice this, wrongly chooses a sync
> style merge and results in spurious changes/conflicts:
>
>>svn revert -Rq .
>
>>svn merge ^^/A_COPY A
> --- Merging r2 through r9 into 'A':
> U A\mu
> C A\D\H\psi
> --- Recording mergeinfo for merge of r2 through r9 into 'A':
> U A
> Summary of conflicts:
> Text conflicts: 1
> Conflict discovered in file 'A\D\H\psi'.
> Select: (p) postpone, (df) diff-full, (e) edit, (m) merge,
> (mc) mine-conflict, (tc) theirs-conflict, (s) show all options: p
>
> So my point is simply that if an explicit --reintegrate merge can
> DTRT, why can't an automatic merge?
>
> [1] calculate_left_hand_side also does effectively the same thing --
> which makes me wonder if we can't use svn_client_mergeinfo_log2 there,
> but that's another issue.
>
>> In an immediate sense, certainly we could "just do it", special-case this particular scenario (have you thought about what the exact condition should be?) and automatically choose the reintegrate code path in that case.
>
> See above "I'm thinking of a more general case".
>
>> The trouble is, that doesn't resolve the situation completely. It gives the correct result in terms of the changes merged, but it doesn't straighten out the mergeinfo.
>
> You mean the mergeinfo on the source?
>
>> That means we have to make sure the rest of the merge subsystem consistently handles this case too: the "sync"-style merge, merge graphing tools that determine what kind of merge was done and what kinds of merge are eligible, and so on. See below.
>
> I'm not sure how this is a problem. If we do a reintegrate style
> merge in the general situation I describe above, how does that break
> subsequent sync style merges? Can you point to an example? in terms
> of determining eligible revisions, 'svn mergeinfo' already handles
> these gaps. Which specific graphing tools are you speaking of?
>
>>> I filed an issue for this:
>>> http://subversion.tigris.org/issues/show_bug.cgi?id=4329 and added a
>>> test which demonstrates a very simple version of the problem. I set
>>> issue #4329 a 1.8 blocker. While I think what Stefan is doing is
>>> somewhat atypical[1], if we are deprecating --reintegrate, then
>>> automatic merges should handle cases like this. After all, if we
>>> explicitly use the --reintegrate option the merge works fine. What do
>>> others think?
>>
>> I agree it's atypical (or rather unintended) usage, and I also acknowledge that if Stefan used merge like this, then sometimes other people will do too.
>>
>> However, I don't want us to put any more special cases in the code, especially without putting them into the *design*. If the old "--reintegrate" accepts a missing leading no-op revision range in considering whether "everything" has been merged, that should now be considered a bug.
>
> Seriously? As long as we allow cherrypicks and subtree merges I think
> we want to support use cases like this...to call it a bug that we
> DTRT, I guess I just don't get it.
>
>> The new rule is that if every revision on B is *recorded* as merged to A, then we can reintegrate from A to B, else we can't.
>
> I never came away with the understanding in all the discussion of
> automatic merges...but maybe I'm the only one.
>
>> The main alternative would be to adopt some principle such as "When determining what's been merged and what needs to be merged, we only notice operative revisions, aka non-null changes". I considered that during the design of automatic merge, and was strongly drawn to the idea and started prototyping around it, but that turned out to be a rat-hole involving potentially exhaustive traversal of the whole mergeinfo graph (all branches) to determine whether a given bit of mergeinfo represented an operative or an inoperative change. And the benefits are near zero, because both sync and reintegrate merges were already filling in any gaps in the mergeinfo in the direction of the merge so the situation can only arise if you use manual merges.
>>
>> The sensible choice in the end was to require that revision ranges are explicitly recorded whether operative or not.
>>
>> Having found a case that we suspect may occur frequently enough to cause annoyance, I believe the best thing to do is to admit that we don't handle it automatically. Just because the old reintegrate handles this particular case doesn't mean we're making a mistake if the new automatic merge doesn't. This behaviour change is a small adjustment towards overall more consistent merge behaviour, and consistency is key to large-scale usability, much more so than special casing is.
>>
>> So, what to do exactly? Options seem to be:
>>
>> 0) leave it as it is
>>
>> 1) detect this specific case and warn or error out
>>
>> 2) detect this specific case and do a reintegrate
>
> The attached patch accomplishes, let's call it "2.5", because it
> handles the more general case I outlined above, but not the case
> represented by issue #4255. Would you mind taking a look?

fyi http://subversion.tigris.org/issues/show_bug.cgi?id=4329#desc7
explains my proposed patch in terms of how it changes the symmetric
merge algorithm.

-- 
Paul T. Burba
CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development
Skype: ptburba

Received on 2013-03-21 16:25:54 CET

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]