Re: Redundant mergeinfo changes on subtrees with specialized mergeinfo

From: Paul Burba <ptburba_at_gmail.com>
Date: Mon, 5 May 2008 12:28:58 -0400

On Fri, May 2, 2008 at 2:46 PM, Leonardo Fernandes
<leonardo.fernandes_at_outsystems.com> wrote:
> > On Fri, May 2, 2008 at 9:01 AM, Leonardo Fernandes
> > <leonardo.fernandes_at_outsystems.com> wrote:
> > > Hi.
> > >
> > > I have read the thread you pointed to. Thank you.
> > > But it still confuses me why a SCM software forces me to commit no-op
> > > revisions, just to get things right in the merge-tracking feature.
> > Hi Leonardo,
> >
> > I'm unclear how you are using the term "no-op revisions" above.
> >
> > Do you mean a merge to a target that is inoperative in the source and
> > hence results in only mergeinfo changes on the target and subtrees
> > with explicit mergeinfo (possibly not even that if it is a repeat
> > merge)? This is how we use the term.
> >
> > Or do you mean a merge that is operative in the source but doesn't
> > touch certain subtrees with explicit mergeinfo in the target, but
> > those subtrees still get updated mergeinfo?
> Hi. Thanks for your reply!
> Here I was talking about the following scenario: merging a changeset into wc, but having to commit files in wc/subtree without having their text modified. This is a no-op revision (for wc/subtree, that is), right? Maybe I'm abusing the term "no-op".
> wc (merge-info modified as result of the merge)
> |-- fileChanged (text modified as result of the merge)
> |-- subtree (only merge-info modified as result of the merge)

OK, I understand you now.

> > > Here's my point of view, which is of course as valid as yours, but very
> > > different.
> > >
> > > When I merge a no-op changeset into the folder /my/folder, it doesn't
> > > surprise me the need to commit /my/folder to record the merge-tracking
> > > info. But it does surprise me the fact that I need to commit also
> > > /my/folder/my/subtree, just because it has explicit merge-info. I'm not
> > > telling it's unacceptable, but it's just confusing. And even more
> > > confusing because I cannot explain that behavior to anyone without
> > > drilling down into the Subversion implementation, merge-info properties,
> > > and subtrees with explicit merge-info.
> > > Imagine a branch with hundreds of subtrees with explicit merge-info.
> > > Merging in it would be hell.
> >
> > How common is this use case really? I know some folks do nothing but
> > cherry pick changes to individual files. And many people do all their
> > merges to the root of branches. To get the situation you describe
> > we'd have to do both: Lots of subtree merges to create the 100's of
> > subtrees with explicit mergeinfo. Then merge to the root of the
> > branch.
> >
> > Ok, assuming a lot of people do need to do this (I assume you are one
> > of them) then why is merging to this type of branch "hell"? The merge
> > works no? I assume you don't like the fact that there are (mergeinfo)
> > property changes on some of those subtrees that aren't changed in the
> > merge source?
> Of course, the merge always works. I'm not discussing the correctness of the merge algorithm.
> I'll give you my exact use case. We have a maintenance team which fixes a bug in a file. They commit the file, and propagate the fix to other branches. This is done by:
> * switch the file (or some parent directory of the file) to a new branch
> * merge the revision into the switched subtree
> * commit the subtree
> Why we do this, and not always merge in the root of the working copy? Because it's faster to switch, and faster to merge, and faster to switch to yet another branch.
> Always merging from the trunk would at least spare the trunk from having explicit merge-infos. Why we don't always merge from the trunk? Because the bug might be easily reproducible in an older version, and from that point it will be easier to walk chronologically through all branches (v1, v2, v3, and only then trunk).
> It will be easy to reach the described situation, with hundreds of files with explicit merge-info.
> I hope my use-case is now clear.

It is clear now yes. I'm still not sure it is a very common use case
though. This may be a case where Subversion is not your best
option(?). We try to cover as many use cases as possible, but this
strikes me as one where our "mergeinfo as an inheritable property"
doesn't work very well.

> Now, the problem that arises from this situation, bear with me. Merging something into such a branch would set merge-info in every subtree with explicit merge-info. This would result in lots of files pending for commit, without *relevant* changes.
> Now consider that I want to double-check the merge results. I would have to iterate through all changed files, and diff with the base. My surprise will be that most of them are unchanged, with only merge-info changes, and I will be wasting my time filtering the really interesting ones.
> Not to mention that, all files which received a merge from the maintenance team (in the use case I just described) will always be committed in every future merge. Think about what will happen to the log of such files.
> And I dare to ask you the opposite question of yours. If I don't commit those subtrees, the merge stops working?

Depends what you mean by "the merge". I assume you mean future merges
to the same target? If so, then no; if you revert the mergeinfo
changes on the subtrees unaffected by the merge then future merges to
the same target (or the subtrees) will still work. What won't work in
the current implementation is elision (see below).

> > > In my opinion, it's ok for a no-op merge to set merge-info in the
> > > *root* of the merge, or in case of merge-info elision. It's an alternative
> > > implementation, which you might consider.

It's just as easy to argue that a merge of -rX:Y to TARGET with
subtree ST1 should set the same mergeinfo on ST1 as a merge directly
to ST1 (i.e. the current behavior) no? This follows the same basic
argument as to why inoperative mergeinfo ranges are set on a merge

> > Sorry, I don't understand what you mean here. In particular, can you
> > give a concrete example of what you mean by "it's ok for a no-op merge
> > to set merge-info...in case of merge-info elision"?
> >
> > I'm not trying to be difficult, but you'll need to spell out the exact
> > rules you are proposing before I can comment much further.
> Please see below for an attempt of clarification.
> > > 1) Multiple Merges vs. Merge with Multiple Ranges Can Result in
> > > Different Mergeinfo
> > >
> > > This would not be true, unless for subtrees. Let's see:
> > >
> > > svn merge -c11 SOURCE TARGET
> > > svn merge -c12 SOURCE TARGET ---> no-op, mergeinfo set *only* in
> > > svn merge -c13 SOURCE TARGET
> > >
> > > The result now would be '/SOURCE:11,12,13'.
> > > In my personal opinion, the result '/SOURCE:11,13' would not be
> > > incorrect either, but from what I can see that would cause merge-info
> > > elision to fail. That's an acceptable argument.
> > >
> > > 2) It Thwarts Elision
> > >
> > > No it doesn't, because:
> > > - Each tree node will have:
> > > 1. a superset of all operative changes merged into the node;
> > > 2. a subset of all merges ever done to it.
> >
> > Assuming you are talking about proper supersets/subsets, then that
> > doesn't seem to make sense. All "operative changes merged" to a path
> > is typically a proper subset* of "all merges ever done to it". So
> > what you are proposing is 'B':
> >
> > --------------------------------------
> > | |
> > | SET A |
> > | All merges ever done to a path |
> > | |
> > | ------------------------------- |
> > | | | |
> > | | SET B | |
> > | | Superset of all operative | |
> > | | merges and subset of | |
> > | | *all* merges??? | |
> > | | | |
> > | | ----------------------- | |
> > | | | | | |
> > | | | SET C | | |
> > | | | All operative | | |
> > | | | merges done | | |
> > | | | to a path | | |
> > | | | | | |
> > | | ----------------------- | |
> > | | | |
> > | ------------------------------- |
> > | |
> > --------------------------------------
> >
> > (*Yes C can be an improper subset of A, but in practice this is not
> > likely)
> >
> > Again, I'm not trying to be a PITA, I just don't understand exactly
> > what you are proposing :-)
> >
> > Paul
> What I am describing is an alternative method to record the merges. I will try to enunciate it in a clear and formal fashion.
> 1. 'svn merge -rX:Y SOURCE TARGET' should *always* add 'SOURCE:X-Y' in the TARGET merge-info.

Easy enough, this is the current behavior.

> 2. 'svn merge -rX:Y SOURCE TARGET' should add 'SOURCE/subtree:X-Y' to TARGET/subtree if and only if:
> 2.1. TARGET/subtree has explicit merge-info

FWIW there are a lot of other cases where subtrees need to be
considered even if they don't have explicit mergeinfo: Switched
subtrees, subtrees with parents having non-inheritable ranges,
subtrees with missing children (child is switched or absent from the
WC, or parent is a sparse checkout), subtrees with a missing sibling,
subtrees absent from the merge source due to authz restrictions. I
mention these only because things aren't quite so simple...but we can
gloss over these for now I think.

> 2.2. and the merge operation actually changes some file in TARGET/subtree

Without looking in detail that wouldn't be too tricky to implement, but...

> 3. 'svn merge -rX:Y SOURCE TARGET' should also elide any merge-infos in TARGET/subtree if possible

...this would be. Right now the elision logic is fairly simple (and
it hasn't exactly been easy to implement!). In a nutshell it's as follows:

Assume we have a PATH with explicit mergeinfo (the parent) and it has
one subtree with explicit mergeinfo (the child). Assume that RELPATH
is the path of the child relative to the parent. Further let's assume
the working revision for the tree rooted at PATH is uniform and
nothing is switched. We have something like this:

parent: PATH          child: PATH + RELPATH
mergeinfo             mergeinfo
--------------        ------------------------

If RANGE1 == RANGE2 (the ranges recorded on the parent and on the child,
respectively) then elision occurs.
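
As a rough illustration of the simple rule above, here is a minimal
Python sketch. It is not Subversion's actual code: the helper names
(parse_ranges, parse_mergeinfo, can_elide_simple) are hypothetical, it
assumes the source path recorded on the child is the parent's source
path plus RELPATH, and it ignores non-inheritable ranges, switched
paths and mixed working revisions.

```python
def parse_ranges(text):
    """Expand a range list such as '6,8-10,12' into a set of revisions."""
    revs = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            revs.update(range(int(first), int(last) + 1))
        else:
            revs.add(int(part))
    return revs


def parse_mergeinfo(text):
    """Parse 'source-path:ranges' lines into {source_path: set_of_revs}."""
    info = {}
    for line in text.splitlines():
        if line.strip():
            path, _, ranges = line.strip().rpartition(":")
            info[path.strip()] = parse_ranges(ranges)
    return info


def can_elide_simple(parent, child, relpath):
    """True if the child's mergeinfo is exactly what it would inherit.

    Assumption: the inherited mergeinfo is the parent's mergeinfo with
    RELPATH appended to every source path.
    """
    expected = {
        src.rstrip("/") + "/" + relpath.strip("/"): revs
        for src, revs in parent.items()
    }
    return expected == child


parent = parse_mergeinfo("/trunk:5-20")
print(can_elide_simple(parent, parse_mergeinfo("/trunk/child:5-20"), "child"))
print(can_elide_simple(parent, parse_mergeinfo("/trunk/child:6,8-10,12,17"), "child"))
```

The first call reports that the child's mergeinfo is redundant; the
second does not, because under this simple rule any difference in the
recorded ranges blocks elision.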

Now with your suggested approach RANGE1 could differ from RANGE2 but
elision still might occur, say we had:

parent: branch        child: branch/child
mergeinfo             mergeinfo
--------------        ------------------------
/trunk : 5-20         /trunk/child: 6,8-10,12,17

Now maybe the mergeinfo on 'branch/child' is equivalent to that on
'branch' because r5,7,11,13-16,18-20 are inoperative in 'trunk/child'.
But maybe some or all of these revisions *are* operative in
'trunk/child' and were reverse merged out of 'branch/child'. We
can't know without asking the server about each missing revision
*individually* to see if it affects 'branch/child'. Why individually?
Because 'trunk/child@5' to 'trunk/child@20' might not represent the
start and end points of a contiguous line of history. If none of the
missing revisions affect 'branch/child' then elision can occur.
Problem is, in a situation where there are hundreds of subtrees this
is probably going to cause a *severe* performance hit.
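
To make the cost concrete, here is a small Python sketch of the check
described above. It is only an illustration under stated assumptions:
the function names are hypothetical, and is_operative stands in for a
per-revision question to the server ("does this revision affect the
subtree?"), which is not an existing Subversion API.

```python
def parse_ranges(text):
    """Expand a range list such as '6,8-10,12' into a set of revisions."""
    revs = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            revs.update(range(int(first), int(last) + 1))
        else:
            revs.add(int(part))
    return revs


def format_ranges(revs):
    """Collapse a set of revisions back into a compact range list."""
    ordered = sorted(revs)
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j
                     else "%d-%d" % (ordered[i], ordered[j]))
        i = j + 1
    return ",".join(parts)


def can_elide_with_lookup(parent_revs, child_revs, is_operative):
    """Elide only if every revision missing from the child is inoperative.

    is_operative(rev) is a hypothetical callback representing one
    server round trip per missing revision.
    """
    if child_revs - parent_revs:
        return False  # the child records merges the parent does not
    missing = parent_revs - child_revs
    return not any(is_operative(rev) for rev in sorted(missing))


parent = parse_ranges("5-20")
child = parse_ranges("6,8-10,12,17")
print(format_ranges(parent - child))
```

For the ranges in the example this prints 5,7,11,13-16,18-20, so a
single subtree already needs up to ten separate lookups; multiply that
by hundreds of subtrees and the performance concern is clear.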

I'm also a bit wary of subtrees with explicit mergeinfo which have
their own subtrees with explicit mergeinfo...I can't articulate
anything quite yet, but I have a bad feeling :-\

> Of course, subtree could be N levels deep (N >= 1), and could be a file or folder.
> Because of (1.) all merges done are recorded somewhere, even if they are no-operative merges.
> Because of (2.) we would not need to commit files which weren't changed in a merge, just because the file had explicit merge-info. This is the main point of this discussion.
> And finally because of (1.), (3.) is possible.
> This suggestion solves the problems described in http://subversion.tigris.org/servlets/ReadMsg?listName=dev&msgNo=136570.
> I hope I was clearer this time.
> Leonardo Fernandes

Thanks Leonardo, you have been very clear. I understand your problem
and am not without sympathy for it. However I am very reluctant to
pursue your proposed solution because:

1) It will decrease merge performance significantly in the very case
it is trying to address.

2) The amount of work to implement it is non-trivial.

3) Mergeinfo inheritance and elision are already difficult to explain
to the average user; this would make them even more difficult.

This is not to say that it can't or shouldn't be done; if the
performance problems could be addressed the rest is a SMOP. If you
want to try your hand at some Subversion development I'd be glad to
help with code reviews. But I don't have the time to do it myself.


