[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Proposal: don't modify any unrelated mergeinfos during 'svn merge'

From: Paul Burba <ptburba_at_gmail.com>
Date: Wed, 17 Sep 2008 10:29:11 -0400

On Tue, Sep 16, 2008 at 8:22 PM, Paul Burba <ptburba_at_gmail.com> wrote:
> On Mon, Sep 15, 2008 at 11:27 AM, Danil Shopyrin <danil_at_visualsvn.com> wrote:
>>> Before discussing the patch I want make two quick suggestions re patch
>>> submissions in general (I know this isn't technically a patch
>>> submission, more of an RFC, but you sent a patch):
>>>
>>> 1) New functions should have doc strings, even if they are private. I
>>> know there are examples in the code of functions with no or poor
>>> documentation, heck I'm probably responsible for some, but it makes
>>> reviewing a lot easier.
>>>
>>> 2) Include a log message in the recommended format, see
>>> http://subversion.tigris.org/hacking#log-messages. Again this makes
>>> reviewing easier and increases the chances that someone will review in
>>> the first place.
>>>
>>> Ok, sorry for that bit of nagging, now onto the patch!
>>
>> I didn't expect that this patch will be committed (because it's naive
>> and maybe incomplete). That's why there was no of log message. Looks
>> like omitting log message wasn't a good idea :)
>>
>>>> - significantly improve merge performance
>>> Could you explain how this significantly improves merge performance?
>>> All it does is not update some subtree mergeinfo at the end of a
>>> merge. Is there some follow-on effect with subsequent merges you are
>>> envisioning?
>>
>> Looks like it's my error too. After repeated measures I found that
>> there is no of *significant* performance improvements.
>>
>>> If we pursue this idea we need to do one of the following:
>>>
>>> Note: Most of this has already been discussed here
>>> http://svn.haxx.se/dev/archive-2008-03/0914.shtml, but I don't object
>>> to restating it.
>>
>> I can't find any references to 'no-op merges' in the specification at
>> http://subversion.tigris.org/merge-tracking/
>
> Hi Danil,
>
> No-op merge isn't in the spec, but is defined in the first paragraph
> of the thread I referenced above:
> http://svn.haxx.se/dev/archive-2008-03/0914.shtml:
>
> "'No-op' here is defined as a merge in which there are no skips and no
> changes are merged in. "No changes" might
> be because there are no changes in the merge source or it might be
> because the merge affects only a subtree of the target but that
> subtree already has it's own explicit mergeinfo covering the change
> (from a previous subtree merge)."
>
> Is that sufficiently clear?
>
>> It's hard to investigate and understand this sophisticated piece of
>> functionality without up-to-date specification. Moreover, some other
>> aspects such as "empty-mergeinfo-on-wc-to-wc-copy" aren't specified in
>> detail.
>
> Agreed, http://subversion.tigris.org/merge-tracking/func-spec.html#wc-wc-copy-move
> in particular is not accurate. The WC->WC copy problem has been
> discussed in various threads/issues, see the following for some
> historical context:
>
> http://svn.haxx.se/dev/archive-2007-09/0478.shtml
> http://svn.haxx.se/dev/archive-2006-10/0678.shtml
> http://subversion.tigris.org/issues/show_bug.cgi?id=2822
>
>>> 1) Make the elision code smarter, currently it is rather stupid (some
>>> might say cheap or efficient :-), simply comparing mergeinfo on path X
>>> and it's nearest parent P and seeing if the mergeinfo is identical
>>> when accounting for the relative path differences between X and P.
>>> This obviously won't work if we don't don't update all subtrees with
>>> mergeinfo.
>>>
>>> 2) Just get rid of elision. This would mean that once we have subtree
>>> mergeinfo, we have it forever. And we might need to special case
>>> reverse merges so we don't end up with empty mergeinfo when we should
>>> have none. But from what I've seen getting rid of elision might not
>>> be too big of a deal, it hasn't seemed that useful in practice (at
>>> least in our own repository).
>>
>> Am I understand properly, that you think that mergeinfo elision isn't
>> very useful (from your practice)?
>
> You understand me perfectly! Now let me add one important caveat: "My
> practice" is limited to what the Subversion community typically
> encounters while working on the Subversion repository. And from what
> I can see our typical use cases don't make much use of elision, e.g.
> we don't typcially do merges directly to subtrees of a branch, and
> then repeat those same merges at the root of the branch. If we did do
> those things then elision might kick in more.
>
>>> There are some other considerations with your patch in place:
>>>
>>> For the next two items let's say we have a working copy merge target
>>> named BRANCH_TARGET with no explicit mergeinfo and with a single
>>> subtree named BRANCH_TARGET/SUBTREE that has some of its own explicit
>>> mergeinfo:
>>>
>>> Properties on 'BRANCH_TARGET\SUBTREE':
>>> svn:mergeinfo
>>> /TRUNK/SUBTREE:4
>>>
>>> A) Individual merges equivalent to a single merge can produce
>>> different mergeinfo results.
>>>
>>> Say we merge ROOT_URL/TRUNK BRANCH_TARGET -r10:20. Let's assume that
>>> the SUBTREE is affected by r15 and r18. If we did a merge like this:
>>>
>>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r10:20 (SUBTREE unaffected
>>> so no mergeinfo change)
>>>
>>> We'd expect mergeinfo from the source for r11-20 to be added to both
>>> BRANCH_TARGET and BRANCH_TARGET/SUBTREE with or without your patch
>>> yes? Like this:
>>>
>>> Properties on 'BRANCH_TARGET':
>>> svn:mergeinfo
>>> /TRUNK:11-20
>>> Properties on 'BRANCH_TARGET\SUBTREE':
>>> svn:mergeinfo
>>> /TRUNK/SUBTREE:4,11-20
>>>
>>> But what if we did several individual merges equivalent to the single merge:
>>>
>>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r10:14 (SUBTREE unaffected
>>> so no mergeinfo change)
>>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -c15 (SUBTREE affected
>>> mergeinfo updated)
>>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r15:18 (SUBTREE affected
>>> mergeinfo updated)
>>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r18:20 (SUBTREE unaffected
>>> so no mergeinfo change)
>>>
>>> The tree changes wrought by these individual merges will be the same,
>>> but the resulting merginfo would be:
>>>
>>> Properties on 'BRANCH_TARGET':
>>> svn:mergeinfo
>>> /TRUNK:11-20
>>> Properties on 'BRANCH_TARGET\SUBTREE':
>>> svn:mergeinfo
>>> /TRUNK/SUBTREE:4,15-18
>>>
>>
>> Is it anywhere mentioned in the specification, that merges that
>> produce equal results should produce equal mergeinfo?
>
> Not that I can see, but as I mentioned below "I'm not arguing that A
> and B are big problems..." I was only trying to get you up to speed on
> why things are the way they are today.
>
>> As far as I can understand, specification states the opposite:
>> [[
>> Q: Are there many different "spellings" of the same merge info?
>>
>> A: Yes. Depending on the URLs and target you specify for merges, I
>> believe it is possible to end up with merge info in different places,
>> or with slightly different revision lines that have the same semantic
>> effect (e.g. info like /trunk:1-9 vs /trunk:1-8\n/trunk/bar:9 when
>> revision 9 on /trunk only affected /trunk/bar), so you can end up with
>> merge info in different places, even though the semantic result will
>> be the same in all cases.
>
> Unfortunately http://svn.collab.net/repos/svn/trunk/www/merge-tracking/design.html
> hasn't been updated regularly in the past 12 months. A sorry
> situation to be sure, but be careful how much you rely on those
> documents. The dicussions on dev are much more relevant.
>
>> ]]
>>
>> Moreover, there are some signs that original specification doesn't
>> allow no-op merges to modify mergeinfo:
>> [[
>> Each will store the full, complete list of current merged-in changes,
>> as far as it knows...
>>
>> Directly merged into the item means changes from any merge that have
>> affected this path, which includes merges into parents, grandparents,
>> etc., that had some affect on the path.
>> ]]
>>
>> At http://svn.haxx.se/dev/archive-2007-03/0000.shtml you mention that
>> 'the mergeinfo property store(s) the *full, complete* list of
>> revisions that are directly merged into the item."
>
> Could you explain exactly how that is relevant to the current
> conversation, not trying to be difficult, I just don't follow.
>
>> Please excuse me for referencing the specification (usually I'm not so
>> pedantic). But I'm trying to understand what is conceptual and what is
>> just a band-aid.
>
> Again I'm afraid I don't follow what you mean here...what "what" do you mean?
>
>>> B) "Subtree" merges and "Root" merges can produce inconsistent mergeinfo.
>>>
>>> Hopefully it will be obvious from context what I mean by "subtree" and
>>> "root" merges...
>>>
>>> With or without your patch, no-op merges still record mergeinfo on the
>>> merge target. Did you intend to stop this as well?
>>
>> I don't know for sure. From the paranoid point of view, there are
>> should be no mergeinfo changes if nothing is affected. But I
>> understand that there can be some problems with other features like
>> --reintegrate.
>
> I'm not sure that is the case, at least as far as --reintegrate.
>
> Let's say we copy 'trunk'@100 to 'feature_branch' and then
> periodically merge all available revisions from the former to the
> latter. If one of these periodic "synch-up merges" is a no-op and we
> no longer record mergeinfo for no-ops, then we won't commit the merge,
> how can we? There is nothing to commit. So we do some more work on
> 'feature_branch' and then later do another synch up merge. This time
> the merge is operative and updates the mergeinfo on 'feature_branch',
> but there will be no gap in the mergeinfo since the default starting
> revision of the merge will dictated by the youngest revision in the
> mergeinfo on 'feature_branch'! Really the only way we'll end up with
> a "gap" is if we finish our work on 'feature_branch' and do a final
> synch-up merge which is a no-op. Even if we did end up with other
> gaps in the mergeinfo on 'feature_branch', say we specified some
> particular revisions rather than letting merge pick the default range,
> this isn't a problem as long as those gaps represent no-ops on
> 'trunk'. merge.c:ensure_all_missing_ranges_are_phantoms() in
> svn_client_merge_reintegrate() takes care of this and allows the
> reintegrate merge to proceed.
>
> Were you thinking of a different problem? Were there any other
> specific problems with "other features" you had in mind?
>
> My opinion is that *if* we stop recording mergeinfo on subtrees that
> are unaffected by a merge then we shouldn't record it on the merge
> target proper if the whole merge is a no-op. This seems consistent
> and I don't recall there being any dire repercussions with doing so.
>
>>> Because if we
>>> record mergeinfo for no-op merges:
>>>
>>> When we do a "root" merge...
>>>
>>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r10:14
>>>
>>> ...SUBTREE is unaffected so no mergeinfo change there, we get:
>>>
>>> Properties on 'BRANCH_TARGET':
>>> svn:mergeinfo
>>> /TRUNK:11-14
>>> Properties on 'BRANCH_TARGET\SUBTREE':
>>> svn:mergeinfo
>>> /TRUNK/SUBTREE:4
>>>
>>> But is we do a "subtree" merge...
>>>
>>> svn merge ROOT_URL/TRUNK/SUBTREE BRANCH_TARGET/SUBTREE -r10:14
>>>
>>> ...SUBTREE is unaffected, but SUBTREE is the target so it gets updated
>>> mergeinfo:
>>>
>>> Properties on 'BRANCH_TARGET\SUBTREE':
>>> svn:mergeinfo
>>> /TRUNK/SUBTREE:4,11-14
>>>
>>> I'm not arguing that A and B are big problems, but this very
>>> inconsistency is one of the reasons we decided to record subtree
>>> mergeinfo all the time. That, and it also to facilitate "cheap"
>>> elision.
>>
>> I don't think that this is an optimal way. I prefer the standard and
>> well know pattern when user "doesn't pay for something he doesn't
>> use". Actually, our team doesn't use cherry picking, local branches,
>> partially switched working copies etc. All we have is a trunk plus
>> several plain stabilization/feature branches.
>
> And you only merge to the root of those branches? And you don't
> cherry pick, so you just do merges where you don't specify any
> revision range?
>
>> But after 1.5 was released our life is turned to hell. Now we're
>> requested to worry about no-op merges regardless of the fact that we
>> don't need mergeinfo elision at all!
>
> I don't doubt your hell, but if you answered yes to the two questions
> above, how exactly is this hellish? Because some merges change
> mergeinfo and nothing else? Is it WC to WC copies that have generated
> a lot of subtree mergeinfo and this is the problem?
>
>>> So the real question is not so much should we stop recording mergeinfo
>>> on subtrees unaffected by a merge, but rather do we care about elision
>>> and can we live with the inconsistencies in mergeinfo produced in
>>> cases like A and B.
>>
>> I think that mergeinfo elision should be explicit process executed by
>> something like svnadmin mergeinfo-optimize. Just as proposed in the
>> specification.
>
> +1 on some form of "on-demand" elision if we disable elision by default.
>
> ~~~~~
>
> Ok, let's not get too caught up in the above. Returning to what Mark
> already said the question is can we get consensus on what you propose:
>
> 1) Don't change subtree mergeinfo when a merge doesn't affect that subtree.
>
> As I've explained above I think that if we do this, we should also do
> the following:
>
> 2) Disable default elision (since it's effectively useless as-is if #1
> is implemented).
>
> 3) Stop recording mergeinfo on the merge target proper if the merge is a no-op.
>
> 4) Implement on-demand elision (maybe a subcommand of svnadmin, maybe
> a new switch to merge, whatever that can be decided later).
>
> For the record I am +1 on all this and would be happy to get a branch
> started to do it.

In a discussion with my CollabNet cronies this morning Mark Phippard
raised the question:

What impact on merge performance should we expect if these changes are made?

It's hard to quantify this, but here is what I do know:

A) The more editor drives* we make, the more round trip communications
we make to the repository, the slower merge performs.

B) The more gaps in a merge target's mergeinfo for the merge source,
the more editor drives we must make to complete the merge.

C) The more subtrees with explicit mergeinfo the merge target has,
where those subtrees have differing mergeinfo for the merge source
than the target, the more gaps we have.

* i.e. the number of calls merge.c:do_directory_merge() makes to
drive_merge_report_editor().

The question then, is how much more of B and C will these changes introduce?

I don't see 'B' being a problem since as I explained above in "but
there will be no gap in the mergeinfo" there shouldn't be any more or
less gaps produced by these changes than occur now.

'C' is a different story though. If we don't record mergeinfo on
subtrees unaffected by an otherwise operative merge then we are going
to produce subtrees with differing mergeinfo for the merge source.
Worse, these differences might never go away**, causing a needless
round-trip penalty that never goes away. And even worse, these gaps
could build up over time, making merge slower and slower.

I need to think about this more and probably make a branch to test
this out, but anyone's thoughts on the matter are appreciated.

** How common this is depends on how "smart" we can make optional
elision, but it seems likely that if subtree merges are done (as is
common in the Subversion repository) then the subtree mergeinfo
created is there for the long haul.

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-09-17 16:29:27 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.