[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RE: Proposal: don't modify any unrelated mergeinfos during 'svn merge'

From: Bert Huijben <bert_at_vmoo.com>
Date: Wed, 17 Sep 2008 17:49:02 +0200

> -----Original Message-----
> From: Paul Burba [mailto:ptburba_at_gmail.com]
> Sent: woensdag 17 september 2008 16:29
> To: Danil Shopyrin
> Cc: dev_at_subversion.tigris.org
> Subject: Re: Proposal: don't modify any unrelated mergeinfos during
> 'svn merge'
>
> On Tue, Sep 16, 2008 at 8:22 PM, Paul Burba <ptburba_at_gmail.com> wrote:
> > On Mon, Sep 15, 2008 at 11:27 AM, Danil Shopyrin
> <danil_at_visualsvn.com> wrote:
> >>> Before discussing the patch I want make two quick suggestions re
> patch
> >>> submissions in general (I know this isn't technically a patch
> >>> submission, more of an RFC, but you sent a patch):
> >>>
> >>> 1) New functions should have doc strings, even if they are private.
> I
> >>> know there are examples in the code of functions with no or poor
> >>> documentation, heck I'm probably responsible for some, but it makes
> >>> reviewing a lot easier.
> >>>
> >>> 2) Include a log message in the recommended format, see
> >>> http://subversion.tigris.org/hacking#log-messages. Again this
> makes
> >>> reviewing easier and increases the chances that someone will review
> in
> >>> the first place.
> >>>
> >>> Ok, sorry for that bit of nagging, now onto the patch!
> >>
> >> I didn't expect that this patch will be committed (because it's
> naive
> >> and maybe incomplete). That's why there was no of log message. Looks
> >> like omitting log message wasn't a good idea :)
> >>
> >>>> - significantly improve merge performance
> >>> Could you explain how this significantly improves merge
> performance?
> >>> All it does is not update some subtree mergeinfo at the end of a
> >>> merge. Is there some follow-on effect with subsequent merges you
> are
> >>> envisioning?
> >>
> >> Looks like it's my error too. After repeated measures I found that
> >> there is no of *significant* performance improvements.
> >>
> >>> If we pursue this idea we need to do one of the following:
> >>>
> >>> Note: Most of this has already been discussed here
> >>> http://svn.haxx.se/dev/archive-2008-03/0914.shtml, but I don't
> object
> >>> to restating it.
> >>
> >> I can't find any references to 'no-op merges' in the specification
> at
> >> http://subversion.tigris.org/merge-tracking/
> >
> > Hi Danil,
> >
> > No-op merge isn't in the spec, but is defined in the first paragraph
> > of the thread I referenced above:
> > http://svn.haxx.se/dev/archive-2008-03/0914.shtml:
> >
> > "'No-op' here is defined as a merge in which there are no skips and
> no
> > changes are merged in. "No changes" might
> > be because there are no changes in the merge source or it might be
> > because the merge affects only a subtree of the target but that
> > subtree already has it's own explicit mergeinfo covering the change
> > (from a previous subtree merge)."
> >
> > Is that sufficiently clear?
> >
> >> It's hard to investigate and understand this sophisticated piece of
> >> functionality without up-to-date specification. Moreover, some other
> >> aspects such as "empty-mergeinfo-on-wc-to-wc-copy" aren't specified
> in
> >> detail.
> >
> > Agreed, http://subversion.tigris.org/merge-tracking/func-
> spec.html#wc-wc-copy-move
> > in particular is not accurate. The WC->WC copy problem has been
> > discussed in various threads/issues, see the following for some
> > historical context:
> >
> > http://svn.haxx.se/dev/archive-2007-09/0478.shtml
> > http://svn.haxx.se/dev/archive-2006-10/0678.shtml
> > http://subversion.tigris.org/issues/show_bug.cgi?id=2822
> >
> >>> 1) Make the elision code smarter, currently it is rather stupid
> (some
> >>> might say cheap or efficient :-), simply comparing mergeinfo on
> path X
> >>> and it's nearest parent P and seeing if the mergeinfo is identical
> >>> when accounting for the relative path differences between X and P.
> >>> This obviously won't work if we don't don't update all subtrees
> with
> >>> mergeinfo.
> >>>
> >>> 2) Just get rid of elision. This would mean that once we have
> subtree
> >>> mergeinfo, we have it forever. And we might need to special case
> >>> reverse merges so we don't end up with empty mergeinfo when we
> should
> >>> have none. But from what I've seen getting rid of elision might
> not
> >>> be too big of a deal, it hasn't seemed that useful in practice (at
> >>> least in our own repository).
> >>
> >> Am I understand properly, that you think that mergeinfo elision
> isn't
> >> very useful (from your practice)?
> >
> > You understand me perfectly! Now let me add one important caveat:
> "My
> > practice" is limited to what the Subversion community typically
> > encounters while working on the Subversion repository. And from what
> > I can see our typical use cases don't make much use of elision, e.g.
> > we don't typcially do merges directly to subtrees of a branch, and
> > then repeat those same merges at the root of the branch. If we did
> do
> > those things then elision might kick in more.
> >
> >>> There are some other considerations with your patch in place:
> >>>
> >>> For the next two items let's say we have a working copy merge
> target
> >>> named BRANCH_TARGET with no explicit mergeinfo and with a single
> >>> subtree named BRANCH_TARGET/SUBTREE that has some of its own
> explicit
> >>> mergeinfo:
> >>>
> >>> Properties on 'BRANCH_TARGET\SUBTREE':
> >>> svn:mergeinfo
> >>> /TRUNK/SUBTREE:4
> >>>
> >>> A) Individual merges equivalent to a single merge can produce
> >>> different mergeinfo results.
> >>>
> >>> Say we merge ROOT_URL/TRUNK BRANCH_TARGET -r10:20. Let's assume
> that
> >>> the SUBTREE is affected by r15 and r18. If we did a merge like
> this:
> >>>
> >>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r10:20 (SUBTREE unaffected
> >>> so no mergeinfo change)
> >>>
> >>> We'd expect mergeinfo from the source for r11-20 to be added to
> both
> >>> BRANCH_TARGET and BRANCH_TARGET/SUBTREE with or without your patch
> >>> yes? Like this:
> >>>
> >>> Properties on 'BRANCH_TARGET':
> >>> svn:mergeinfo
> >>> /TRUNK:11-20
> >>> Properties on 'BRANCH_TARGET\SUBTREE':
> >>> svn:mergeinfo
> >>> /TRUNK/SUBTREE:4,11-20
> >>>
> >>> But what if we did several individual merges equivalent to the
> single merge:
> >>>
> >>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r10:14 (SUBTREE unaffected
> >>> so no mergeinfo change)
> >>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -c15 (SUBTREE affected
> >>> mergeinfo updated)
> >>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r15:18 (SUBTREE affected
> >>> mergeinfo updated)
> >>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r18:20 (SUBTREE unaffected
> >>> so no mergeinfo change)
> >>>
> >>> The tree changes wrought by these individual merges will be the
> same,
> >>> but the resulting merginfo would be:
> >>>
> >>> Properties on 'BRANCH_TARGET':
> >>> svn:mergeinfo
> >>> /TRUNK:11-20
> >>> Properties on 'BRANCH_TARGET\SUBTREE':
> >>> svn:mergeinfo
> >>> /TRUNK/SUBTREE:4,15-18
> >>>
> >>
> >> Is it anywhere mentioned in the specification, that merges that
> >> produce equal results should produce equal mergeinfo?
> >
> > Not that I can see, but as I mentioned below "I'm not arguing that A
> > and B are big problems..." I was only trying to get you up to speed
> on
> > why things are the way they are today.
> >
> >> As far as I can understand, specification states the opposite:
> >> [[
> >> Q: Are there many different "spellings" of the same merge info?
> >>
> >> A: Yes. Depending on the URLs and target you specify for merges, I
> >> believe it is possible to end up with merge info in different
> places,
> >> or with slightly different revision lines that have the same
> semantic
> >> effect (e.g. info like /trunk:1-9 vs /trunk:1-8\n/trunk/bar:9 when
> >> revision 9 on /trunk only affected /trunk/bar), so you can end up
> with
> >> merge info in different places, even though the semantic result will
> >> be the same in all cases.
> >
> > Unfortunately http://svn.collab.net/repos/svn/trunk/www/merge-
> tracking/design.html
> > hasn't been updated regularly in the past 12 months. A sorry
> > situation to be sure, but be careful how much you rely on those
> > documents. The dicussions on dev are much more relevant.
> >
> >> ]]
> >>
> >> Moreover, there are some signs that original specification doesn't
> >> allow no-op merges to modify mergeinfo:
> >> [[
> >> Each will store the full, complete list of current merged-in
> changes,
> >> as far as it knows...
> >>
> >> Directly merged into the item means changes from any merge that have
> >> affected this path, which includes merges into parents,
> grandparents,
> >> etc., that had some affect on the path.
> >> ]]
> >>
> >> At http://svn.haxx.se/dev/archive-2007-03/0000.shtml you mention
> that
> >> 'the mergeinfo property store(s) the *full, complete* list of
> >> revisions that are directly merged into the item."
> >
> > Could you explain exactly how that is relevant to the current
> > conversation, not trying to be difficult, I just don't follow.
> >
> >> Please excuse me for referencing the specification (usually I'm not
> so
> >> pedantic). But I'm trying to understand what is conceptual and what
> is
> >> just a band-aid.
> >
> > Again I'm afraid I don't follow what you mean here...what "what" do
> you mean?
> >
> >>> B) "Subtree" merges and "Root" merges can produce inconsistent
> mergeinfo.
> >>>
> >>> Hopefully it will be obvious from context what I mean by "subtree"
> and
> >>> "root" merges...
> >>>
> >>> With or without your patch, no-op merges still record mergeinfo on
> the
> >>> merge target. Did you intend to stop this as well?
> >>
> >> I don't know for sure. From the paranoid point of view, there are
> >> should be no mergeinfo changes if nothing is affected. But I
> >> understand that there can be some problems with other features like
> >> --reintegrate.
> >
> > I'm not sure that is the case, at least as far as --reintegrate.
> >
> > Let's say we copy 'trunk'@100 to 'feature_branch' and then
> > periodically merge all available revisions from the former to the
> > latter. If one of these periodic "synch-up merges" is a no-op and we
> > no longer record mergeinfo for no-ops, then we won't commit the
> merge,
> > how can we? There is nothing to commit. So we do some more work on
> > 'feature_branch' and then later do another synch up merge. This time
> > the merge is operative and updates the mergeinfo on 'feature_branch',
> > but there will be no gap in the mergeinfo since the default starting
> > revision of the merge will dictated by the youngest revision in the
> > mergeinfo on 'feature_branch'! Really the only way we'll end up with
> > a "gap" is if we finish our work on 'feature_branch' and do a final
> > synch-up merge which is a no-op. Even if we did end up with other
> > gaps in the mergeinfo on 'feature_branch', say we specified some
> > particular revisions rather than letting merge pick the default
> range,
> > this isn't a problem as long as those gaps represent no-ops on
> > 'trunk'. merge.c:ensure_all_missing_ranges_are_phantoms() in
> > svn_client_merge_reintegrate() takes care of this and allows the
> > reintegrate merge to proceed.
> >
> > Were you thinking of a different problem? Were there any other
> > specific problems with "other features" you had in mind?
> >
> > My opinion is that *if* we stop recording mergeinfo on subtrees that
> > are unaffected by a merge then we shouldn't record it on the merge
> > target proper if the whole merge is a no-op. This seems consistent
> > and I don't recall there being any dire repercussions with doing so.
> >
> >>> Because if we
> >>> record mergeinfo for no-op merges:
> >>>
> >>> When we do a "root" merge...
> >>>
> >>> svn merge ROOT_URL/TRUNK BRANCH_TARGET -r10:14
> >>>
> >>> ...SUBTREE is unaffected so no mergeinfo change there, we get:
> >>>
> >>> Properties on 'BRANCH_TARGET':
> >>> svn:mergeinfo
> >>> /TRUNK:11-14
> >>> Properties on 'BRANCH_TARGET\SUBTREE':
> >>> svn:mergeinfo
> >>> /TRUNK/SUBTREE:4
> >>>
> >>> But is we do a "subtree" merge...
> >>>
> >>> svn merge ROOT_URL/TRUNK/SUBTREE BRANCH_TARGET/SUBTREE -r10:14
> >>>
> >>> ...SUBTREE is unaffected, but SUBTREE is the target so it gets
> updated
> >>> mergeinfo:
> >>>
> >>> Properties on 'BRANCH_TARGET\SUBTREE':
> >>> svn:mergeinfo
> >>> /TRUNK/SUBTREE:4,11-14
> >>>
> >>> I'm not arguing that A and B are big problems, but this very
> >>> inconsistency is one of the reasons we decided to record subtree
> >>> mergeinfo all the time. That, and it also to facilitate "cheap"
> >>> elision.
> >>
> >> I don't think that this is an optimal way. I prefer the standard and
> >> well know pattern when user "doesn't pay for something he doesn't
> >> use". Actually, our team doesn't use cherry picking, local branches,
> >> partially switched working copies etc. All we have is a trunk plus
> >> several plain stabilization/feature branches.
> >
> > And you only merge to the root of those branches? And you don't
> > cherry pick, so you just do merges where you don't specify any
> > revision range?
> >
> >> But after 1.5 was released our life is turned to hell. Now we're
> >> requested to worry about no-op merges regardless of the fact that we
> >> don't need mergeinfo elision at all!
> >
> > I don't doubt your hell, but if you answered yes to the two questions
> > above, how exactly is this hellish? Because some merges change
> > mergeinfo and nothing else? Is it WC to WC copies that have
> generated
> > a lot of subtree mergeinfo and this is the problem?
> >
> >>> So the real question is not so much should we stop recording
> mergeinfo
> >>> on subtrees unaffected by a merge, but rather do we care about
> elision
> >>> and can we live with the inconsistencies in mergeinfo produced in
> >>> cases like A and B.
> >>
> >> I think that mergeinfo elision should be explicit process executed
> by
> >> something like svnadmin mergeinfo-optimize. Just as proposed in the
> >> specification.
> >
> > +1 on some form of "on-demand" elision if we disable elision by
> default.
> >
> > ~~~~~
> >
> > Ok, let's not get too caught up in the above. Returning to what Mark
> > already said the question is can we get consensus on what you
> propose:
> >
> > 1) Don't change subtree mergeinfo when a merge doesn't affect that
> subtree.
> >
> > As I've explained above I think that if we do this, we should also do
> > the following:
> >
> > 2) Disable default elision (since it's effectively useless as-is if
> #1
> > is implemented).
> >
> > 3) Stop recording mergeinfo on the merge target proper if the merge
> is a no-op.
> >
> > 4) Implement on-demand elision (maybe a subcommand of svnadmin, maybe
> > a new switch to merge, whatever that can be decided later).
> >
> > For the record I am +1 on all this and would be happy to get a branch
> > started to do it.
>
> In a discussion with my CollabNet cronies this morning Mark Phippard
> raised the question:
>
> What impact on merge performance should we expect if these changes are
> made?
>
> It's hard to quantify this, but here is what I do know:
>
> A) The more editor drives* we make, the more round trip communications
> we make to the repository, the slower merge performs.
>
> B) The more gaps in a merge target's mergeinfo for the merge source,
> the more editor drives we must make to complete the merge.
>
> C) The more subtrees with explicit mergeinfo the merge target has,
> where those subtrees have differing mergeinfo for the merge source
> than the target, the more gaps we have.
>
> * i.e. the number of calls merge.c:do_directory_merge() makes to
> drive_merge_report_editor().
>
> The question then, is how much more of B and C will these changes
> introduce?
>
> I don't see 'B' being a problem since as I explained above in "but
> there will be no gap in the mergeinfo" there shouldn't be any more or
> less gaps produced by these changes than occur now.
>
> 'C' is a different story though. If we don't record mergeinfo on
> subtrees unaffected by an otherwise operative merge then we are going
> to produce subtrees with differing mergeinfo for the merge source.
> Worse, these differences might never go away**, causing a needless
> round-trip penalty that never goes away. And even worse, these gaps
> could build up over time, making merge slower and slower.
>
> I need to think about this more and probably make a branch to test
> this out, but anyone's thoughts on the matter are appreciated.
>
> ** How common this is depends on how "smart" we can make optional
> elision, but it seems likely that if subtree merges are done (as is
> common in the Subversion repository) then the subtree mergeinfo
> created is there for the long haul.

Is it possible to easily detect if we could apply the same merge higher up in the working copy (e.g. on trunk/ instead of trunk/src/libraries/mylib) with (except for the merge-info) the same result?

In that case it might be possible to just apply the merge higher up the tree, providing better merge info.

I will probably try this method in AnkhSVN to provide a more logical UI, but from that GUI I can access the previously loaded 'svn log' results and I'm not sure if we can do this more generic.

        Bert

>
> Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
> For additional commands, e-mail: dev-help_at_subversion.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-09-17 17:49:19 CEST

This is an archived mail posted to the Subversion Dev mailing list.