Doc string of prepare_subtree_ranges() [was: Merge deleting a file - compare its content]

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Fri, 08 Aug 2008 12:05:26 +0100

On Thu, 2008-08-07 at 19:47 -0400, Paul Burba wrote:
> On Thu, Aug 7, 2008 at 7:34 AM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
[...]
> > We are reverse-merging changes r9:2 from a source branch "A" to a
> > destination branch "A_COPY". The only revisions we could possibly
> > consider merging are those within 9:2 that were operative in the source
> > branch's natural history (the revisions in which any kind of change was
> > committed to this tree). In this test, "svn log" shows that those are
> > A/: r3,4,5,6,7,8; A/D/H: r3,6,7,8; A/D/H/nu: r7,8.
> >
> > I think are saying that our merge-tracking logic deems that, of those
> > possible source revisions, the only ones that it makes sense to
> > reverse-merge are those that the mergeinfo says are present in the
> > target. (For A_COPY/, those are r5,6. For A_COPY/D/H/nu, the
> > intersection is r7,8.)
>
> Exactly right for this example, but it's important to note that things
> aren't always so simple. With the current implementation of merge
> tracking, particularly the nature of svn:mergeinfo as an inheritable
> property, a path can end up with inherited or explicit mergeinfo that
> is *not* part of its natural history - see

I don't understand. I acknowledge that a path can have (inherited or
explicit) mergeinfo that is not part of its natural history; isn't that
in fact how mergeinfo is always supposed to be? I thought #3157#desc8
says it is an aberration that mergeinfo sometimes DOES include bits of
the path's natural history.

So I'm not sure what less-simple case you're thinking of.

> http://subversion.tigris.org/issues/show_bug.cgi?id=3157#desc8,
> particularly the section "Several potential problems remain though".

[...]
> >> First 'A_COPY', this is simple, it needs r5 and r6 reverse merged.
> >>
> >> Next 'A_COPY\D\H', again this is straightforward, needing r5, r6, and
> >> r7 reverse merged.
> >>
> >> Lastly 'A_COPY\D\H\nu', this is where things get slightly more
> >> complicated. On the face of it (going simply by the mergeinfo) this
> >> path needs r5-r8 reverse merged. But 'nu' doesn't exist in the merge
> >> source until r7 so we don't want to reverse merge r5 or r6 into 'nu'
> >> directly as this would break the editor since 'nu' doesn't exist in
> >> the source at those revisions.
> >
> > Sure, that's clear.
> >
> >> I don't want to bury you in #3067 minutiae but now might be a good
> >> time to read the doc string for merge.c:prepare_subtree_ranges().
> >
> > Urg. :-) Comments/review, just of the doc string:
>
> Probably just stating the obvious, but keep in mind that
> prepare_subtree_ranges() is a dedicated helper of
> filter_merged_revisions() and almost all the arguments to the former
> are straight from the latter. So understanding this function in a
> vacuum is quite difficult.

Yes, ack. Didn't mean to be too harsh. I did of course (briefly) read up
on the remaining arguments where they're defined in the parent or
grandparent function. As you are only too well aware, there is a huge
mass of knowledge and understanding of how it all works required to
understand how any part of it works. Where I'm trying to help is by
identifying certain bits of this knowledge that need not be deeply
entangled and should be readily understandable by an outsider in a
"black box description" way.

> > Can you add a sentence that says what the REQUESTED_RANGELIST parameter
> > is for and what its element type is? There are partial descriptions of
> > it in Note1, Note2 and some of the 8 cases, but an overall statement
> > about it would be helpful.
>
> merge.c's private functions use the term "rangelist" quite frequently,
> the assumption being the reader of this code will have looked at
> svn_mergeinfo.h and seen its "Terminology for data structures that
> contain mergeinfo" comment. I'm torn between being concise and being
> pedantic!

Yes. Good to be concise. I understand the notion of a "rangelist" and
it's great that it's documented elsewhere. What I want to know here is
(1) the type of parameter, and especially (2) its purpose (meaning) with
respect to this function's job.

(1) Thanks for your update in r32401. That helps a lot. It makes clear
that it is an output not in-out, allocated by this function not its
caller, of element type (svn_merge_range_t *).

(BTW, how's the attached patch to the descriptions in
"svn_mergeinfo.h"?)

(2) As a caller, I ask myself "What do I get from this function?" As
well as the case-by-case description which provides understanding at a
deeper level, I need to know, "These are the ranges in which a node
exists at PRIMARY_URL" (no) or "These are the ranges in which the object
PRIMARY_URL_at_REVISION1 exists" (no) or "These are the ranges which ..."
WHAT?!!! What's the common feature of this output, that I can make use
of without knowing which case (A, B, ..., G, H) it came from?

The key description is this:

[...] The purpose of this helper is to identify these cases of broken
history and where possible to adjust the requested range
REVISION1:REVISION2 being merged to the subtree so that we don't try to
describe invalid path/revisions to the merge report editor [...]

OK, so maybe, just maybe ...

"Set *REQUESTED_RANGELIST to the list of unbroken history segments of
PARENT_PATH in which changes occurred to this sub-tree."

(Am I getting warmer with each guess?)

> > Maybe something like:
> >
> > Set *REQUESTED_RANGELIST to a newly allocated array of
> > unordered
>
> Did you see a case where they would be unordered? This shouldn't be the case.

The sentence "Then take the intersection of REVISION1:N [...] and add it
to *REQUESTED_RANGELIST" gave me a feeling of uncertainty about it, so I
wrote that so you'd check it. I'm glad it's an ordered list.

> > (svn_merge_range_t) elements representing the
> > revision ranges in which the node PRIMARY_URL_at_REVISION1
> > existed, plus any range in which its parent existed before
> > it did, between REVISION1 and REVISION2. ("Before" being
> > relative to the direction REVISION1 >> REVISION2.) (???)
>
> I'd prefer not to do this at it would be redundant with the case A-H
> listing that is already there no? I did rework the comment in r32401
> in an effort to make it clearer...let me know if it helps.

OK to an extent. But again, the reader does also need to know the common
single definition of the output, otherwise he can't make use of the
output without knowing which case it comes from.

[...]
> > Intro: "Note in *CHILD_DELETED_OR_NONEXISTANT if the subtree doesn't
> > exist, is deleted, or is renamed in the requested range." Say, "or is
> > renamed or replaced". Should this say, "doesn't exist at all in the
> > requested range, or is deleted, renamed or replaced before the end of
> > the range"?
>
> r32401 removed that comment, referring to the 8 cases, rather than
> trying to sum up the behavior.

Again, that's now a more precise specification of what the function
does, but the reader also needs to know what the common meaning is, in
order to be able to make use of that flag.

> > What I'm trying to get at is it's not clear about how the
> > object is allowed to come into existence after the start of the range.
>
> Not entirely sure I follow you here...do you mean how can we be asking
> about PRIMARY_URL's history between REVISION1:REVISION2 when that
> history might not be complete? If so see my next comment. If that's
> not what you meant, can you say it again, but with different words?
> :-P

Well, I'm not too clear either. Something like: you're saying that this
function returns the history segments in which "the object" existed...
but if "the object" didn't exist at REVISION1 (its peg rev), then how
can we claim to return any information about "the object"? Or, if you're
saying that this function returns the history segments in which "this
PATH" existed, or something else, then, well, maybe that story would
hang together better...

Very sorry but though you've written a lot more below I can't take any
more today and am skipping to the end (in a separate reply) :-)

> > Case B: I'm afraid I can't follow the description. "Ranges ... at which
> > PRIMARY_URL exists": meaning all ranges in which any node exists at
> > PRIMARY_URL? If there are two or more, does it matter that these might
> > be history segments of different objects?
>
> Not really, because we are dealing with merge target subtrees here,
> recall the comment (emphasis on subtrees added):
>
> "Since this function is only invoked for *SUBTREES* of the merge
> target, the guarantees afforded by normalize_merge_sources() don't
> apply. Therefore it is possible that PRIMARY_URL_at_REVISION1 and
> PRIMARY_URL_at_REVISION2 don't describe an unbroken line of history."
>
> This might be a good time to look at normalize_merge_sources() and the
> 'MERGEINFO MERGE SOURCE NORMALIZATION' comment at the top of merge.c
> if you haven't already.
>
> Anyway, the fact that subtrees don't adhere the the merge source
> normalization rules is the crux of issue #3067. It might help to
> think of it as being a side effect of how merging -rX:Y to PATH can
> have a different effect on PATH/SUBTREE than can merging -rX:Y
> *directly* to PATH/SUBTREE.
>
> For example, say we have this repos structure:
>
> /
> trunk/
> trunk/file
> branch/file
>
> Where trunk and trunk/file were created in r1
>
> branch was copied from trunk in r2
>
> trunk/file was modified in r3
>
> trunk/file was deleted in r4
>
> trunk/file was added (with identical content to trunk/file_at_2) without
> history in r5
>
> Now we could merge r3 from trunk_URL to branch and branch/file will be
> updated. This is because the merge only cares that trunk_at_2:3 and
> trunk_at_HEAD are ancestrally related so it does the merge. But if we
> merge r3 from trunk_URL/file to branch/file directly, then nothing
> happens, trunk/file_at_2:3 is not ancestrally related to trunk/file_at_HEAD
> so the merge is a no-op.
>
> Now imagine if we made many changes to trunk/file before it was
> replaced and then made many more after. Imagine that subset of the
> changes from before and after had been merged in such a way that
> branch/file had it's own explicit mergeinfo, which might look like:
>
> svn:mergeinfo : /trunk/file:3-4,7,23,25-40,42
>
> This is valid, but realize that /trunk/file:3-4,7 and say
> /trunk/file:23,25-40,42 could be two completely unrelated lines of
> history (let's say trunk/file was replaced without history in r12).
>
> Let's further assume that r5 and r45 are mods to /trunk/file on those
> two different lines of history. What happens if we merge -r3:45
> trunk_URL to branch? Since trunk/file is a subtree of the merge
> target it should attempt to get both r5 and r45 merged into it.
>
> Sorry, that is the long answer as to why it doesn't "matter that these might
> > be history segments of different objects". I swear to you I tried to formulate a more concise answer...but it made no sense :-(
>
> > Then, adding a segment in
> > which it does not exist: is that behaviour specific to a particular
> > requirement that the current caller has in this case, or something that
> > can be explained more generally?
>
> It's to do with how merge.c:drive_merge_report_editor() works. The
> connection is probably not obvious, so here it is:
>
> do_directory_merge() uses this group of helper functions to determine
> what ranges to merge:
>
> populate_remaining_ranges
> calculate_remaining_ranges
> filter_merged_revisions
> prepare_subtree_ranges
>
> Then do_directory_merge() calls drive_merge_report_editor() to
> actually merge those revision.
>
> Sidebar: It's almost certainly possible to streamline that first group
> of helpers, but it won't be easy.
>
> Anyway, look in drive_merge_report_editor() where we loop over the
> CHILDREN_WITH_MERGEINFO, right after the comment "Describe children
> with mergeinfo overlapping this merge operation such that no repeated
> diff is retrieved for them from the repository." This is where issue
> #3067 ultimately manifested itself, those reporter->set_path() calls
> at the bottom of the loop were describing *subtree* path/revisions
> that didn't exist. But notice this bit:
>
> /* If a subtree needs the same range applied as it's nearest parent
> with mergeinfo, then we don't need to describe the subtree
> separately. */
>
> That's why case 'B' sets a subtree's remaining_ranges to include the
> ranges it's nearest parent also has, so we don't try to describe the
> subtree at a revision it doesn't exist.
>
> First, does that make sense at all?
>
> Second, if it does, do you have a suggestion as to how to better
> document it in the code? Maybe refer to that specific part of
> drive_merge_report_editor() in prepare_subtree_ranges?
>
> I know, this code is not clearly organized but I'd like to move in
> that direction as much as possible (and resist the urge to enumerate
> the reasons it got this way).
>
> > "PARENT->REQUESTED_RANGELIST": do you
> > mean PARENT->REMAINING_RANGES?
>
> Yes, that was a typo, fixed.
>
> >
> > Case D says "not replaced within that range". Whereas case H is
> > complemented by case F which covers the broken-line-of-history variant,
> > case D is not complemented by such a case.
>
> The 4 forward merge cases are not exactly analogous to the 4 reverse
> merge cases. This is because of the way the workhorse of this
> function, namely svn_client__repos_location_segments(), works.
> svn_client__repos_location_segments() requires that its START_REV
> argument be younger than it's END_REV arg and PEG_REV must be at least
> as young as START_REV.
>
>
> > Case E: The "###" note: Urg? Sorry, overwhelmed and baffled :-0
>
> Sorry about the format of the comment, this was in place in a 3067
> patch I submitted to dev and was intended to to draw reviewers into
> the part of the 3067 fix I was least confident in...I'm not sure that
> happened. Anyway it should have been cleaned up before going into
> trunk, but got committed as-is. I didn't touch it much in my r32401
> cleanup...I hesitate to remove them as the comments still seem valid.
>
> Ok, since you are probably as sick of reading this as I am of writing
> it :-) I'll say this:
>
> Tweak merge_tests.py 98 'subtree merge source might not exist' to
> raise a failure right after the "# Now test a reverse merge where part
> of the requested range postdates" comment. Check out what the test is
> doing there and see if Case E starts to make any sense.

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org

text/x-patch attachment: mergeinfo-h-1.patch__charset_us-ascii

Received on 2008-08-08 13:05:56 CEST

This message: [ Message body ]
Next message: Stefan Sperling: "Re: Releasing 1.6"
Previous message: Julian Foad: "Re: [PATCH] Merge deleting a file - compare its content"
In reply to: Paul Burba: "Re: [PATCH] Merge deleting a file - compare its content"
Next in thread: Julian Foad: "Re: Doc string of prepare_subtree_ranges() [was: Merge deleting a file - compare its content]"
Reply: Julian Foad: "Re: Doc string of prepare_subtree_ranges() [was: Merge deleting a file - compare its content]"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]