Re: [PATCH] Merge deleting a file - compare its content

From: Paul Burba <ptburba_at_gmail.com>
Date: Thu, 7 Aug 2008 19:47:18 -0400

On Thu, Aug 7, 2008 at 7:34 AM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
> On Wed, 2008-08-06 at 17:53 -0400, Paul Burba wrote:
>> On Wed, Aug 6, 2008 at 7:29 AM, Julian Foad <julianfoad_at_btopenworld.com> wrote:
>> > Here's an improved version. It now compares the properties as well as
>> > the text content. It ignores the "svn:mergeinfo" property because that
>> > property does not contribute to a difference in the file's current state
>> > but only tells how the file reached its current state.
>>
>> +1
>>
>> > Just one test fails: merge_tests.py 77. It looks like the reverse-merge
>> > "svn merge -r9:2 A A_COPY" is omitting r8 from the revision ranges it
>> > thinks it needs to handle. r8 was a text mod to A/D/H/nu. Is this
>> > because Subversion thinks, "there's no point merging r8 (a text mod)
>> > because I can see that I would follow that with deleting the file, so it
>> > would be a waste of effort"?
>>
>> Your guess is pretty much spot on, r8 is being ignored, as far as 'nu'
>> is concerned, because 'nu' will be deleted anyway when r7 is reverse
>> merged. Well, to be accurate, it's not doing this simply because 'nu'
>> will be deleted, rather it is taking advantage of that fact as part of
>> the fix for issue #3067, which tries to avoid describing
>> path/revisions to the editor which don't actually exist.
>>
>> It might help to walk through exactly what is happening in merge_tests.py 77:
>
> Thanks for all this, Paul.
>
>> Right before we reverse merge -r9:2 from 'A' to 'A_COPY', the tree
>> rooted at 'A_COPY' has the following paths explicit mergeinfo:
>>
>> Path Mergeinfo
>> 1) The merge target itself : A_COPY : /A:5-6
>> 2) A subtree with mergeinfo : A_COPY\D\H : /A/D/H:5-7
>> 3) A subtree with mergeinfo : A_COPY\D\H\nu : /A/D/H/nu:5-8
>>
>> (Forgive me if I over-explain but I'm not sure how much you have
>> looked at these parts of the merge code)
>>
>> Each of these three paths are stored in the ubiquitous
>> children_with_mergeinfo array in a svn_client__merge_path_t struct
>> which contains a slew of merge-related info about the path, including
>> the remaining_ranges to be merged.
>>
>> The merge looks at each path (this is all happening in
>> lbisvn_client/merge.c:populate_remaining_ranges() and its many
>> helpers) and determines what revisions of the requested r9:2 should be
>> reverse merged into that path.
>
> OK.
>
>> Note, that since this *is* a reverse merge we only consider revisions
>> which are present in the implicit mergeinfo (i.e. natural history) or
>> its inherited/explicit mergeinfo.
>
> I didn't quite follow this sentence at first, but I think I get it now.
> Your mention of "natural history"

(Kamesh, added you to the cc as you know as much about this topic as
me, so if you have any other comments please have at it!)

The terms "natural history" and "implicit mergeinfo" were coined
during the development of merge tracking (well the latter term
certainly was anyway). Originally tied to the idea that a copy would
create explict mergeinfo describing its natural history or implicit
mergeinfo on the copy destination. Of course this was later seen to
be unnecessary, the repository knows the copies history after all, but
the term stuck (in my head at least!) - see
svn_client__get_history_as_mergeinfo() for example. Also see "Natural
History and Implicit Mergeinfo" in
http://www.collab.net/community/subversion/articles/merge-info.html.

> refers to the general case where we
> might be doing a reverse-merge from our branch's own history, but that's
> not what we're doing in this case.

Correct. In stating the general rule about what is eligible for
reverse merging I just muddied the waters.

> We are reverse-merging changes r9:2 from a source branch "A" to a
> destination branch "A_COPY". The only revisions we could possibly
> consider merging are those within 9:2 that were operative in the source
> branch's natural history (the revisions in which any kind of change was
> committed to this tree). In this test, "svn log" shows that those are
> A/: r3,4,5,6,7,8; A/D/H: r3,6,7,8; A/D/H/nu: r7,8.
>
> I think are saying that our merge-tracking logic deems that, of those
> possible source revisions, the only ones that it makes sense to
> reverse-merge are those that the mergeinfo says are present in the
> target. (For A_COPY/, those are r5,6. For A_COPY/D/H/nu, the
> intersection is r7,8.)

Exactly right for this example, but it's important to note that things
aren't always so simple. With the current implementation of merge
tracking, particularly the nature of svn:mergeinfo as an inheritable
property, a path can end up with inherited or explicit mergeinfo that
is *not* part of its natural history - see
http://subversion.tigris.org/issues/show_bug.cgi?id=3157#desc8,
particularly the section "Several potential problems remain though".

Sidebar: Notice I said it is a problem with the implementation rather
than the design (though the design was mute or vague at best on these
issues). I think this could be fixed, I just wonder at what cost of
effort, code complexity, and performance.

>> First 'A_COPY', this is simple, it needs r5 and r6 reverse merged.
>>
>> Next 'A_COPY\D\H', again this is straightforward, needing r5, r6, and
>> r7 reverse merged.
>>
>> Lastly 'A_COPY\D\H\nu', this is where things get slightly more
>> complicated. On the face of it (going simply by the mergeinfo) this
>> path needs r5-r8 reverse merged. But 'nu' doesn't exist in the merge
>> source until r7 so we don't want to reverse merge r5 or r6 into 'nu'
>> directly as this would break the editor since 'nu' doesn't exist in
>> the source at those revisions.
>
> Sure, that's clear.
>
>> I don't want to bury you in #3067 minutiae but now might be a good
>> time to read the doc string for merge.c:prepare_subtree_ranges().
>
> Urg. :-) Comments/review, just of the doc string:

Probably just stating the obvious, but keep in mind that
prepare_subtree_ranges() is a dedicated helper of
filter_merged_revisions() and almost all the arguments to the former
are straight from the latter. So understanding this function in a
vacuum is quite difficult.

> Can you add a sentence that says what the REQUESTED_RANGELIST parameter
> is for and what its element type is? There are partial descriptions of
> it in Note1, Note2 and some of the 8 cases, but an overall statement
> about it would be helpful.

merge.c's private functions use the term "rangelist" quite frequently,
the assumption being the reader of this code will have looked at
svn_mergeinfo.h and seen its "Terminology for data structures that
contain mergeinfo" comment. I'm torn between being concise and being
pedantic!

> Maybe something like:
>
> Set *REQUESTED_RANGELIST to a newly allocated array of
> unordered

Did you see a case where they would be unordered? This shouldn't be the case.

> (svn_merge_range_t) elements representing the
> revision ranges in which the node PRIMARY_URL_at_REVISION1
> existed, plus any range in which its parent existed before
> it did, between REVISION1 and REVISION2. ("Before" being
> relative to the direction REVISION1 >> REVISION2.) (???)

I'd prefer not to do this at it would be redundant with the case A-H
listing that is already there no? I did rework the comment in r32401
in an effort to make it clearer...let me know if it helps.

> Sorry, that attempt is surely wrong and unhelpful!
>
> Intro: "Filter the requested ranges" -> "Filter the requested range
> REVISION1:REVISION2"?

Done.

> Intro: "Note in *CHILD_DELETED_OR_NONEXISTANT if the subtree doesn't
> exist, is deleted, or is renamed in the requested range." Say, "or is
> renamed or replaced". Should this say, "doesn't exist at all in the
> requested range, or is deleted, renamed or replaced before the end of
> the range"?

r32401 removed that comment, referring to the 8 cases, rather than
trying to sum up the behavior.

> What I'm trying to get at is it's not clear about how the
> object is allowed to come into existence after the start of the range.

Not entirely sure I follow you here...do you mean how can we be asking
about PRIMARY_URL's history between REVISION1:REVISION2 when that
history might not be complete? If so see my next comment. If that's
not what you meant, can you say it again, but with different words?
:-P

> Case B: I'm afraid I can't follow the description. "Ranges ... at which
> PRIMARY_URL exists": meaning all ranges in which any node exists at
> PRIMARY_URL? If there are two or more, does it matter that these might
> be history segments of different objects?

Not really, because we are dealing with merge target subtrees here,
recall the comment (emphasis on subtrees added):

"Since this function is only invoked for *SUBTREES* of the merge
target, the guarantees afforded by normalize_merge_sources() don't
apply. Therefore it is possible that PRIMARY_URL_at_REVISION1 and
PRIMARY_URL_at_REVISION2 don't describe an unbroken line of history."

This might be a good time to look at normalize_merge_sources() and the
'MERGEINFO MERGE SOURCE NORMALIZATION' comment at the top of merge.c
if you haven't already.

Anyway, the fact that subtrees don't adhere the the merge source
normalization rules is the crux of issue #3067. It might help to
think of it as being a side effect of how merging -rX:Y to PATH can
have a different effect on PATH/SUBTREE than can merging -rX:Y
*directly* to PATH/SUBTREE.

For example, say we have this repos structure:

/
trunk/
trunk/file
branch/file

Where trunk and trunk/file were created in r1

branch was copied from trunk in r2

trunk/file was modified in r3

trunk/file was deleted in r4

trunk/file was added (with identical content to trunk/file_at_2) without
history in r5

Now we could merge r3 from trunk_URL to branch and branch/file will be
updated. This is because the merge only cares that trunk_at_2:3 and
trunk_at_HEAD are ancestrally related so it does the merge. But if we
merge r3 from trunk_URL/file to branch/file directly, then nothing
happens, trunk/file_at_2:3 is not ancestrally related to trunk/file_at_HEAD
so the merge is a no-op.

Now imagine if we made many changes to trunk/file before it was
replaced and then made many more after. Imagine that subset of the
changes from before and after had been merged in such a way that
branch/file had it's own explicit mergeinfo, which might look like:

svn:mergeinfo : /trunk/file:3-4,7,23,25-40,42

This is valid, but realize that /trunk/file:3-4,7 and say
/trunk/file:23,25-40,42 could be two completely unrelated lines of
history (let's say trunk/file was replaced without history in r12).

Let's further assume that r5 and r45 are mods to /trunk/file on those
two different lines of history. What happens if we merge -r3:45
trunk_URL to branch? Since trunk/file is a subtree of the merge
target it should attempt to get both r5 and r45 merged into it.

Sorry, that is the long answer as to why it doesn't "matter that these might
> be history segments of different objects". I swear to you I tried to formulate a more concise answer...but it made no sense :-(

> Then, adding a segment in
> which it does not exist: is that behaviour specific to a particular
> requirement that the current caller has in this case, or something that
> can be explained more generally?

It's to do with how merge.c:drive_merge_report_editor() works. The
connection is probably not obvious, so here it is:

do_directory_merge() uses this group of helper functions to determine
what ranges to merge:

populate_remaining_ranges
calculate_remaining_ranges
filter_merged_revisions
prepare_subtree_ranges

Then do_directory_merge() calls drive_merge_report_editor() to
actually merge those revision.

Sidebar: It's almost certainly possible to streamline that first group
of helpers, but it won't be easy.

Anyway, look in drive_merge_report_editor() where we loop over the
CHILDREN_WITH_MERGEINFO, right after the comment "Describe children
with mergeinfo overlapping this merge operation such that no repeated
diff is retrieved for them from the repository." This is where issue
#3067 ultimately manifested itself, those reporter->set_path() calls
at the bottom of the loop were describing *subtree* path/revisions
that didn't exist. But notice this bit:

          /* If a subtree needs the same range applied as it's nearest parent
             with mergeinfo, then we don't need to describe the subtree
             separately. */

That's why case 'B' sets a subtree's remaining_ranges to include the
ranges it's nearest parent also has, so we don't try to describe the
subtree at a revision it doesn't exist.

First, does that make sense at all?

Second, if it does, do you have a suggestion as to how to better
document it in the code? Maybe refer to that specific part of
drive_merge_report_editor() in prepare_subtree_ranges?

I know, this code is not clearly organized but I'd like to move in
that direction as much as possible (and resist the urge to enumerate
the reasons it got this way).

> "PARENT->REQUESTED_RANGELIST": do you
> mean PARENT->REMAINING_RANGES?

Yes, that was a typo, fixed.

>
> Case D says "not replaced within that range". Whereas case H is
> complemented by case F which covers the broken-line-of-history variant,
> case D is not complemented by such a case.

The 4 forward merge cases are not exactly analogous to the 4 reverse
merge cases. This is because of the way the workhorse of this
function, namely svn_client__repos_location_segments(), works.
svn_client__repos_location_segments() requires that its START_REV
argument be younger than it's END_REV arg and PEG_REV must be at least
as young as START_REV.

> Case E: The "###" note: Urg? Sorry, overwhelmed and baffled :-0

Sorry about the format of the comment, this was in place in a 3067
patch I submitted to dev and was intended to to draw reviewers into
the part of the 3067 fix I was least confident in...I'm not sure that
happened. Anyway it should have been cleaned up before going into
trunk, but got committed as-is. I didn't touch it much in my r32401
cleanup...I hesitate to remove them as the comments still seem valid.

Ok, since you are probably as sick of reading this as I am of writing
it :-) I'll say this:

Tweak merge_tests.py 98 'subtree merge source might not exist' to
raise a failure right after the "# Now test a reverse merge where part
of the requested range postdates" comment. Check out what the test is
doing there and see if Case E starts to make any sense.

>> What we have merge_tests.py 77 is described in case 'F' in that doc
>> string. prepare_subtree_ranges() figures out that 'nu' will be
>> deleted by the merge, so doesn't do any filering of the requested
>> ranges, rather it reports to its caller filter_merged_revisions() that
>> 'nu' is going to be deleted. Now see "A little trick" in
>> filter_merged_revisions() -- It sees that the subtree 'nu' will be
>> deleted, so it sets its remaining_ranges to be the same as its nearest
>> parent in the children_with_mergeinfo array, namely 'A_COPY/D/H', so
>> r8 is never reverse merged, which obviously breaks your patch in this
>> case.
>>
>> I'm playing around with some tweaks to filter_merged_revisions() and
>> prepare_subtree_ranges()'s behaviors in case F (and maybe it's forward
>> merge equivalent described in case A) to see if your patch and the
>> issue #3067 fixes can coexist.
>
> Cool. Thanks.
>
> May I commit my patch with this one case not yet working? Or, less
> onerously, do you agree that the merge_file_deleted method is being
> called incorrectly and we should look to fix it as a separate exercise?

I think you can commit it and mark merge_tests.py 77 as XFail (with a
link to this thread in the test's comments). I'm still fairly certain
I can tweak the 3067-related code to work with it (though I still
haven't quite got it working)...

Paul

P.S. Now do you know why everyone was afraid to review the 3067
backport for 1.5.1? :-D

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-08-08 01:47:42 CEST

This message: [ Message body ]
Next message: Martin Furter: "Re: Releasing 1.6"
Previous message: Chia-liang Kao: "Re: [PATCH] allow pre-lock hook to specify token for locking"
In reply to: Julian Foad: "Re: [PATCH] Merge deleting a file - compare its content"
Next in thread: Julian Foad: "Re: [PATCH] Merge deleting a file - compare its content"
Reply: Julian Foad: "Re: [PATCH] Merge deleting a file - compare its content"
Reply: Julian Foad: "Doc string of prepare_subtree_ranges() [was: Merge deleting a file - compare its content]"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]