[merge tracking] Implied merge info, and subsequent discussion with Michael Haggerty

From: Daniel L. Rall <dlr_at_collab.net>
Date: 2006-10-25 01:37:54 CEST

Dan, while working with Madan on his recent 'copy' patch to handle
implied merge info for copy source paths, we've run into some
problems, illustrated by 'merge_tests.py 12' and the attached patch.
An abbreviated example follows:

# Copy /A (at r1), creating a new path with merge info of /A:1 for r2
svn cp file:///repos-path/A file:///repos-path/copy-of-A

# Notice that even though A has an implied merge info of /A:1-3, that
# diff is unaware
svn diff file:///repos-path/copy-of-A_at_2 file:///repos-path/A_at_3
Property changes on: .
___________________________________________________________________
Name: svn:mergeinfo
- /A:1

# Merge reports ' G', a property change for "svn:mergeinfo"
svn merge file:///repos-path/copy-of-A_at_2 file:///repos-path/A_at_3 A
G A

# However, the merge didn't actually change the merge info --
# 'propget' still shows nothing
svn pg svn:mergeinfo A

If we consider implied merge info when doing the property diff in
libsvn_repos/delta.c (where it calls svn_prop_diffs()), we might send
an inappropriate prop delta back to the WC, resulting in us not
recording explicit merge info when we should be. If we discard merge
info for our own path within libsvn_wc, we might also toss explicit
merge info which we should record (possibly important when doing
bi-directional merging back into the branch where the merge info
originally came from).

When I described this problem on IRC, Michael Haggerty had some
thoughts on the design (and its impact on the above case). The
beginning of our discussion went off the end of my scroll buffer
(possibly logged on the net somewhere), but I thought you might be
interested anyways. He seems to really be in favor of a more
ancestry-based approach, which could certainly help with some of the
problems we're running into. (If he can find the time, Michael
indicated that he'd try and summarize the discussion below.)

dlr| mhagger: We currently have no way to differentiate between the "original" change and the result of a merge.
dlr| mhagger: In fact, the delta in the revision created by the commit of a merge may differ significantly from the "original" change.
mhagger| Certainly, but logically it should be the same delta
mhagger| (Ideally one should not add original content when merging a change.)
dlr| madan: The "correct mergeinfo" for what?
mhagger| As soon as you start merging between three branches, things get very messy
mhagger| In my above scenario, what is to prevent me from merging B:3 into C and then later merging A:5 into C?
dlr| mhagger: Sometimes it's very necessary to add original content before committing the result of a merge. Consider a feature branch which requires a bunch of extra glue code to be written after merging in the difference between its branch point from the mainline, and the latest code in the mainline (e.g. due to API changes).
mhagger| (B:3 and A:5 being logically the same change)
dlr| If I've further forked the feature branch before merging from the mainline, I need to differentiate between the "original" change to the mainline, and the extra changes to the feature branch.
mhagger| I understand that. But the bottom line is that I never want to merge the original version of the new feature and also the feature+glue code into a third branch.
dlr| mhagger: Nothing prevents that -- the merge would be attempted, but the result should be a no-op (as today).
mhagger| Probably I would consider the two changes to be different versions of the same "feature" and I would use my knowledge of the code to pick the more appropriate version to merge in a given situation
mhagger| What makes it a no-op? Please explain.
dlr| mhagger: With B:3 and A:5 being the same change, the merge of B:3 will succeed (changing the WC), then the merge of A:5 will also succeed (not actually changing the WC).
mhagger| Not if any "glue code" was needed when creating A:5
mhagger| I though the whole point of merge tracking is to avoid the "repeated merge" problem?
dlr| yes, that is a major goal
mhagger| This is essentially a repeated merge, because B:3 and A:5 are logically the same change to the code (though textually they might differ)
mhagger| Naively (I'm definitely not a merge guru), it seems to me that an "original revision" (i.e., a code change created ab initio by a developer) should get an identity.
mhagger| As far as I'm concerned, it can be named B:3 in the case above because the code was originally added in that revision.
mhagger| Whenever a revision is merged to another branch, its UID should get stored in the other branch's merge info
dlr| 3 is effectively a peg revision for B -- we're using it lieu of an object identifier, since we don't expose those outside of the FS
mhagger| (ideally, associated with a particular revision on the new branch, like A:5 -> B:3)
mhagger| Then, if either B:3 or A:5 is merged into branch C creating C:10, then the UID B:3 should be associated with C:10.
dlr| it's that association that we're not doing (nor have yet planned to do)
mhagger| That way, if somebody decides to merge C:10 back to A, it can be determined (without looking at the deltas at all) that it should be a noop
mhagger| (Because C:10 == B:3 and B:3 is already merged into A.)
mhagger| But maybe this all assumes an unrealistic amount of discipline on the part of people doing merges.
dlr| mhagger: I don't disagree. How do we implement this without 'merge' being a first-class operation in Subversion?
mhagger| What does "first-class" mean to you?
dlr| Right now, you cannot distinguish between the result of a 'merge' and a typical edit when performing a commit.
mhagger| But you will be adding properties that allow you to make the distinction, right?
mhagger| The modification of a merge property implies that the commit was the result of a merge
dlr| The properties can be manually manipulated by the end user without a 'merge' actually occurring.
dlr| er, property
mhagger| Sure. The user can break the merge system if they want. So what?
dlr| (That may not be important, though -- they're both merges.)
mhagger| BTW, there is an interesting problem with the scheme I described...
dlr| mhagger: It is an important that the user be able to mark a change as merged without actually performing the merge.
dlr| svnmerge.py merge --record-only
mhagger| Suppose B:10-12 are merged into A, creating A:15
mhagger| and B:10 merged into C creating C:18
mhagger| Now if I try to merge A:15 into C, I would have to get an intrinsic conflict (even before looking at the deltas)
mhagger| because C already contains part, but not all, of A:15 (namely B:10 but not B:11-12)
mhagger| the merge code would have to refuse to do such a thing.
mhagger| I'm sure this is all old-hat changeset-oriented technology that we are discusing
dlr| Why shouldn't it simply skip merge of B:10?
mhagger| Because A:15 is a single commit containing three revisions mashed together.
dlr| okay
mhagger| there is no way to deduce B:11-12 based only on the contents of branch A
dlr| today on the merge-tracking branch, it will apply A:15, and Subversion's merge logic will probably not result in a conflict
mhagger| and I don't think you would want to go digging about in branch B to pick them apart
dlr| when applying the B:10 portions of the A:15 change
dlr| indeed, it would not be correct to dig into branch B, since A:15 might have changed the form of those logical changes
mhagger| I think that svnmerge.py (and presumably the new merge-tracking code) only really work in very restricted merging scenarios
dlr| Yes, if you start merging in all directions, they don't help you.
dlr| But, you're no worse off than you are today.
mhagger| (probably only in the case that there are no cycles in the graph of branches between which revisions are merged
peterS| mhagger: in other words, tree structure. that would be enough for me.
madan| mergeinfo is special, mergeinfo is special
dlr| mhagger: yes
mhagger| Yes, tree structure (but probably allowing bidirectional merges along any branch of the tree)
dlr| mhagger: The assumption is that this accomodates a large percentage of the typical use cases.
mhagger| dlr: There is a sense in which we will be worse off than today (even after the equivalent of all of svnmerge.py is implemented in svn):
peterS| dlr: some people do want a workflow of merging between branches without going back through the trunk. I don't think I will want that, though
mhagger| svn will have blessed an "official" merge infrastructure and it will be very difficult to change it again
dlr| mhagger: With merge a first-class operation and object identities, the ancentry-set approach that you describe would be preferable. I think this will become easier to implement if we have some degree of distributed repositories.
mhagger| svnmerge.py in its "contrib" home can be seen as experimental, one-approach-of-many-possible
mhagger| Would it make sense to prohibit merges that are not tree-like?
mhagger| probably not
madan| mhagger, like brane says, "merge <> copy"
madan| that would save us a lot of pain
madan| but we dont have it as of now
dlr| madan: that's branch != copy
madan| okay okay ;)
dlr| mhagger: The true ancestry-set approach you describe is semantically different than what we've been implementing.
mhagger| Yes, I understand that
madan| didnt BASIC use <> for != anyways!
peterS| yes
dlr| mhagger: Our compatibility guidelines allow us to make the jump from what we've been implementing to what you describe in a 2.0 release.
peterS| and perl uses <> for input from the default file handle
madan| ah
mhagger| madan: Even python allows <> for != (though deprecated)
dlr| SQL too
dlr| or at least, most implementations
madan| dlr : would it be okay, if I submit a patch thats the diff between two files in my local directories?
dlr| mhagger: You seem concerned about releasing with the more simplistic semantics, then possibly changing to something more complex.
mhagger| dlr: Yes, but imagine the screaming if we told people that they would lose their merge information if they upgrade!
dlr| mhagger: Or more complete, rather.
dlr| mhagger: Indeed! You don't think we could migrate the merge info?
mhagger| I should probably just keep my mouth shut because I don't have time to STFU and write some code
dlr| madan: yes, that's fine
madan| dlr : great! thanks
dlr| mhagger: No, this is good discussion!
mhagger| dlr: No, there wouldn't be enough information in the current data to deduce the more complete data
dlr| mhagger: brane described the same model at the Summit on Wednesday
peterS| mhagger: _not having_ data to migrate is not the same thing as _not being able_ to migrate the data you have
dlr| heh
mhagger| They would have data. The question is whether their data could be preserved as a kind of incomplete full-merge-info or whether it would be so incomplete as to have to be discarded.
dlr| and you think the latter?
mhagger| I haven't thought much about it, but I would definitely worry that we can't use the old merge data anymore
mhagger| until we have proof to the contrary
*| peterS calls to mind some statement about enemies, perfect, and good
mhagger| peterS: ACK. It just seemed that dlr is struggling with problems that are intrinsic in the model. Sometimes the correct solution is easier than an 80% solution.
mhagger| (and all of the distributed SCM people will laugh at us on the playground :-) )
dlr| mhagger: Yes, I would be much happier if we weren't using a property, and were instead using something closer to an ancestry-set approach.
mhagger| Would it really be so much more work? (I have no idea)
dlr| DannyB spent a lot longer thinking about it than I.
dlr| I tend to think yes, it would be a lot more work.
mhagger| Is it conceptually more difficult, or would it mostly be difficult to get adequate performance?
mhagger| (And space usage, etc)
dlr| I don't see it mapping well onto the current state of Subversion. 'merge' would need to become a first-class operation, requiring changes from WC -> RA -> REPOS -> FS.
mhagger| Conceptually it seems quite trivial: (1) Each original commit gets a UID. (2) Each commit resulting from a merge records the UIDs of the revisions that were merged into it. (3) Merges skip over diffs corresponding to UIDs that are already present in the destination (with the possibility of unresolvable conflicts as discussed above)
dlr| The schema change is a real kicker.
dlr| I tend to think that performance wouldn't be any worse than it would already be on the merge-tracking branch.
mhagger| Very *very* naively: each merge revision gets a revprop "svn:merge-origin"
mhagger| These revprops are either accumulated over time or grepped through when needed.
mhagger| Non-merge ("original") revisions could get some property that indicates that they are original, or that could just be the default
mhagger| Obviously these origin properties would sometimes have to have the granularity of a single file, so there might have to be some serious optimization to make this naive scheme practical.
mhagger| If the origin properties are accumulated over time, then reverse merges can be easily accommodated by subtracting out the origins of the reverse-merged revisions. If not, then there would have to be a way to specify a "negative" merge-origin.
dlr| CREATE TABLE mergeinfo (revision integer not null, mergedfrom text not null, mergedto text not null, mergedrevstart integer not null, mergedrevend integer not null);
dlr| CREATE TABLE mergeinfo_changed (revision integer not null, path text not null);
dlr| we're tracking some of this information already -- I'm trying to get a grip on the delta between what you're describing and what we're doing
*| dlr notes that this simplistic schema is currently in place to simplify debugging
mhagger| mergedfrom and mergedto are repo-relative paths?
dlr| yes
dlr| INSERT INTO "mergeinfo" VALUES(2, '/A', '/copy-of-A', 1, 1);
dlr| INSERT INTO "mergeinfo_changed" VALUES(2, '/copy-of-A');
dlr| that describes the merge of /A:1 into /copy-of-A in r2
mhagger| I guess instead of mergedrevstart and mergedrevend you would want something like origin_uid
dlr| (as caused by the copy I described initially far above)
mhagger| Inheritance along the repository path could work by set addition/subtraction working from branch to leaf
dlr| origin_uid then requires a UUID for a 'merge', rather than for nodes in our FS?
mhagger| I don't think you can use node ids because you can't convert them back to paths, right?
mhagger| UUID could simply be path:revision of the original source of the merge
dlr| okay
mhagger| Strictly speaking, you wouldn't have to record the source of the merge--just the original sources
dlr| you're suggesting that we store the "original" change
mhagger| though probably it would be nice to record both
mhagger| That is my hand-waving proposal, yes.
*| dlr is still trying to figure out how we determine that
mhagger| Because it is recorded as the original source of the revisions you are merging from
dlr| Does this require that 'merge' be a first-class operation, so you can distinguish between a change and a merge?
dlr| change == via Emacs
mhagger| Well, you have to be able to distinguish between a change and a merge. But that could simply be recorded in revprops like today.
mhagger| Oops wait a minute
mhagger| If they are revprops, then "svn merge" has no way to adjust them.
mhagger| You'd want an operation like "svn propset --revprop" that operates on a WC instead of a URL+revision
mhagger| It would record the revprop and include it with the next commit
mhagger| (this would be useful for other things, too)
mhagger| Or you would have to store accumulated origin information in versioned properties like svnmerge.py.
mhagger| Of course the back end could deduce the delta in the accumulated origin information and store only the deltas in the SQL table
dlr| today we record in URL+revision
dlr| though I notice one of the slides I added late to my presentation says "revision property", which is confusing
mhagger| I would like to suggest that if you think this might conceivably be an interesting alternative, that you discuss it quickly with DannyB and anybody else who is interested in merge tracking. If you don't find obvious holes in the plan, let me know by email and I'll try to write something up a little more formally.
mhagger| Is that a reasonable plan?
dlr| mhagger: I have to confess that I still don't understand the delta between what you're suggesting and what we've been up to.
madan| mhagger, could you write it down anyways...
madan| in a mail...
dlr| It sounds so similiar that I'm missing the subtleties
mhagger| madan: OK, I'll try to get to it by the weekend or so.
dlr| thanks mhagger
mhagger| Main differences: explicitly keep track of the "original revisions" that went into each commit (not just an aggregate for HEAD). Treat original revisions as equivalent even if they come by way of different branches.

text/plain attachment: patch

application/pgp-signature attachment: stored

Received on Wed Oct 25 01:39:17 2006

Contemporary messages sorted: [ By Date ] [ By Thread ] [ By Subject ] [ By Author ] [ By messages with attachments ]