Proposal for an intelligent merge operation in Subversion

Imagine the following situation:

trunk -------->[1]------>[2]----------->[4]---------->[8]---+---->[9]
             create      add  |       bugfix |      bugfix  ^
              repo     commit |       commit |      commit  | merge
                              |              |              |
                              |              |              |
                              |              |              |
                       branch |        merge |              |
                              v              v              |
branch                       [3]------->[5]--+---->[6]---->[7]
                                        cr 1               cr 2
                                       commit             commit

- The process for a very simple project starts with creating the
  repository (release [1]).
- Files are added (release [2]).
- A branch is created to implement some change requests (release [3]).
- A bugfix is implemented in the trunk (release [4]).
- The first change request is implemented (release [5]).
- The bugfix in the trunk is necessary for the cr branch too. [4] is
  somehow merged with [5] (release [6]).
- The second change request is implemented (release [7]).
- A second bugfix is added to the trunk (release [8]).
- The work on the cr branch is finished and the results [7] are somehow
  merged with [8] (release [9]).

The important transitions in the above picture are the merges which
result in releases [6] and [9]. I simply wrote "somehow merged" and, to
tell the truth, until I drew and analyzed the above picture, I thought
of a merge as something where I take two final versions, compare them
and by some mystery get a correct merged result.

The whole thing is much easier to understand if I don't treat a release
as a result, but as a history. Each release can be expressed as the
history of all the differences that were made from the initial release
up to this release.

Definitions:

Let rn be a release [n]; in our sample we have r1 to r9. As another
definition, let's assume r0 is the release before we created the
repository.

Let dm_n be the difference between two releases [m] and [n] which make
one step in our repository. All valid differences in our model are

    d0_1, d1_2, d2_4, d4_8, d8_9, d2_3, d3_5, d5_6, d6_7.

The differences are just what you see when you call the svn log
command.

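The definitions above can be sketched in a few lines of Python. This is
only an illustration of the model, not anything Subversion provides: the
names DIFFS, PREDECESSOR, history and show are made up for this sketch,
and it deliberately follows only the direct predecessor of each release,
so merged differences are not considered.

```python
# Hypothetical model of the sample repository: every valid difference
# dm_n is stored as the pair (m, n). Nothing here is a Subversion API.
DIFFS = [(0, 1), (1, 2), (2, 4), (4, 8), (8, 9),
         (2, 3), (3, 5), (5, 6), (6, 7)]

# In this model each release has exactly one direct predecessor.
PREDECESSOR = {n: m for m, n in DIFFS}


def history(release):
    """Return the differences leading from r0 to the given release,
    following direct predecessors only (merges are ignored here)."""
    steps = []
    while release != 0:
        previous = PREDECESSOR[release]
        steps.append((previous, release))
        release = previous
    steps.reverse()  # oldest difference first
    return steps


def show(steps):
    """Format a history in the notation of the text, e.g. d0_1 + d1_2."""
    return " + ".join(f"d{m}_{n}" for m, n in steps)


print("r4 =", show(history(4)))
print("r5 =", show(history(5)))
```

Walking backwards through the predecessors is the same thing as reading
the svn log output from bottom to top.
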
With this we can describe releases [4] and [5], which are the
candidates to merge into [6], as follows (assuming that at this time
[5] is the HEAD of .../branch and [4] is the HEAD of .../trunk):

    r4 = d0_1 + d1_2 + d2_4
    r5 = d0_1 + d1_2 + d2_3 + d3_5

To merge the added code of r4 into r5 to get r6 we write

    r6 = r5 + r4 without "multiple merging" of already contained
         differences.

    r6 = d0_1 + d1_2 + d2_3 + d3_5 + d0_1 + d1_2 + d2_4

    !!! d0_1, d1_2 are already part of r5; we can "subtract" them

    r6 = d0_1 + d1_2 + d2_3 + d3_5 + d2_4
         \-----------------------/   \--/
                    r5               to be merged

In svn we would write:

    svn switch .../branch
    svn merge -r 2:4 .../trunk
    resolve any conflicts
    commit without additional changes

And now to the second merge. We want to merge the things from release
[7] that are not already part of release [8] to get release [9]
(assuming that [7] is the HEAD of .../branch and [8] is the HEAD of
.../trunk):

    r7 = d0_1 + d1_2 + d2_3 + d3_5 + d2_4 + d6_7
         \-----------------------/   \--/
                    r5               merged before
         \------------------------------/
                        r6

    r8 = d0_1 + d1_2 + d2_4 + d4_8

    r9 = r8 + r7 without "multiple merging" of already contained
         differences.

    r9 = d0_1 + d1_2 + d2_4 + d4_8 + d0_1 + d1_2 + d2_3 + d3_5 + d2_4 + d6_7

    !!! d0_1, d1_2, d2_4 are already part of r8; we can "subtract" them

    r9 = d0_1 + d1_2 + d2_4 + d4_8 + d2_3 + d3_5 + d6_7
         \-----------------------/   \----------------/
                    r8                  to be merged

To merge these differences we need at least two merge calls: first
merge 2:5 and second merge 6:7 from branch to trunk. 2:3 and 3:5 are
combined to 2:5, because they directly succeed each other.

In svn we would write:

    svn switch .../trunk
    svn merge -r 2:5 .../branch
    resolve any conflicts
    svn merge -r 6:7 .../branch
    resolve any conflicts
    commit without additional changes

How to add intelligent merge support in Subversion

Subversion already offers the path of differences for a release line
without merges. Calling svn log on .../branch [5] shows something like:

    5 | ...
    3 | ...
    2 | ...
    1 | ...

which can easily be interpreted as

    d0_1 + d1_2 + d2_3 + d3_5

Just follow the log from bottom to top.

After the merge [6] = [5] + [4], svn log shows:

    6 | ...
    5 | ...
    3 | ...
    2 | ...
    1 | ...

If the additional information that [6] is not only the successor of
[5], but also includes the difference d2_4, were available, it could be
used for later merges to prevent multiple inclusion of differences.

A property or a special syntax in the log could be used to store all
additionally merged differences. During merging, the full history of
the endpoints of the merge could be derived from the normal log plus
the additional diffs. With a simple algorithm, the unique differences
could be calculated, offered to the user and/or applied.

As a policy, the user must not add changes other than those that
resolve the conflicts during merging. That means: commit immediately
after the merge conflicts are resolved.

The merge command has to be enhanced to calculate the history of diffs,
subtract the diffs that occur more than once, and store the information
about the resulting diffs somewhere. The commit after the merge
(resolve) uses this information to add it to a property or the log.

All I wrote up to here is just an idea of how to do it. It has not even
been proven that it works. At least use it to learn what a merge is, as
I did.

Horst Schuenemann
horst.schuenemann@t-online.de
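
Appendix: a sketch of the "simple algorithm"

The following Python sketch illustrates how the unique differences could
be calculated for the sample repository. It is an assumption-laden toy,
not a Subversion implementation: PREDECESSOR, MERGED, history,
missing_diffs and merge_ranges are hypothetical names, MERGED stands in
for whatever property or log syntax would record the additionally
merged differences, and combining two differences relies on both
belonging to the same line of development, which holds in the sample.

```python
# Hypothetical sketch; a difference dm_n is the pair (m, n).
# Nothing here is a Subversion API.

# Direct predecessor of each release in the sample repository.
PREDECESSOR = {1: 0, 2: 1, 4: 2, 8: 4, 9: 8, 3: 2, 5: 3, 6: 5, 7: 6}

# Assumed extra record written by the commit after a merge:
# release 6 contains d2_4 in addition to its predecessor 5.
MERGED = {6: [(2, 4)]}


def history(release):
    """All differences contained in a release, oldest first."""
    steps = []
    while release != 0:
        previous = PREDECESSOR[release]
        if release in MERGED:
            # A merge commit is represented by the diffs it brought in.
            steps = MERGED[release] + steps
        else:
            steps.insert(0, (previous, release))
        release = previous
    return steps


def missing_diffs(source, target):
    """Differences of source that target does not contain yet."""
    contained = set(history(target))
    return [d for d in history(source) if d not in contained]


def merge_ranges(diffs):
    """Combine differences that directly succeed each other, so that
    d2_3 followed by d3_5 becomes the single range 2:5."""
    ranges = []
    for start, end in diffs:
        if ranges and ranges[-1][1] == start:
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


# The two merges of the sample: [4] into [5], and [7] into [8].
for source, target in [(4, 5), (7, 8)]:
    for start, end in merge_ranges(missing_diffs(source, target)):
        print(f"r{source} into r{target}: merge -r {start}:{end}")
```

For the sample this yields the range 2:4 for the first merge and the
ranges 2:5 and 6:7 for the second one, the same calls that were derived
by hand above.
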