Merge logging for subversion:
=================================

Purposes of merging:
1. "copy" features/fixes from one branch to another.
2. undo changes (reverse difference)
3. resurrection of deleted items (better to use copy?)

Purposes of merge tracking:
1. identifies which features/fixes have been merged to another branch.
2. allows automated common ancestor determination (for merging)

Requirements of merge tracking:
1. Identify which revision a merge was from and to (bi-directionally).
2. Identify which range of revisions a merge is from.
3. Logs required per merged file and for the whole repository.
4. Must be able to generate merged change set.

Commits can be removed from a branch by applying a reverse difference using the merge command. Is this a hack? Should it be recorded as a merge? Is it a merge at all? Is a new "svn uncommit" command needed?

Reverse differences are possible because the repository holds committed revisions that can then be reverse differenced. It has been mentioned that a reverse merge should also be possible and that the merge and the subsequent conflict resolution edits should be separately identifiable items. This is partly possible, since the merge differences can be recalculated at any time (given an accurate log entry), but it may be impossible to fully separate conflict resolution edits and other manual changes without an intervening commit. Such a commit would be undesirable (and even impossible in subversion) since conflicts would have to be committed unresolved. Why is a reverse merge desirable? Can it even be considered possible?

Since merging is performed to a working copy, it is always possible to merge more than once, from different revisions, into a working copy before a commit is performed. It is obviously also the case that one revision can be merged to more than one working copy and committed. Merge logs must therefore be able to store more than one merged-to and merged-from record.

Best practice requires the user to record merge information in the commit log. This requirement will in many cases be forgotten and is in any case difficult to analyze, either manually or automatically, and is therefore inherently unreliable. What is needed is an automated merge log which contains the same information: the merge source (example: "http://svn.example.com/repos/calc/trunk") and the merged revisions (examples: "343" or "343:344"). This information can be represented in a standardized form: "http://svn.example.com/repos/calc/trunk@343" and "http://svn.example.com/repos/calc/trunk@343:http://svn.example.com/repos/calc/trunk@344". There is no further useful information to be stored and all of this can be automatically generated from "svn merge"'s parameters.
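The standardized form above is simple enough to generate and read back mechanically. The following is a minimal sketch in Python, not part of subversion: the names MergeRef, from_merge_args, format_ref and parse_ref are hypothetical, and it assumes that a URL contains neither whitespace nor an "@" followed by digits.

```python
import re
from typing import NamedTuple, Optional


class MergeRef(NamedTuple):
    """One merge-log entry: a source URL with one revision or a range."""
    start_url: str
    start_rev: int
    end_url: Optional[str] = None   # None for a single-revision entry
    end_rev: Optional[int] = None


# One "<url>@<rev>" item; a range is two of them joined by ":".
_PEG = r"(\S+?)@(\d+)"
_REF = re.compile(rf"^{_PEG}(?::{_PEG})?$")


def from_merge_args(url: str, revisions: str) -> MergeRef:
    """Build an entry from a source URL and "343" or "343:344"."""
    first, _, last = revisions.partition(":")
    if last:
        return MergeRef(url, int(first), url, int(last))
    return MergeRef(url, int(first))


def format_ref(ref: MergeRef) -> str:
    """Render an entry in the standardized text form."""
    text = f"{ref.start_url}@{ref.start_rev}"
    if ref.end_url is not None:
        text += f":{ref.end_url}@{ref.end_rev}"
    return text


def parse_ref(text: str) -> MergeRef:
    """Read the standardized text form back into an entry."""
    match = _REF.match(text)
    if match is None:
        raise ValueError(f"not a merge reference: {text!r}")
    url1, rev1, url2, rev2 = match.groups()
    return MergeRef(url1, int(rev1), url2,
                    int(rev2) if rev2 is not None else None)
```

Calling from_merge_args("http://svn.example.com/repos/calc/trunk", "343:344") and passing the result to format_ref gives back the range form quoted above, which is the sense in which the log entry needs nothing beyond the merge parameters.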

One possible solution is to have a merged-to record list and a merged-from record list (with zero or more entries - preferably with zero overhead if it contains zero entries) for each revision of each file and directory in the repository. Each record in a list should be matched by a corresponding record at the other end of the respective merge operation, with both ends containing a reference of the form: "http://svn.example.com/repos/calc/trunk@343" or: "http://svn.example.com/repos/calc/trunk@343:http://svn.example.com/repos/calc/trunk@344".
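As an illustration only, the paired lists could look like the following in-memory Python sketch. MergeRecord and MergeLog are hypothetical names, nodes are identified by "url@rev" strings, and the branch URL in the usage note is invented for the example. Filing a merged range under a single source node is an assumption of this sketch, not something the text specifies.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class MergeRecord(NamedTuple):
    source: str   # "url@rev" or "url@rev:url@rev"
    target: str   # "url@rev" of the revision that received the merge


class MergeLog:
    """Toy merge log holding the same record at both ends of a merge."""

    def __init__(self):
        # Nodes that were never merged get no entry, so they cost nothing.
        self._merged_from = defaultdict(list)   # keyed by target node
        self._merged_to = defaultdict(list)     # keyed by source node

    def record(self, source_node: str, target_node: str,
               source_ref: Optional[str] = None) -> MergeRecord:
        """Log one merge; source_ref may carry a range instead of one node."""
        rec = MergeRecord(source_ref or source_node, target_node)
        self._merged_from[target_node].append(rec)
        self._merged_to[source_node].append(rec)
        return rec

    def merged_from(self, node: str) -> list:
        # .get() avoids creating an empty list for an unmerged node
        return list(self._merged_from.get(node, ()))

    def merged_to(self, node: str) -> list:
        return list(self._merged_to.get(node, ()))
```

Recording two merges into a hypothetical "http://svn.example.com/repos/calc/branches/b1@350" from trunk@343 and trunk@344 leaves two entries in that node's merged-from list and one entry in each source's merged-to list, which covers the case of several merges landing in one working copy before a commit.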

An alternative solution is to store "merge objects" with a unique merge-id as completely separate entities in the repository which are easily referenced from the appropriate file revisions. Could this use properties? "svn:merge-from" and "svn:merge-to" with a merge-id list as a setting? The "merge objects" would then contain only a single "http://svn.example.com/repos/calc/trunk@343" or: "http://svn.example.com/repos/calc/trunk@343:http://svn.example.com/repos/calc/trunk@344" range reference. Actually this solution is similar to the ClearCase approach.
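A rough Python sketch of this alternative follows. It is a toy model, not subversion's API: the Repository class and its methods are hypothetical, the property names "svn:merge-from" and "svn:merge-to" are only the ones proposed in the question above, and the "m1", "m2" id format and the branch URL used in the usage note are invented for illustration.

```python
import itertools


class Repository:
    """Toy repository with separate merge objects referenced by id."""

    def __init__(self):
        self.merges = {}   # merge-id -> "url@rev" or "url@rev:url@rev"
        self.props = {}    # node -> {property name -> list of merge-ids}
        self._ids = itertools.count(1)

    def add_merge(self, source_node, target_node, ref_text):
        """Create one merge object and reference it from both ends."""
        merge_id = f"m{next(self._ids)}"          # hypothetical id format
        self.merges[merge_id] = ref_text
        self._append(source_node, "svn:merge-to", merge_id)
        self._append(target_node, "svn:merge-from", merge_id)
        return merge_id

    def _append(self, node, name, merge_id):
        self.props.setdefault(node, {}).setdefault(name, []).append(merge_id)

    def merged_from(self, node):
        """Resolve a node's merge-from id list to the stored references."""
        ids = self.props.get(node, {}).get("svn:merge-from", [])
        return [self.merges[merge_id] for merge_id in ids]
```

The range reference is stored once per merge, and each file revision carries only a short id list, so a revision with no merges needs no property at all.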

ClearCase represents merges (for the purpose of merge tracking) using a directional "hyperlink" object with a type "Merged", which identifies the from and to revisions of each file that is merged. This information is indicative only. There is no mechanism to stop users from altering the merged files (to fix conflicts or for unrelated changes), or reject or revert some of the merged data, before file check-in (commit) and therefore no way to separate merged data from user modified data.

It is either impossible or at least impractical to make merged information uniquely identifiable. And there is no perfect solution available from any VCS or SCM system. ClearCase for example tries only to identify a common ancestor, searching "paths" across merges and branches, to automatically find the latest ancestral point from which to start generating differences.

. Common ancestor.
=====================

The following diagram shows a problematic scenario for "svn merge" which works well in ClearCase:

See original details at: http://svn.collab.net/repos/svn/trunk/notes/merge-tracking.txt

                    1     
                    2     
                    3     
                  /   \   
                 /     \  
                /       \ 
            one           1   
            two           2.5 
            three         3   
             |     \      |
             |      \     |   
             |       \    |            
             |        \   |            
             |         \ one                ## This node is a human's
             |           two-point-five     ## merge of two sides.
             |           three        
             |            |
             |            |
             |            |
            one          one
            Two          two-point-five
            three        newline       
               \         three  
                \         |   
                 \        |
                  \       |
                   \      |
                    \     |
                     \    |
                      \   |
                       \  |
                         one                ## This node is a human's
                         Two-point-five     ## merge of the changes
                         newline            ## since the last merge.
                         three

Creating a merge log at the "one\ntwo-point-five\nthree" node, which references the "one\ntwo\nthree" node, allows the second merge to identify the "one\ntwo\nthree" node as a common ancestor and thereby only apply the (-two\n+Two) difference, leading to the single expected conflict, rather than "svn merge"'s whole file conflict. Subversion can handle this today, using a range restriction on the merge; however, it's up to the user to figure out the common ancestor. (Add a new revision identifier? For example, -rCOMMON:HEAD.) The common ancestor has to be identified per merged file.
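The effect of the merge log on ancestor determination can be sketched with a small graph search. This is an illustrative Python model of the diagram, not how subversion or ClearCase implements it. The node names (root, left1, right1, ...) are made up, and a logged merge is modelled as one extra parent edge from the merge result to the merged-from node.

```python
def ancestors(parents, node):
    """Return node plus everything reachable through parent edges."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def latest_common_ancestors(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    shared = ancestors(parents, a) & ancestors(parents, b)
    return {n for n in shared
            if not any(n != m and n in ancestors(parents, m) for m in shared)}


def history(with_merge_log):
    """Build the diagram's history as a child -> parents mapping."""
    parents = {
        "root": [],            # 1 / 2 / 3
        "left1": ["root"],     # one / two / three
        "right1": ["root"],    # 1 / 2.5 / 3
        "right2": ["right1"],  # first human merge
        "left2": ["left1"],    # one / Two / three
        "right3": ["right2"],  # adds "newline"
    }
    if with_merge_log:
        # The merge log entry: right2 was merged from left1.
        parents["right2"].append("left1")
    return parents


# Second merge: left2 into right3.
print(latest_common_ancestors(history(False), "left2", "right3"))
print(latest_common_ancestors(history(True), "left2", "right3"))
```

Without the logged merge the only shared ancestor is root, so the whole file differs; with it the search stops at left1, leaving only the two/Two change to apply.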

The limitation here is that if there were prior changes on the left branch before the first merge that for some reason were not merged during the first merge (due to a revision range restriction), they will not be considered for the second merge either. Perhaps, though, this is desirable. It is of course possible to merge these earlier changes with a separate merge, which would then lead to the "1\n2\n3" node being used as the common ancestor (probably a bad idea, but it is the user's choice).

. Conclusion.
=================

The intention of this text is not to show how merging should be done, but only to show what information is required to improve merge capabilities. Hopefully it can be seen that the requirements are low and the benefits high. Merge tracking doesn't make merging better, but it does provide information that can make merging better.

The information that can be stored in a merge log must be automatically determined from the merge command parameters or elsewhere from within subversion - you can't rely on users to do this. The amount of information available is small, but this is actually all that is required. The merge-from and merge-to ends of the merge are all that is needed, but it is probable that an indication of a range restriction in the merge is useful.

Merge logging can't lead to a perfect merge system that gets every merge correct. It is only required to aim for a realistic goal. A reliably identifiable common ancestor is the only requirement of the best in breed merge tools that currently exist. Merge logging itself doesn't need to be perfect either, but should capture the information that the user makes available. There is no mechanism available in subversion, and there need not be, to allow for perfect separation of merged data and user modifications in the same working copy before a commit.

Separating merge for applying reversed differences from merge for copying updates between branches makes the idea simpler, but requires a new command (uncommit?) or new parameters to merge. Creation of a revision alias (COMMON?) might be the only other required change from the user's point of view. This would allow the current merge tool to provide a better merge solution, with no change to its fundamentals. (Diff variance would be nice too, of course, but that also needs common ancestor functionality.)

Blame is a difficult issue, considering that one or more merges can be completed, and the user can edit the files manually at any time, before a commit occurs. I have no ideas to solve that.