
Re: "svn diff" and "svn merge" sittin' in a tree

From: Philip Martin <philip_at_codematters.co.uk>
Date: 2002-02-10 04:07:51 CET

Greg Hudson <ghudson@MIT.EDU> writes:

> I hate to pull the emergency brake this late in the design, but after
> the recent discussion on using diff3 in "svn update", I think we're
> wasting our time designing an "svn merge" which simply takes the diffs
> from "svn diff" and applies them to the working copy. It will work, but
> it will never be as satisfying as CVS is.
>
> What we really want to do is take the same plaintexts as "svn diff" uses
> to make its diffs, and do a three-way-merge with them and the working
> copy.

Absolutely, a three-way diff is required. That in itself is not a
problem: changing the "diff" callback to take three files instead of
two is quite simple.

A few thoughts about merging. I haven't tried to articulate these in
the context of Subversion before, so bear with me. If nothing else,
it will ensure I'm talking about the same things that you are :)

It really only makes sense to merge things that have a common ancestor
(thus the need for three files as you pointed out), so consider two
files foo.c and bar.c with a common ancestor
 

   foo.c:N-1 --- foo.c:N --- foo.c:N+1 ..... foo.c:X
                  \
                   bar.c:M --- bar.c:M+1 ..... bar.c:Y

Now foo.c:N and bar.c:M are identical, and either can be considered
the common ancestor. Next consider a merge: let's assume that the
working copy contains foo.c:X and we want to merge bar.c:Y into the
working copy. Thus we need the diff between bar.c:M and bar.c:Y, which
we apply to foo.c:X. This is just like applying the diff

      svn diff -rM:Y bar.c

to foo.c. The diff editor will generate the two fulltexts for bar.c,
and the fulltext for foo.c exists in the working copy. This looks OK:
it should be easy to extend the diff editor to do this. In the above
command, bar.c could be a working copy path or a URL.
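
To make the three inputs concrete, here is a minimal Python sketch of a
line-based three-way merge. It is an illustration only, not Subversion's
implementation: the function names are hypothetical, and it uses
Python's difflib rather than diff3. The ancestor is bar.c:M, "theirs" is
bar.c:Y, and "mine" is the working copy foo.c:X.

```python
from difflib import SequenceMatcher


def _match_map(a, b):
    """Map index in a -> index in b for every line the matcher pairs up."""
    pairs = {}
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            pairs[block.a + k] = block.b + k
    return pairs


def merge3(base, mine, theirs):
    """Merge two descendants of base; return (merged_lines, conflict_count)."""
    to_mine = _match_map(base, mine)
    to_theirs = _match_map(base, theirs)
    # Ancestor lines that survive unchanged on both sides act as sync points.
    sync = [k for k in range(len(base)) if k in to_mine and k in to_theirs]
    merged, conflicts = [], 0
    b = i = j = 0  # cursors into base, mine, theirs
    for k in sync + [None]:  # None marks the tail after the last sync point
        if k is None:
            b_end, i_end, j_end = len(base), len(mine), len(theirs)
        else:
            b_end, i_end, j_end = k, to_mine[k], to_theirs[k]
        old, ours, other = base[b:b_end], mine[i:i_end], theirs[j:j_end]
        if ours == old:
            merged += other            # only their side changed (or nobody)
        elif other == old or ours == other:
            merged += ours             # only my side changed, or same change
        else:
            conflicts += 1             # both sides changed differently
            merged += ["<<<<<<< mine", *ours, "=======", *other,
                       ">>>>>>> theirs"]
        if k is not None:
            merged.append(base[k])
            b, i, j = b_end + 1, i_end + 1, j_end + 1
    return merged, conflicts


bar_M = ["a", "b", "c"]        # common ancestor (same text as foo.c:N)
bar_Y = ["A", "b", "c"]        # branch changed the first line
foo_X = ["a", "b", "c", "d"]   # working copy appended a line
result, conflict_count = merge3(bar_M, foo_X, bar_Y)
```

Both changes survive because they touch different regions of the
ancestor; had both sides rewritten the same line differently, the
region would come back wrapped in conflict markers.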

However, in my experience, merging individual files is not the common
operation; most of the merges I do involve directory hierarchies. This
still looks OK, since the diff editor already handles directory
hierarchies.

Now I understand that Subversion intends to store merge information in
the metadata. This is good because merging often happens multiple
times. So after the above merge, development continues and later I
want to merge again:

   foo.c:N-1 --- foo.c:N ........ foo.c:X ...... foo.c:X+A
                  \                  ^
                   \                 | merged
                    \                |
                     bar.c:M ..... bar.c:Y ..... bar.c:Y+B

When we come to merge foo.c:X+A and bar.c:Y+B we don't really want to
use the common ancestor N/M; rather we want to use X/Y, the most recent
merge. Again this is just like applying the diff

      svn diff -rY:Y+B bar.c

to foo.c. Still everything looks OK: the diff editor can do this sort
of thing. At some point the metadata has to be interrogated to obtain
the common ancestor, but the basic operation is the same.
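
A small Python sketch of that metadata lookup, under stated
assumptions: the recorded merges are modelled as a plain list of source
revision numbers, and both function names are hypothetical. Only the
"svn diff -rA:B path" form comes from the commands above; the revision
numbers are made up for the example.

```python
def merge_base_rev(branch_point_rev, recorded_merge_revs):
    """Pick the source revision to diff from.

    Use the most recently merged source revision if any merge has been
    recorded, otherwise fall back to the branch point (M above).
    """
    return max(recorded_merge_revs, default=branch_point_rev)


def equivalent_diff(path, base_rev, head_rev):
    """Spell out the diff whose changes the merge would apply."""
    return f"svn diff -r{base_rev}:{head_rev} {path}"


M, Y, Y_plus_B = 10, 25, 40   # illustrative revision numbers only

# First merge: nothing recorded yet, so the branch point is the ancestor.
first = equivalent_diff("bar.c", merge_base_rev(M, []), Y)

# Second merge: the first merge of Y has been recorded in the metadata.
second = equivalent_diff("bar.c", merge_base_rev(M, [Y]), Y_plus_B)
```

Here first is the M:Y diff and second is the Y:Y+B diff, so the second
merge picks up only what changed on the branch since the last merge.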

However, a problem occurs when we consider merging a complete
hierarchy. In this case the most recently merged revision may not be
the same for all the elements in the hierarchy. Individual files, or
subtrees, of the hierarchy may have been merged at different
revisions. It is not so clear (to me anyway) that the diff editor can
easily be modified to handle this.

Using previous merge information is important if we are to have an
effective merging algorithm. If the previous merge is ignored, the M-Y
changes will be applied a second time. With luck this will just result
in an "already applied" message. The bigger problem is if the X/Y
merge resulted in conflicts that needed to be resolved. If the X+A/Y+B
merge can use X/Y as a common ancestor, these conflicts will not occur
again. However, if the previous merge is ignored, and N/M is used as
the common ancestor, the conflict will reappear.
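
The effect can be seen with the three-way rule applied to a single
changed region, treated here as one opaque value. This is a simplified
Python sketch, not a real merge tool; merge_hunk and the sample strings
are hypothetical.

```python
CONFLICT = object()  # sentinel returned when both sides changed differently


def merge_hunk(base, mine, theirs):
    """Three-way rule for one region of a file."""
    if mine == theirs:
        return mine        # same text on both sides
    if mine == base:
        return theirs      # only their side changed
    if theirs == base:
        return mine        # only my side changed
    return CONFLICT


ancestor = "original text"   # the region in foo.c:N / bar.c:M
foo_X = "foo's edit"         # the region in foo.c:X
bar_Y = "bar's edit"         # the region in bar.c:Y

# First merge: both sides changed the region, so it conflicts.
first = merge_hunk(ancestor, foo_X, bar_Y)
resolved = "hand-resolved text"   # what the user commits after resolving

# Later, neither side has touched this region again.
foo_XA = resolved
bar_YB = bar_Y

# Using the previous merge (Y) as the ancestor: no conflict.
with_metadata = merge_hunk(bar_Y, foo_XA, bar_YB)

# Ignoring the previous merge and using N/M: the same conflict returns.
without_metadata = merge_hunk(ancestor, foo_XA, bar_YB)
```

With bar.c:Y as the ancestor the branch side looks unchanged, so the
hand-resolved text is kept. With the original ancestor both sides look
changed again, and the user has to resolve the same conflict twice.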

Let me explain how ClearCase handles this problem. ClearCase also
stores merge information in metadata and uses it to improve multiple
merges. The basic ClearCase merge command merges a file or directory,
but it is not recursive. It only affects a single file, or a single
directory without its files. The merge command is coupled with the
findmerge command which *is* recursive. Running a findmerge will
search the hierarchy[1] and identify the individual elements that need
to be merged. The user will generally invoke a findmerge command to
merge two branches, automatically running a number of merge commands
on individual elements.

Subversion may well need to adopt a similar approach. It may require
some sort of merge editor that identifies the common ancestor for each
element. This in turn might invoke a modified diff editor to carry out
each individual merge, or it may just carry out the merge itself.
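
One possible shape for such a merge editor, sketched in Python. All of
it is assumption: merge metadata is modelled as a dict from path to the
last merged source revision, a record on a directory is taken to cover
everything beneath it, and the function names are hypothetical. The
function only builds a plan, in the spirit of findmerge reporting what
would be merged.

```python
from pathlib import PurePosixPath


def last_merged_rev(path, merge_info, branch_point_rev):
    """Newest merge recorded for the path itself or any parent directory."""
    p = PurePosixPath(path)
    revs = [merge_info[str(c)] for c in (p, *p.parents)
            if str(c) in merge_info]
    return max(revs, default=branch_point_rev)


def plan_merges(source_last_changed, merge_info, branch_point_rev):
    """List (path, ancestor_rev, source_rev) for elements needing a merge.

    source_last_changed maps each path on the source branch to the
    revision in which it last changed.
    """
    plan = []
    for path in sorted(source_last_changed):
        ancestor = last_merged_rev(path, merge_info, branch_point_rev)
        changed = source_last_changed[path]
        if changed > ancestor:  # something new since the last merge
            plan.append((path, ancestor, changed))
    return plan


# The src subtree was merged at r10, src/b.c again on its own at r20,
# and doc has never been merged since the branch point at r5.
merge_info = {"src": 10, "src/b.c": 20}
source_last_changed = {"src/a.c": 12, "src/b.c": 20, "doc/readme": 8}
plan = plan_merges(source_last_changed, merge_info, branch_point_rev=5)
```

Each entry in the plan carries its own ancestor revision, which is the
part a single hierarchy-wide diff cannot express. Each entry could then
be handed to a per-file three-way merge.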

The other solution is to do what is normal practice in CVS (at least I
think it is normal practice, I'm not a big CVS user.) That is to
manually tag (copy in Subversion terms) before (and after?) each
merge. Then you can refer to the tags in subsequent merges, thus
providing the common ancestor manually. I would *really* like
Subversion to do better than this.

A point of reference: the ClearCase merge metadata (sometimes referred
to as "merge arrows") and the actual merges are distinct. The default
behaviour of the findmerge/merge commands is to record the merges in
the metadata, but ClearCase does allow merges without recording
metadata, and it also allows the generation of the metadata without
carrying out a merge. I have never used either of these features.

Finally, the ClearCase findmerge command allows a merge conflict
resolution application to be started automatically to resolve each
merge conflict as the merge progresses. The GUI version of this
application is rather neat (even to this command line fan) and using
this feature means that when the findmerge completes all conflicts
have been resolved. Much as I liked this, I would rate it lower than
the proper recording and use of merge metadata.

[1] Running findmerge on a large hierarchy can be expensive: minutes
or even hours rather than seconds. The findmerge command is so
powerful/complex that it provides a "dummy run" option that just
reports what would be merged, rather than actually carrying out the
merge.

-- 
Philip
Received on Sat Oct 21 14:37:06 2006

This is an archived mail posted to the Subversion Dev mailing list.