
Re: "svn diff" and "svn merge" sittin' in a tree

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-02-15 06:08:09 CET

Philip Martin <philip@codematters.co.uk> writes:
> Greg Hudson <ghudson@MIT.EDU> writes:
> > I hate to pull the emergency break this late in the design, but after
> > the recent discussion on using diff3 in "svn update", I think we're
> > wasting our time designing an "svn merge" which simply takes the diffs
> > from "svn diff" and applies them to the working copy. It will work, but
> > it will never be as satisfying as CVS is.
> >
> > What we really want to do is take the same plaintexts as "svn diff" uses
> > to make its diffs, and do a three-way-merge with them and the working
> > copy.
>
> Absolutely, a three-way diff is required. That in itself is not a
> problem, changing the "diff" callback to take three files instead of
> two is quite simple.

Okay, I confess to being completely confused here.

Every so often, someone says "we can't just apply diffs to the working
copy to do merge, we have to do a 3-way diff."

I don't understand how the proposed 3-way diff helps us, or, if it
does help us, how exactly it differs from what we've been discussing
for "svn merge" all along.

Above, Greg implies that what CVS does is good (presumably 3-way) and
what we're going to do is bad (presumably not 3-way), while others
have implied that "cvs merge" is not a good model because it is not
3-way :-). So below, without using the word "3-way", I'm going to
describe how CVS's merge works, and how I've always assumed
Subversion's merge is going to work, and show how they're the same
method, and how I think they relate to what the `diff3' program does
(and thus relate to 3-way merging?). If I miss something big, someone
please make a noise.

CVS's merge takes the differences between two revisions of a file F,
and applies them to some third revision of F. In normal usage, the
first two revisions are on one branch, and the third revision is in
your working copy and is based on a different branch (note that the
mainline is considered just another branch for these purposes). [I
suppose theoretically you could find some way to merge changes from
unrelated things to an unrelated thing, but the basic process
described below would still be the same.]

Here's CVS merge:

   cvs update -j TAG1 -j TAG2 foo.c
   vi foo.c ### inspect it for conflicts, resolve any
   cvs commit -m "Merged TAG1-->TAG2 changes into foo.c."

which is essentially equivalent to:

   cvs up -p -r TAG1 foo.c > TAG1-foo.c
   cvs up -p -r TAG2 foo.c > TAG2-foo.c
   diff TAG1-foo.c TAG2-foo.c > patchfile
   patch foo.c < patchfile
   vi foo.c ### inspect it for conflicts, resolve any
   cvs commit -m "Merged TAG1-->TAG2 changes into foo.c."

... ignoring options to diff and patch, and ignoring the fact that
`patch' doesn't actually put conflict markers into foo.c, so you'd
have to examine a .rej file and all that, but you get the idea. Heck,
you could also do

   cvs diff -r TAG1 -r TAG2 foo.c | patch foo.c

It's all the same. Of course, real CVS *does* put conflict markers
into the file; I believe it uses diff3 internally, to do that. But
it's not doing anything fundamentally beyond the power of regular diff
and patch, except for the nice markup (which not everyone agrees is
nice, to my great surprise when we took an informal poll on this
list).

Okay, so what about "svn merge"?

Same thing, basically. I'm improvising the command syntax here, so
please bear with me. The fully general command is:

   svn merge -R<rev1>:path1 -R<rev2>:path2 foo.c
   ### inspect for conflicts, resolve any
   svn commit -m "merged changes from blah to blah, blah blah."

But of course, it's more common to do something like this:

   svn merge -r<rev1>:<rev2> PATH-OR-URL <args optional, implied `.'>
   ### inspect for conflicts, resolve any
   svn commit -m "merged changes from blah to blah, blah blah."

This would do exactly the same thing as CVS: take the differences
between the specified paths and merge them into the right files in
your working copy. (Not going to talk about taking previous merge
history into account yet; that's an interesting but separate problem.)

I won't bother to give the series of equivalent "svn diff" and patch
commands; they're pretty obvious.

Thus, both CVS merge and SVN merge involve three files: two "sources"
and one "target". You could call the first source a "common
ancestor", but that's stretching a bit. In a merge, we don't usually
want *all* the changes between some genuine common ancestor and its
descendant (cousin to our target) applied to our target. We want a
certain subset of those changes, a range, and this is the ability CVS
provides (and what I always presumed SVN would provide, but with
better memory, to help the user get the right range[s]). Note that
the left side of a given range is just as much a cousin of the target
as the right side is, but for some reason in merge terminology it
seems to be called the "common ancestor" of both the right side and of
the target.

Okay, on to `diff3'. This is from the `diff3' documentation:

   `diff3' can incorporate changes from two modified versions into a
   common preceding version. This lets you merge the sets of changes
   represented by the two newer files. Specify the common ancestor
   version as the second argument and the two newer versions as the
   first and third arguments, like this:

      diff3 MINE OLDER YOURS

   You can remember the order of the arguments by noting that they are
   in alphabetical order.

   You can think of this as subtracting OLDER from YOURS and adding
   the result to MINE, or as merging into MINE the changes that would
   turn OLDER into YOURS. This merging is well-defined as long as
   MINE and OLDER match in the neighborhood of each such change. This
   fails to be true when all three input files differ or when only
   OLDER differs; we call this a "conflict". When all three input
   files differ, we call the conflict an "overlap".

Is this what is meant by "3-way merging"? I've always assumed it is.
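
To make the quoted description concrete, here is a minimal sketch,
assuming GNU diff3 is installed. The scratch directory and the three
tiny files are hypothetical, invented only for this illustration: MINE
changes the first line, YOURS changes the last, and OLDER has neither
change.

```shell
# Hypothetical scratch files; names and contents are made up.
dir=$(mktemp -d)

printf '%s\n' one two three four five six seven > "$dir/older"
printf '%s\n' ONE two three four five six seven > "$dir/mine"    # my change: first line
printf '%s\n' one two three four five six SEVEN > "$dir/yours"   # your change: last line

# Merge into MINE the changes that would turn OLDER into YOURS.
merged=$(diff3 -m "$dir/mine" "$dir/older" "$dir/yours")
printf '%s\n' "$merged"
```

Since the two changes are far apart, the merged text should carry
both of them, with no conflict markers.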

So the $64,000 question is, does `diff3' pay attention to the
differences between MINE and OLDER? In other words, does it *really*
treat OLDER like a common ancestor? In most real-life use cases of
merge, OLDER is not actually a common ancestor, so it would be
problematic if `diff3' cared about differences between OLDER and MINE.
Suppose, for example, that the same change appears in both OLDER and
YOURS, but MINE does not have it.

Let's get concrete. Here are older.txt and yours.txt:

---------------------------------------------------------------------
         older.txt             |          yours.txt
_____________________________________________________________________

HAN:                           | HAN:
 Hokey religions and ancient   |  Hokey religions and ancient
 weapons are no match for a    |  weapons are no match for a
 good blaster at your side,    |  good blaster at your side,
 kid.                          |  kid.
                               |
LUKE:                          | LUKE (upset):
 You don't believe in the      |  You don't believe in the
 Force, do you?                |  Force, do you?
                               |
HAN:                           | HAN:
 [langorously]                 |  [langorously]
 Kid, I've flown from one side |  Kid, I've flown from one side
 of this galaxy to the other.  |  of this galaxy to the other.
 I've seen a lot of strange    |  I've seen a lot of strange
 stuff, but I've never seen    |  stuff, but I've never seen
 anything to make me believe   |  anything to make me believe
 there's one all-powerful      |  there's one all-powerful
 force controlling everything. |  force controlling everything.
 There's no mystical energy    |  There's no mystical energy
 field that controls my        |  field that controls my...
 destiny.                      |  HEY, THAT'S NO MOON!

The only changes from OLDER to YOURS are

   - Luke is upset
   - hey, that's no moon! (that is, the last two lines differ)

Assume that there is never whitespace before a newline, of course;
the tabular arrangement above may not make that clear. Note also that
Han speaks "langorously" in both files.

Now here is mine.txt, the merge target:

                             mine.txt
               -------------------------------------------
                      HAN (cynically):
                       Hokey religions and ancient
                       weapons are no match for a
                       good blaster at your side,
                       kid.
                      
                      LUKE (suspiciously):
                       You don't believe in the
                       Force, do you?
                      
                      HAN:
                       Kid, I've flown from one side
                       of this galaxy to the other.
                       I've seen a lot of strange
                       stuff, but I've never seen
                       anything to make me believe
                       there's one all-powerful
                       force controlling everything.
                       There's no mystical energy
                       field that controls my...
                       HEY, THAT'S NO MOON!

In mine.txt, Luke is "suspicious" rather than "upset" (a conflicting
change), Han speaks "cynically" the first time (non-conflicting), and
Han does not speak "langorously" the second time (the $64,000
difference -- a line that is missing only in mine.txt).

And how does diff3 behave?

   $ diff3 -m -E mine.txt older.txt yours.txt
   HAN (cynically):
    Hokey religions and ancient
    weapons are no match for a
    good blaster at your side,
    kid.
   
   <<<<<<< mine.txt
   LUKE (suspiciously):
   =======
   LUKE (upset):
   >>>>>>> yours.txt
    You don't believe in the
    Force, do you?
   
   HAN:
    Kid, I've flown from one side
    of this galaxy to the other.
    I've seen a lot of strange
    stuff, but I've never seen
    anything to make me believe
    there's one all-powerful
    force controlling everything.
    There's no mystical energy
    field that controls my...
    HEY, THAT'S NO MOON!

Looks suspiciously like CVS merge, doesn't it? :-)

Note that Han is *still* not langorous. In other words, diff3 behaved
basically the same as diff+patch would, except it nicely handled the
conflict markers for us (nicely in my opinion, anyway; like I
mentioned, apparently not everyone likes conflict markers :-) ).
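
The conflict part of that run can be reproduced in isolation. This is
only a sketch, assuming GNU diff3; the scratch directory is
hypothetical, and the `-L' labels are there just so the markers say
"mine" and "yours" instead of showing temporary paths.

```shell
# Hypothetical scratch files holding only Luke's lines from the example.
dir=$(mktemp -d)

printf '%s\n' 'LUKE:'                " You don't believe in the" ' Force, do you?' > "$dir/older"
printf '%s\n' 'LUKE (suspiciously):' " You don't believe in the" ' Force, do you?' > "$dir/mine"
printf '%s\n' 'LUKE (upset):'        " You don't believe in the" ' Force, do you?' > "$dir/yours"

# MINE and YOURS each change the same line of OLDER, in different ways.
merged=$(diff3 -m -E -L mine -L older -L yours \
              "$dir/mine" "$dir/older" "$dir/yours")
status=$?    # GNU diff3 exits with status 1 when it finds conflicts

printf '%s\n' "$merged"
echo "diff3 exit status: $status"
```

The nonzero exit status is what a caller could test to decide whether
anyone needs to go in and resolve markers by hand.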

Note that if you move the "[langorously]" line from older.txt to
mine.txt and rerun diff3, that line will also be present in the
output. So not only does diff3 not truly treat OLDER as a common
ancestor (in the most literal sense), but it Does The Right Thing with
respect to already-applied changes.
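
Both behaviors can be checked on a smaller scale. Again a sketch,
assuming GNU diff3, with hypothetical file contents: MINE already
contains the change that YOURS makes, and MINE also drops a line that
OLDER and YOURS both still have.

```shell
# Hypothetical scratch files; names and contents are made up.
dir=$(mktemp -d)

printf '%s\n' alpha beta gamma delta epsilon zeta eta > "$dir/older"
# YOURS changes only the last line.
printf '%s\n' alpha beta gamma delta epsilon zeta ETA > "$dir/yours"
# MINE already has that change, and is also missing "beta".
printf '%s\n' alpha gamma delta epsilon zeta ETA > "$dir/mine"

result=$(diff3 -m -E "$dir/mine" "$dir/older" "$dir/yours")
status=$?    # 0 means diff3 reported no conflicts

printf '%s\n' "$result"
echo "diff3 exit status: $status"
```

If this behaves like the example above, "beta" stays gone (just as Han
stays un-langorous) and the change that was already applied produces no
conflict.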

Is this just an artifact of the options I chose to diff3? Am I
missing something fundamental here?

I'm all for using diff3 if it gets us something we want, such as
optional conflict markers. But aside from that, I don't see any
fundamental difference between using diff3 and diff+patch. Whatever
"3-way merge" means, either both methods are doing it, or neither is.
Please, someone tell me what I'm missing? I feel like it's something
big, since so many intelligent people are gung-ho that we must do
3-way merge, but for the life of me I can't see what the difference
is. (Perhaps an explanation with very concrete examples would help me
understand?)

Philip's other comments about merging (in which he describes how
ClearCase handles repeated and conflicting merges) are not addressed
here. Those are interesting problems that we need to solve, but I
don't see (?) that they're related to the 3-way question, or
non-question, as the case may be.

-Karl

Received on Sat Oct 21 14:37:08 2006

This is an archived mail posted to the Subversion Dev mailing list.
