Making merge histories reproducible for v1

From: Bill Tutt <rassilon_at_lyra.org>
Date: 2002-08-13 04:48:31 CEST

Ok folks. I was going to just reply to the current thread when I
realized that I should just start a new thread and explain the idea I
have for v1 in this space.

Goal: To easily store merge history in v1.0, and ensure that
dump/copy/reload can reproduce this post-v1.0 so that whatever schema we
end up having then can do whatever it wants with the data.

High level overview:

The problem essentially breaks down into two interesting sub-problems:

1) Recording merge history related to file/directory modifications.
2) Handling new files/directories in the source of the tree merge into
the destination.

Proposal for #1:

An ancestor-set-like representation for file merges, identical to what
Branko and Will have been talking about. That is, append a line to the
"svn:merge" property every time a merge happens.

e.g.:

Proposal for #2:

While conferring with Branko about the necessary changes required to
make these kinds of merge histories reproducible, we've come to the
conclusion that whatever way you slice it, we need a slight change in
Subversion's data model.

I'll present a couple of complete alternatives for handling this problem
so the folks who might end up doing the work for 1.0 can pick between
various implementation ideas, or use the below to come up with their own
idea.

However, first I'll present the semantics I'd like to see happen for
this particular subcase of a merge, hereafter known as the merge-copy.

Definitions:
Merge:
v. merged, merg-ing, merg-es
v. tr.
1. To cause to be absorbed, especially in gradual stages.
2. To combine or unite: merging two sets of data.

v. intr.
1. To blend together, especially in gradual stages.
2. To become combined or united. See Synonyms at mix.

Source: www.dictionary.com: The American Heritage Dictionary of the
English Language, Fourth Edition.

Ideal Proposed Semantics:

As you can guess from the above definition, merges are about swallowing
up changes from source branches and uniting them with the target branch.

New files/directories fall into four distinct subclasses:
1) Complex copy destinations (this sadly includes moved
files/directories).
You could call these locations branch points if you like the branch
metaphor.
2) Files.
3) Directories.
4) A simply renamed file.

Here's an example simple file rename: "svn mv ./fileA ./fileB"

The distinguishing characteristic of a simple file rename is that the
destination of the move operation is in the same directory as the source.

i.e.:
bool IsSimpleRename(NodeRevision nr)
{
  /* Look up the copy destination recorded for nr's CopyID. */
  NodeRevision CopyDest = lookupCopyDestination(nr.CopyID);
  if (nr.NodeID == CopyDest.NodeID
      && strcmp(basename(currentPathToNR),
                basename(commitedPathToCopyDest)) == 0)
  { return true; }
  else { return false; }
}

Sadly, we can't easily distinguish between copies and non-simple moves
because we've already lost that data.

This is probably the most annoying part of Subversion's current data
model. :(

Now that we have our 4 classes of merge-able entities, let me explain
how I'd ideally like to handle them.

1) Complex Copy Destinations
NewNodeRevisionPK = OldNodeID.NewCopyID.CurrentTxnID

The reasoning for this is simple. We're merging in a branch so therefore
the merged entity should also be a branch.

2) Files
3) Directories
4) Simply Renamed File
All 3 of these cases thankfully have identical semantics:

NewNodeRevisionPK = OldNodeID.ParentDestCopyID.CurrentTxnID

This is a little trickier to understand, so I'll run through a couple of
examples.

e.g.:
svn add trunk/file1; svn commit
svn cp trunk branch; svn commit
svn add branch/file2; svn commit
svn merge branch trunk; svn commit

The 'ParentDestCopyID' in the above rule means that 'trunk/file2' after
the merge will have trunk's CopyID.

e.g.: A more complicated example
svn add trunk/file1; svn commit
svn cp trunk branch; svn commit
svn add branch/dir2; svn commit
svn add branch/dir2/file2; svn commit
svn add branch/dir2/file3; svn commit
svn edit branch/file1; svn commit
svn merge branch trunk; svn commit

The merge operation will commit one merge-copy operation (for the new
dir2 tree) and one normal merge modification (for the edited file1).

trunk/file1 will have a "svn:merge" property with an appropriate value
for branch/file1.

trunk/dir2 will have OldDir2NodeID.TrunkCopyID.CurrentTxnID as its
primary key.

The reason we're doing things this way is that we're bringing items
into our existing branch (aka merge).
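
For anyone who'd rather read the two key rules as code, here is a
minimal Python sketch. Everything in it (NodeRevisionPK, merge_copy_pk,
and the argument names) is made up for illustration and is not
Subversion code; it only encodes "complex copy destinations get a new
CopyID, everything else takes the CopyID of the parent it is merged
into".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRevisionPK:
    """Hypothetical stand-in for a NodeID.CopyID.TxnID key."""
    node_id: str
    copy_id: str
    txn_id: str

    def __str__(self):
        return f"{self.node_id}.{self.copy_id}.{self.txn_id}"


def merge_copy_pk(old_node_id, is_complex_copy_dest,
                  parent_dest_copy_id, new_copy_id, current_txn_id):
    """Pick the key for an entity brought in by a merge-copy.

    Class 1 (complex copy destination): a branch stays a branch, so it
    gets a freshly allocated CopyID.
    Classes 2-4 (file, directory, simply renamed file): the entity
    slides into the destination branch and takes its parent's CopyID.
    """
    copy_id = new_copy_id if is_complex_copy_dest else parent_dest_copy_id
    return NodeRevisionPK(old_node_id, copy_id, current_txn_id)


# trunk/dir2 from the more complicated example above:
dir2 = merge_copy_pk("OldDir2NodeID", False,
                     "TrunkCopyID", "NewCopyID", "CurrentTxnID")
print(dir2)  # OldDir2NodeID.TrunkCopyID.CurrentTxnID
```

How the new CopyID is actually allocated is left out on purpose; the
sketch just takes it as an argument.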

See, Copy is synonymous with Branch, how about that.... ;)

Preserving Reproducibility:
"Keeping Merge Information through a Dump/Load Cycle"

We need to record these merge-copies as being merge-copies someplace.
Otherwise, we won't be able to tell later how these newly inserted rows
got there.

So, let's add the data to the table where all data like this belongs.

i.e.:

   CHANGE ::= ("change" PATH ID CHANGE-KIND TEXT-MOD PROP-MOD) ;
   CHANGE-KIND ::= "add" | "delete" | "replace" | "modify"
                 | "merge-copy" ;
   TEXT-MOD ::= atom ;
   PROP-MOD ::= atom ;

When CHANGE-KIND is "merge-copy", PATH and ID are the destination path
and the new NodeRevisionPK value.

If someone wants to know where the data was merged from, they can look
at the listed ID's predecessor data and look up the path via the Changes
table. :)

If the merge-copy was a branch point, then the new CopyID gets inserted
into the Transaction table's list of copies.
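
A rough Python sketch of what such a change record buys us. The Change
class, its field names, and the merge_copies helper are hypothetical,
and the tuple form is only a stand-in for the real skel encoding; the
point is that a dump/load tool can pick out merge-copies by kind alone.

```python
from dataclasses import dataclass

# The CHANGE-KIND values from the grammar above, including the proposed one.
CHANGE_KINDS = ("add", "delete", "replace", "modify", "merge-copy")


@dataclass(frozen=True)
class Change:
    """Hypothetical in-memory form of one row of the Changes table."""
    path: str          # destination PATH
    node_rev_id: str   # ID: the new NodeRevisionPK for a merge-copy
    kind: str
    text_mod: bool = False
    prop_mod: bool = False

    def __post_init__(self):
        if self.kind not in CHANGE_KINDS:
            raise ValueError(f"unknown CHANGE-KIND: {self.kind!r}")

    def as_tuple(self):
        # Same field order as the CHANGE production.
        return ("change", self.path, self.node_rev_id, self.kind,
                self.text_mod, self.prop_mod)


def merge_copies(changes):
    """Return only the changes that were recorded as merge-copies."""
    return [c for c in changes if c.kind == "merge-copy"]


changes = [
    Change("trunk/file1", "File1NodeID.TrunkCopyID.CurrentTxnID",
           "modify", text_mod=True, prop_mod=True),
    Change("trunk/dir2", "OldDir2NodeID.TrunkCopyID.CurrentTxnID",
           "merge-copy"),
]
print([c.path for c in merge_copies(changes)])  # ['trunk/dir2']
```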

What about 1.0? I'm not doing all of this before 1.0:
"Make less work for Mike, Brane, etc."

Skipping the handling of simple file renames is an easy cop-out of
course, but let's take a look at Will's idea.

Will's Idea Restated:
Instead of using the ParentDestCopyID, introduce a new row in the Copy
table, but call it a merge instead of a copy. This is equivalent to my
change to the Changes table.

However, he didn't specify any semantics after that. In order for my
ideal semantics to hold, the "choose_copy_id" logic would have to be
subtly changed to ignore merge point CopyIDs. Otherwise, non-merge
related files underneath this new directory would inherit this new
CopyID, which doesn't make any sense. Merging doesn't create more
branches; merges should slide into the branch they're uniting with.
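
To make the "ignore merge point CopyIDs" change concrete, here is a
small Python sketch. It is not the real choose_copy_id; the function
name, the nearest-ancestor-first list, and the "copy"/"merge" kind
column are all assumptions made for illustration.

```python
def choose_copy_id_sketch(ancestor_copy_ids, copy_kinds):
    """Pick the CopyID a new node should inherit.

    ancestor_copy_ids: CopyIDs of the node's ancestors, nearest first.
    copy_kinds: maps CopyID -> "copy" or "merge" (hypothetical column
                in the Copy table distinguishing the two).

    CopyIDs that mark merge points are skipped, so a file added under a
    merged-in directory lands in the surrounding branch.
    """
    for copy_id in ancestor_copy_ids:
        if copy_kinds.get(copy_id) != "merge":
            return copy_id
    return None  # no non-merge ancestor found


# trunk carries CopyID "c1"; trunk/dir2 arrived via a merge and was
# given the merge-point CopyID "m7".
copy_kinds = {"c1": "copy", "m7": "merge"}

# A file later added under trunk/dir2 sees "m7" first, then "c1".
print(choose_copy_id_sketch(["m7", "c1"], copy_kinds))  # c1
```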

Now even if we want to leave the choose_copy_id logic as is, it's true
that with the new distinguishing characteristic of merge vs. copy in the
Copy table, we do have enough information to go back and fix up the
data post-1.0.

Anyone with free time want to pick up this work?

I hope this is easier to follow than my previous multi-page
explanations.

Crossing his fingers,
Bill

Received on Tue Aug 13 04:49:01 2002
