Merge Architecture

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Tue, 8 Jan 2013 11:12:21 -0500

Dear merge enthusiasts,

I have been formulating a plan to re-architect Subversion's merge
implementation -- meaning generally all the merging done by "svn merge" and
also the merging done by "svn update". I have also been thinking about how
to bring rename tracking into merging. That will require clarity in the
architecture, which is why I want to talk first about re-architecting the
existing functionality. Only by drastically overhauling the architecture
can we hope to move ahead with new developments at a reasonable pace. This
is not a complete plan but sets out some of the main points.

MERGE AND UPDATE

Merge looks roughly like this:

* calculate what diffs to perform (using merge algorithms)

    * do diff on server
        --> merge this diff into the WC working/actual state
        (3-way or 4-way merge; do or don't record mergeinfo;
         record conflicts; ...)

* run conflict resolver

Update and Switch look roughly like this:

* calculate what diff to perform (using "update reporter")

    * do diff on server
        --> apply this diff into the WC base state
        --> merge this diff into the WC working/actual state
        (3-way merge; don't record mergeinfo;
         record conflicts; ...)

* run conflict resolver

Clearly there is a chunk of merge and conflict handling functionality
common to "merge" and "update". We need to re-architect the merge code and
the update code to share that functionality. (I hope I don't need to argue
in detail *why*. It's for all the classical reasons -- behavioural
consistency, untangling the spaghetti, testability, you name it.)

Oh, and let's not forget 'patch'. Patch looks roughly like this:

    * read diff information from the patch file
        --> merge this diff into the WC working/actual state
        (patch-merge, like a 3-way merge; don't record mergeinfo;
         record conflicts; ...)

* run conflict resolver

So let's look at how to factor out the commonality while preserving the
"special" qualities of each kind of merging.

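As a rough illustration of that commonality (a Python sketch, not Subversion
code; every name here is hypothetical, and a "diff" is reduced to whole-file
(path, old, new) triples), merge, update and patch could all feed one shared
merge-into-working step and differ only in where the diff comes from and
whether the base is touched:

```python
def merge_into_working(working, diff, conflicts):
    """Shared step: fold one diff into the working state, recording conflicts."""
    for path, old, new in diff:
        mine = working.get(path)          # None means "not present"
        if mine == old:                   # unmodified locally: take the change
            if new is None:
                working.pop(path, None)
            else:
                working[path] = new
        elif mine != new:                 # local change differs from incoming
            conflicts.append(path)


def apply_to_base(base, diff):
    """Update-only step: the base simply becomes the new pristine state."""
    for path, _old, new in diff:
        if new is None:
            base.pop(path, None)
        else:
            base[path] = new


def run_merge(working, diffs):
    """Merge (or patch, with a single diff read from a file)."""
    conflicts = []
    for diff in diffs:
        merge_into_working(working, diff, conflicts)
    return conflicts                      # handed to the conflict resolver


def run_update(base, working, diff):
    """Update/switch: same shared step, plus the base is brought forward."""
    conflicts = []
    apply_to_base(base, diff)
    merge_into_working(working, diff, conflicts)
    return conflicts
```

The conflict resolver is left out; in this sketch it would simply consume
the returned list of conflicted paths.
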
INTERFACES

Key interfaces that we need include:

  * tree state
    -- the state of a tree of svn nodes;
    -- with their props;
    -- with non-property metadata such as rev nums, mergeinfo;
    -- with a standard set of read ops;
    -- with a standard set of write ops, either grouped as an 'editor'
       (Ev1/Ev2) or not;
    -- the key point is that this same interface will operate on a WC-base
       subtree, a WC-working-actual subtree, a committed repository subtree
       (read-only), and even a repository commit txn tree (read-write) or a
       temporary virtual tree used for internal purposes;
    -- I've been trying out this kind of API on 'tree-read-api' branch;

  * difference between two tree states
    -- svn_delta_editor and svn_editor_t are two such interfaces;
    -- svn_editor_t in particular is geared to diff between two *fully
       known* states, fine for 3-way merge;
    -- svn_wc_diff_callbacks_t is geared to diff between two *partially
       known* states, which is needed for 'patch' and 4-way (cherry-pick)
       merge;
    -- svn_wc_diff_callbacks_t design is really bad in some ways and must
       be cleaned or replaced;

  * mergeinfo state
    -- representation of the mergeinfo applicable to a tree;
    -- we already have this as svn_mergeinfo_catalog_t, but abstraction
       could be improved;

We can start by abstracting blocks of functionality out of the existing
code, into modules using interfaces such as these. I think in this way we
may be able to move ahead somewhat incrementally, keeping things working as
we go, which I agree is a valuable way to proceed.
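
To give a feel for the "tree state" idea, here is a minimal sketch in Python
(for brevity; it is not the API on the 'tree-read-api' branch, and all class
and method names are invented for illustration). The point it tries to show
is that a generic routine such as walk() only sees the abstract interface,
so the same code could run over a WC-base tree, a WC-working tree or a
repository tree, given suitable implementations:

```python
from abc import ABC, abstractmethod


class TreeState(ABC):
    """Hypothetical read-only view of a tree of nodes with properties."""

    @abstractmethod
    def children(self, path):
        """Return the sorted paths of the immediate children of PATH."""

    @abstractmethod
    def props(self, path):
        """Return a copy of the properties of PATH as a dict."""


class WritableTreeState(TreeState):
    """Hypothetical read-write extension of the same interface."""

    @abstractmethod
    def set_prop(self, path, name, value):
        """Set property NAME on PATH."""


class InMemoryTree(WritableTreeState):
    """A temporary virtual tree; '' is the root path."""

    def __init__(self, nodes):
        self._nodes = {p: dict(v) for p, v in nodes.items()}

    def children(self, path):
        prefix = path + "/" if path else ""
        return sorted(p for p in self._nodes
                      if p != path and p.startswith(prefix)
                      and "/" not in p[len(prefix):])

    def props(self, path):
        return dict(self._nodes[path])

    def set_prop(self, path, name, value):
        self._nodes[path][name] = value


def walk(tree, path=""):
    """Depth-first traversal written purely against TreeState."""
    yield path
    for child in tree.children(path):
        yield from walk(tree, child)
```
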

MERGEINFO

Internally, we represent mergeinfo using the 'svn_merge_range_t' data
structure which directly reflects the "non-inheritable" flag used in the
svn:mergeinfo representation. The definitions of functions for combining
mergeinfo (diff, intersect, etc.) are complicated by the presence of this
flag; some of their specs are still impenetrable to me. There is a simpler
way to represent, or at least to operate on, mergeinfo at this level:
consider a mergeinfo rangelist as being two simple range-lists (that do not
contain inheritability flags), one that applies to "this node" and one that
is inheritable below this node. Then perform any required operation (diff,
intersect, etc.) on both of those lists separately.
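
A minimal sketch of that two-list idea, in Python for brevity rather than
against the real svn_merge_range_t functions. The names MergeRange, split,
join and intersect are hypothetical, and revisions are held as plain sets
instead of range-lists to keep the code short. It assumes that an
inheritable range applies both to this node and below it, while a
non-inheritable range applies to this node only:

```python
from typing import NamedTuple


class MergeRange(NamedTuple):
    start: int          # exclusive, as in svn_merge_range_t
    end: int            # inclusive
    inheritable: bool


def split(rangelist):
    """Return (this_node, inheritable) as two flag-free sets of revisions."""
    this_node, inheritable = set(), set()
    for r in rangelist:
        revs = range(r.start + 1, r.end + 1)
        this_node.update(revs)
        if r.inheritable:
            inheritable.update(revs)
    return this_node, inheritable


def _runs(revs, inheritable):
    """Coalesce consecutive revisions into ranges carrying one flag value."""
    ranges = []
    for rev in sorted(revs):
        if ranges and ranges[-1].end == rev - 1:
            ranges[-1] = ranges[-1]._replace(end=rev)
        else:
            ranges.append(MergeRange(rev - 1, rev, inheritable))
    return ranges


def join(this_node, inheritable):
    """Convert the two simple lists back to a flagged rangelist."""
    return sorted(_runs(inheritable, True) +
                  _runs(this_node - inheritable, False))


def intersect(a, b):
    """Intersect two flagged rangelists by intersecting each list separately."""
    a_node, a_inh = split(a)
    b_node, b_inh = split(b)
    return join(a_node & b_node, a_inh & b_inh)
```

Any other set operation (difference, union) would follow the same pattern:
split, apply the plain operation twice, join.
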

In libsvn_client/merge.c, we manipulate mergeinfo on a node-by-node basis,
fetching "the mergeinfo" for a given node as needed, and treating "node has
no mergeinfo property" as a special case. Instead, we should abstract
out the whole tree of mergeinfo, where the abstraction has no "elision" of
identical mergeinfo, and manipulate that, and then re-write it into a
concrete representation (with elision) at the end. (One detail is that we
may wish to tweak the elision stage, such as preferring not to elide where
that would create a change to a sub-tree that is otherwise unaffected by
the merge.)
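
The expand / manipulate / elide cycle could look roughly like the following
Python sketch. It is heavily simplified and all names are hypothetical:
mergeinfo is just a set of revisions from a single source, the
non-inheritable flag and the path adjustment that real inheritance performs
are ignored, and every intermediate directory is assumed to be listed in
the set of paths.

```python
import posixpath


def expand(paths, explicit):
    """Give every path its own mergeinfo, inherited from the nearest ancestor."""
    full = {}
    for path in sorted(paths):            # a parent sorts before its children
        if path in explicit:
            full[path] = frozenset(explicit[path])
        else:
            full[path] = full.get(posixpath.dirname(path), frozenset())
    return full


def elide(full):
    """Keep explicit mergeinfo only where it differs from the parent's."""
    explicit = {}
    for path in sorted(full):
        if path == "":
            inherited = frozenset()       # the root inherits nothing
        else:
            inherited = full.get(posixpath.dirname(path), frozenset())
        if full[path] != inherited:
            explicit[path] = full[path]
    return explicit


# Usage: record a merge of r3 into the subtree at "A", then write back.
paths = ["", "A", "A/f", "B"]
explicit = {"": {1, 2}, "A/f": {1, 2, 3}}
full = expand(paths, explicit)
for path in ("A", "A/f"):
    full[path] = full[path] | {3}
result = elide(full)   # "A/f" now matches "A", so its explicit value elides
```

A tweaked elision policy, such as not touching subtrees the merge did not
otherwise affect, would slot into elide() without changing the code that
manipulates the expanded tree.
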

In merge.c, we have many functions that can write the resulting mergeinfo
either directly to the WC or into a result parameter. This duplication
must go, and the correct way is closer to the latter. All mergeinfo reads
and writes should be performed on a mergeinfo tree state representation for
the target. If we have reason to update the WC incrementally, it should be
updated from the state representation and not directly. (We may wish to
consider updating the WC incrementally in order to leave the mergeinfo
consistent with what has actually been merged if the merge process should
terminate abnormally.)
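
One possible shape for this, sketched in Python with hypothetical names
(MergeinfoState, record_merge, flush; none of these exist in merge.c): the
merge code reads and writes only the state object, and the WC is updated
from that object through a single writer callback, either once at the end
or incrementally after each step.

```python
class MergeinfoState:
    """In-memory mergeinfo for a merge target (revisions as plain sets)."""

    def __init__(self, initial):
        self._revs = {path: set(revs) for path, revs in initial.items()}
        self._dirty = set()

    def get(self, path):
        return frozenset(self._revs.get(path, ()))

    def record_merge(self, path, revs):
        """Note that REVS have been merged into PATH; mark PATH if it changed."""
        current = self._revs.setdefault(path, set())
        if not set(revs) <= current:
            current.update(revs)
            self._dirty.add(path)

    def flush(self, write_prop):
        """Push pending changes to the WC via write_prop(path, revs)."""
        for path in sorted(self._dirty):
            write_prop(path, self.get(path))
        self._dirty.clear()
```

Calling flush() after each merged revision range would keep the recorded
mergeinfo in step with what has actually been merged, at the cost of more
WC writes.
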

Foreign-repository mergeinfo is currently stripped out by the function
prepare_merge_props_changed(). To support foreign-repository merges at
some point, one of the many things we will require is a way to represent
and manipulate foreign-repository mergeinfo. We would need to: invent a
representation for the repository id in the mergeinfo; choose a
backward-compatibility behaviour; convert incoming mergeinfo to the new
format and record new mergeinfo in the new format; and update the mergeinfo
algebra and conversion functions. (All of these seem straightforward.)

ENOUGH FOR NOW

That's all pretty waffly and hand-wavy but serves as an introduction to
what's in my head at the moment and what I want to get on with.

- Julian

Received on 2013-01-08 17:13:16 CET