Re: Branch families and branchify

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Fri, 10 Apr 2015 12:44:27 +0200

On Thu, Apr 9, 2015 at 2:56 PM, Julian Foad <julianfoad_at_gmail.com> wrote:

> Hi Stefan. You mentioned you have an idea about making 'branchify'
> less 'destructive' -- that is, making it possible to start developing
> in a subtree and then later treat that subtree as a sub-branch without
> breaking its history.
>
> Currently the 'svnmover branchify' command creates a completely new
> branch (a member of a different family that is nested inside the
> original branch), copies the *content* of the designated subtree into
> the new branch, and deletes the original subtree, and renames the new
> subbranch-root element back to the name of the original subtree. That
> is destructive, in the sense that although the history of the old
> subtree is still present in the repository there is no modelling
> mechanism to link the old history seamlessly to the new history.
>

That is roughly what I meant by "destructive". However,
I came from a user's POV where branchify destroys the
ability to merge changes from / to a parent branch where
the respective sub-tree is not a sub-branch.

IMO, sub-branches would most likely be used in larger,
complex projects and will be introduced on demand.
That is why the branchify operation exists at all. But those
large projects also have long-lived maintenance branches,
and branchify breaks merging for them. That would make
sub-branches impractical (useful once you have them but
almost impossible to introduce).

> I have gradually been re-evaluating my ideas about branch families.
> Recently I changed the implementation to have only one family at each
> level of nesting: the repo root and '/branches' dir are still in
> family 0 as before, but all normal first-level branches are in family
> 1 even if they belong to different projects. And all second-level
> subbranches in level 2, and so on. Compared with creating a new branch
> family for each independent set of branches (each project, for
> example), this has the advantage that the branches can later be
> combined, while preserving their history, into a single bigger branch.
> The svnmover test 'restructure repo: projects/ttb to ttb/projects'
> demonstrates this (currently XFail due to no suitable UI being
> available; I'll fix that soonish).
>

Interesting observation. It implies that branch families are
technical constructs (ensure correct nesting) rather than
semantic ones (only merge between related trees).
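
To illustrate the "technical construct" reading, here is a toy Python
sketch. It is not svnmover code; `nesting_level` and the example paths
are made up for illustration. It assumes the family id is simply the
number of enclosing branch roots, so branches of unrelated projects end
up in the same family.

```python
def nesting_level(path, branch_roots):
    """Count the branch roots that are proper ancestors of `path`."""
    parts = path.split("/") if path else []
    ancestors = ["/".join(parts[:i]) for i in range(len(parts))]
    return sum(1 for a in ancestors if a in branch_roots)

# Hypothetical repository: "" is the repo root branch (family 0).
branch_roots = {"", "projA/trunk", "projB/trunk", "projA/trunk/lib"}

# Under the one-family-per-level scheme, family id == nesting level.
family = {root: nesting_level(root, branch_roots) for root in branch_roots}
```

Here `projA/trunk` and `projB/trunk` both land in family 1 although they
belong to different projects, and `projA/trunk/lib` lands in family 2.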

> I still have a lingering compulsion to ensure that a subbranch is in a
> different family from its outer branch, and that the nesting order of
> branch families is fixed for all time. It seemed like if the subbranch
> elements are in the same family, then there would be nothing to
> distinguish it as a subbranch. And I wanted to prevent inversion --
> where subbranch B is initially inside branch A but then after a series
> of changes A might end up inside B.
>

My priority 1 requirements would be that when we need to
degrade functionality (e.g. node tracking does not work
across an inversion point), we

* still allow the user to make these changes in a convenient way
  (about as easy as they are today),
* provide degraded function that is at least as convenient as today
  (e.g. don't introduce artificial conflicts), and
* limit degradation to (crossing of) a specific point in the project history
  (i.e. don't make things worse and worse as the project ages)

Anything short of that will not be accepted by users.

> However, I am not sure there is any reason for that. What is necessary
> is that the subbranch must be distinguished as a subbranch -- but
> perhaps it is OK if the subbranch is actually a branch of the same
> family. And perhaps it is OK if the subbranch is a branch of any
> family, even one that is or was also nested the other way around.
>

Basically, that reduces sub-branches to scopes (some relpath)
within the one and only project-level branch family. They would
still be useful in preventing / detecting misaligned merge attempts.
They could also still be used to nicely handle duplication, splits and
joins of sub-trees - which is what sub-branching is all about.

However, it seems that in this case there is no point in the
root branches spanning less than the whole repository. Branchify
would simply become an annotation expressing "users are
intended to branch at this level".

> If it's true that we can allow a subbranch to be of any family, then
> the algorithm for 'branchifying' would go like this:
>
> * start with existing branch B of family F, with its root element e0
> (currently at path 'trunk', let's say)
> * we're going to turn the subtree at element eX (at path
> 'trunk/sub', say) into a subbranch S
>
> 1. create a new branch S in the same family F, but with its root
> element designated as eX, its content being branched from the subtree
> found at eX in the old branch; this new branch is not yet anchored at
> a path
> 2. delete element eX from the outer branch P
> 3. create a new element eY in P, of kind 'subbranch-root', pointing
> to S, and instantiate it at the same path where eX was
>

I suppose by "delete" you mean removing entries from
the elements table - not actually deleting the sub-tree.
My gut feeling is that we need to keep the node history,
as defined today by the FS backend, intact. That way
the branchify operation enriches the data model instead
of losing information.
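
A toy Python sketch of the three quoted steps, under my reading of
"delete". It is not svnmover code: the `Branch` class, the element
tuples and the `branchify` signature are all hypothetical. Elements keep
their ids when they move into S, and only entries of the outer branch's
element table are removed.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    family: str
    root_eid: int
    # eid -> (parent eid or None for the branch root, name, payload)
    elements: dict = field(default_factory=dict)

def subtree_eids(branch, top):
    """Return `top` plus the ids of all elements below it."""
    found = {top}
    changed = True
    while changed:
        changed = False
        for eid, (parent, _, _) in branch.elements.items():
            if parent in found and eid not in found:
                found.add(eid)
                changed = True
    return found

def branchify(outer, ex, new_eid):
    moved = subtree_eids(outer, ex)
    parent, name, _ = outer.elements[ex]
    # 1. New branch S in the same family, rooted at eX, same element ids.
    sub = Branch(outer.family, ex)
    for eid in moved:
        p, n, payload = outer.elements[eid]
        sub.elements[eid] = (None, "", payload) if eid == ex else (p, n, payload)
    # 2. "Delete" eX and its subtree: drop element-table entries only.
    for eid in moved:
        del outer.elements[eid]
    # 3. Placeholder eY of kind 'subbranch-root' at the path eX had.
    outer.elements[new_eid] = (parent, name, ("subbranch-root", sub))
    return sub

trunk = Branch("F", 0, {
    0: (None, "trunk", "dir"),
    1: (0, "sub", "dir"),
    2: (1, "foo", "file"),
    3: (0, "README", "file"),
})
s = branchify(trunk, ex=1, new_eid=4)
```

After the call, `trunk` holds elements 0, 3 and the placeholder 4 named
'sub', while `s` holds elements 1 and 2 under their original ids.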

> The result is that at path 'trunk/sub' there is now a 'place-holder'
> element eY of type 'subbranch-root: branch S', and that subbranch is a
> *branch* of the content that was there before. Thus its history flows
> seamlessly across that moment in time.
>
> And then there is no more need for separate branch families.
>
> I wonder if these thoughts are in line with yours?
>

I've been coming from a different angle, trying to define the
semantics of a merge that crosses a branchify operation.
For example, take a branch B1 and a branch B2 created from
B1, both with a sub-path ./sub/foo . Now modify ./sub/foo in
B2, branchify ./sub in B1 and create a branch B1/sub2 from it.

How are B1/sub/foo, B2/sub/foo and B1/sub2/foo related now?
Because B2 has not yet received the sub-branch, the
modification in B2/sub/foo should be treated as if it had happened
before B1/sub got branchified.

A catch-up merge from B1 to B2 should simply branchify B2/sub,
keeping the modified B2/sub/foo and then duplicate it into
B2/sub2/foo. If B1/*/foo got modified, there may or may not
be text conflicts.

A catch-up merge from B2 to B1 would virtually apply the
modification to B1/sub/foo before B1/sub got branchified and
then the change gets transformed following the branchify and
branch operations. The result in B1 looks just the same as
the result in B2 did when merging in the other direction.
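
The two merge directions can be sketched in toy Python. This models
nothing of the real merge code: the op tuples and the `apply_op`,
`replay` and `transform` helpers are invented for illustration, and a
tree is just a path -> text dict plus a set of sub-branch paths.

```python
def apply_op(state, op):
    files, subbranches = dict(state[0]), set(state[1])
    if op[0] == "modify":
        _, path, text = op
        files[path] = text
    elif op[0] == "branchify":
        subbranches.add(op[1])
    elif op[0] == "branch":
        _, src, dst = op
        # Duplicate everything under src to the same relpath under dst.
        for path, text in list(files.items()):
            if path == src or path.startswith(src + "/"):
                files[dst + path[len(src):]] = text
        subbranches.add(dst)
    return files, subbranches

def replay(state, ops):
    for op in ops:
        state = apply_op(state, op)
    return state

def transform(change, later_ops):
    """Carry a 'modify' made before `later_ops` across their branch ops."""
    changes = [change]
    for op in later_ops:
        if op[0] == "branch":
            _, src, dst = op
            for _, path, text in list(changes):
                if path == src or path.startswith(src + "/"):
                    changes.append(("modify", dst + path[len(src):], text))
    return changes

base = ({"sub/foo": "v1"}, set())
b1_ops = [("branchify", "sub"), ("branch", "sub", "sub2")]
b2_ops = [("modify", "sub/foo", "v2")]

# B1 -> B2: replay branchify and branch on top of the modified tree.
b2_merged = replay(replay(base, b2_ops), b1_ops)
# B2 -> B1: transform the modification across B1's ops, then apply it.
b1_merged = replay(replay(base, b1_ops), transform(b2_ops[0], b1_ops))
```

In this model both directions end with sub/foo and sub2/foo at "v2" and
with sub and sub2 marked as sub-branches.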

I guess the key here is that our merge implementation today
is an optimization rather than "the only correct way" to do it.
Merge should replicate (all selected) changes to create the
same aggregate effect in the target as they had in the source.

Today, we use a simple 3-way diff to determine the aggregate
but that may no longer be correct when branchification
declares (part of) the tree to be unrelated.

-- Stefan^2.
Received on 2015-04-10 12:46:25 CEST
