Re: isn't variance adjusted patching horribly dangerous?

From: Tom Lord <lord_at_emf.net>
Date: 2003-04-10 22:18:05 CEST

>>> Karl Fogel:

>>> There are a lot more important things to be worrying about
>>> than this right now, even just in the area of improved merge
>>> support.

>> Tom Lord:
>> Neat. Like what?

> Karl Fogel:
> Like exactly how to record prior merge history :-).

The proposal on the table that I've heard about used the mechanism of
a property named (I think) "svn:mr" ("most recent"), attached to
individual file and directory nodes, recording the high-water-marks of
merges to those nodes from related branches. And I believe it's well
known that that proposal has to be extended in order to handle cherry
picking. My svn notes don't record any discussion of transitivity of
merges: when you set svn:mr to reflect a merge, either (a) you'll
have to include the data about third branches from the svn:mr of the
merged-in node, or (b) when you look at an svn:mr value, you'll have to
transitively search the svn:mr properties of the nodes it refers to.
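
To make the two options concrete, here is a minimal sketch. It assumes a
hypothetical model in which an svn:mr value is a mapping from branch name
to the highest merged revision; the function names and the data layout
are illustrative only and are not part of any Subversion API. It also
ignores the fact that properties are versioned, so option (b) reads the
current value of each referenced property.

```python
# Hypothetical model: an "svn:mr" value maps a source branch name to the
# highest revision merged from that branch. Not a real Subversion API.

def merge_eager(target_mr, source_branch, source_rev, source_mr):
    """Option (a): at merge time, copy third-branch data from the merged-in node."""
    result = dict(target_mr)
    result[source_branch] = max(result.get(source_branch, 0), source_rev)
    for branch, rev in source_mr.items():
        result[branch] = max(result.get(branch, 0), rev)
    return result


def lookup_transitive(all_mr, start_branch):
    """Option (b): store direct merges only and follow the references on lookup.

    all_mr maps each branch name to its own direct svn:mr mapping.
    """
    found = {}
    visited = set()
    stack = [start_branch]
    while stack:
        branch = stack.pop()
        if branch in visited:
            continue  # guard against branches that merged from each other
        visited.add(branch)
        for source, rev in all_mr.get(branch, {}).items():
            found[source] = max(found.get(source, 0), rev)
            stack.append(source)
    return found


# trunk merged branch "x" up to r10; "x" had earlier merged "y" up to r7.
eager = merge_eager({}, "x", 10, {"y": 7})
lazy = lookup_transitive({"trunk": {"x": 10}, "x": {"y": 7}, "y": {}}, "trunk")
```

Option (a) pays the cost once per merge and makes each property larger;
option (b) keeps properties small but makes every query walk the graph.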

That does, indeed, sound like a useful mechanism: but not a sufficient
one, and one that will be difficult to design and implement compared
to the alternative I'll propose below.

The reasons it's difficult to fully design that mechanism have to do
with the potentially different granularity of changes being merged and
merges being applied. Suppose my trunk has:

proj/A/hello.c
proj/B/

and my branch renames A/hello.c to B/hello.c. I merge the changes to
directory A from the branch, but not to directory B. How should that
be handled? At a later time, I merge the changes to directory B.
What is the final result? There are multiple logically consistent
answers, none of which seems to me to be ideal: (a) such partial
merges are forbidden; (b) hello.c disappears, then in the second merge
a version of hello.c is added to directory B: but which version of
hello.c? (b1) the version from the branch; (b2) the resurrected
version from my tree; (b3) a merge of those? Supposing we want to
avoid the three subcases of (b), we might have (c) such partial merges
are sometimes permitted, but not if they would delete a file that
differs between the merged-into and merged-from branch (but then how
do I reasonably and conveniently work around that?). I'm not saying
that problems such as this make it impossible to design solutions --
just that the solutions are going to be hard to explain, and hard to
design to be maximally useful.
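
The rename scenario can be played out in a few lines. This is only a
sketch under stated assumptions: a tree is modelled as a dictionary from
path to contents, the rename is modelled as a delete plus an add, and
apply_under is a hypothetical helper that applies only the changes lying
under one directory. It shows how answer (b1) arises and that the trunk's
own edits to hello.c are lost along the way.

```python
# Hypothetical model: a tree is {path: contents}; a rename on the branch is
# represented as a delete of the old path plus an add of the new path.

def apply_under(tree, changes, prefix):
    """Apply only those changes whose path lies under the given directory."""
    result = dict(tree)
    for op, path, contents in changes:
        if not path.startswith(prefix):
            continue  # outside the directory being merged
        if op == "delete":
            result.pop(path, None)
        elif op == "add":
            result[path] = contents
    return result


trunk = {"proj/A/hello.c": "trunk edits"}
branch_changes = [
    ("delete", "proj/A/hello.c", None),
    ("add", "proj/B/hello.c", "branch edits"),
]

# First merge: directory A only. hello.c disappears from the trunk.
after_a = apply_under(trunk, branch_changes, "proj/A/")

# Second merge: directory B. The branch's version appears; "trunk edits" is gone.
after_b = apply_under(after_a, branch_changes, "proj/B/")
```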

The granularity also interferes with project mgt/business rules.
Tracking of changes as they propagate between branches and out into
deliverables has meaning well beyond the scope of low-level version
control -- it's part of how progress is monitored and part of how bug
archaeology is conducted. But with fine-grain patching, when you ask
"is change such and such present in this tree", (a) instead of "yes"
or "no" the answer can be "partly"; (b) computing that yes/no/partly
answer accurately involves searching the svn:mr-ish properties of
every node in the tree.
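
A rough sketch of that query, assuming (hypothetically) that each node
carries a mapping from branch name to a merged high-water-mark revision
and that a change is identified by a branch, a revision, and the paths it
touched. The names are illustrative and not part of any real API.

```python
# Hypothetical model: tree_mr maps each node path to its svn:mr-ish value,
# itself a mapping from branch name to the highest revision merged.

def change_status(tree_mr, branch, rev, paths):
    """Report whether a change is present: 'yes', 'no', or 'partly'."""
    hits = [tree_mr.get(path, {}).get(branch, 0) >= rev for path in paths]
    if hits and all(hits):
        return "yes"
    if any(hits):
        return "partly"
    return "no"


tree_mr = {
    "proj/A/hello.c": {"branch": 12},  # merged up to r12
    "proj/B/hello.c": {"branch": 8},   # merged only up to r8
}
touched = ["proj/A/hello.c", "proj/B/hello.c"]
status = change_status(tree_mr, "branch", 10, touched)
```

Even in this simplified form, the answer depends on every node the change
touched, which is what makes the query expensive on a large tree.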

And of course, it isn't easy to implement that mechanism just for the
simple reason that it involves adding new features to your
database-schema, and surfacing APIs at multiple layers to deal with
those new properties, then writing custom merge tools that consult
those properties using those APIs.

A complementary and simpler-to-implement approach, which also simply
avoids the granularity issues listed above, is to take the route
we talked about earlier: add some higher-level structure to the
layout of the top-level organizational directories in a repository,
and some conventional structure to project trees, and add a layer
which does whole-project-tree merging. Merge history for
whole-project-tree merging doesn't have to be recorded as new
properties: it can be stored in ordinary (controlled) text files
alongside the project trees themselves. Tools which perform merges
using and extending that history barely need to interact with svn at
all -- and can conceivably do so using only the ordinary CLI interface
or a scripting language binding for that interface.
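
As an illustration of how little machinery this needs, here is a minimal
sketch that keeps whole-tree merge history in a plain text file at the
root of each project tree. The file name MERGE-LOG, the one-id-per-line
format, and the function names are all assumptions made for this example;
they are not an existing svn or arch convention.

```python
from pathlib import Path

LOG_NAME = "MERGE-LOG"  # hypothetical file kept under version control


def read_log(tree):
    """Return the set of change ids recorded in a tree's merge log."""
    log = Path(tree) / LOG_NAME
    if not log.exists():
        return set()
    return {line.strip() for line in log.read_text().splitlines() if line.strip()}


def missing_changes(source, target):
    """Change ids present in the source tree but not yet merged into the target."""
    return sorted(read_log(source) - read_log(target))


def record_merge(tree, change_ids):
    """Add change ids to a tree's merge log, one id per line, sorted."""
    merged = read_log(tree) | set(change_ids)
    text = "".join(f"{change_id}\n" for change_id in sorted(merged))
    (Path(tree) / LOG_NAME).write_text(text)
```

A merge tool built this way would call missing_changes to decide what to
apply, apply it with ordinary working-copy operations, then call
record_merge and commit the log file along with everything else.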

You could have history-sensitive whole-tree merging, _and_ a big
chunk of what's needed for distributed branching, with maybe a couple
10K lines of Python or Scheme code.

The "patch log" mechanism in arch records whole tree merge history
exactly as I've suggested: in ordinary "source" files stored along
with the project tree itself. In the context of arch, it
demonstrates its usefulness for both smart merging and
project-mgt/business rule auditing. And, by keeping the data in
regular files (and representing the data carefully) -- the patch log
mechanism helps make distributed branching a snap.

To maybe give you a taste of how nicely this mechanism works out, in
the first paragraph I talked about the "transitivity" problem with
svn:mr-ish approaches. The patch-log mechanism solves that problem
automatically -- it "falls out" of the solution for free: a generic
whole-tree "patch" algorithm doesn't know anything special about patch
logs, it treats them just like regular files. But in treating patch
logs as ordinary files, it implicitly records the transitive merge
history just as part of its default operation (i.e., a merged-into
tree picks up all the patch log entries it's missing from the
merged-in changes).
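
A small sketch of why transitivity falls out for free. It assumes a tree
is a dictionary from path to contents and that each log entry is an
ordinary file under a hypothetical "patch-log/" directory; this layout is
made up for the example and is not arch's actual on-disk format. The
changeset and apply functions know nothing about logs.

```python
# Hypothetical model: a tree is {path: contents}; each patch log entry is an
# ordinary file under "patch-log/". The generic functions below never
# special-case those paths.

def changeset(old, new):
    """Files added or modified going from old to new (deletes omitted for brevity)."""
    return {path: data for path, data in new.items() if old.get(path) != data}


def apply_changeset(tree, changes):
    """Apply a changeset to a tree, returning the new tree."""
    result = dict(tree)
    result.update(changes)
    return result


def merged_entries(tree):
    """List the patch log entries present in a tree."""
    return sorted(path for path in tree if path.startswith("patch-log/"))


base = {"hello.c": "v0"}
y = {"hello.c": "v0+y", "patch-log/y--patch-1": "fix in y"}
x_own = {"hello.c": "v0", "util.c": "x", "patch-log/x--patch-1": "add util"}

# Branch x merges branch y, then trunk merges only branch x.
x = apply_changeset(x_own, changeset(base, y))
trunk = apply_changeset(base, changeset(base, x))
entries = merged_entries(trunk)
```

After the second merge the trunk holds the log entry from y even though
it never merged y directly, because the entry travelled as a regular file.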

-t

Received on Thu Apr 10 22:08:57 2003
