Zack Weinberg <zack@codesourcery.com> writes:
> On Wed, Jan 23, 2002 at 09:16:55PM -0600, Ben Collins-Sussman wrote:
> > It worked for me 20 minutes ago. I tested the link myself.
> > Oh lookee. Our nightly Berkeley DB freeze.
> > OK, ran db_recover again, took 18 minutes. Should be all fixed.
>
> I'm still getting the error. Here's more detail:
What the *heck*?!? I just fixed the db not five minutes ago and
re-tested the link, and now it's broken again.
Hmmmm, here's a clue: 'ps ax' reveals some script trying to run
db_archive out of /usr/local/BerkeleyDB3.3/ !! Wait, looking at our
post-commit backup script... I see that the path is hard-coded into
the tools script. Oh jeez. I wonder if this was the problem.
Sorry to delay you, Zack. Here's a copy of the doc while we work out
the server bugs:
----------------------------------------------------------------------
SUBVERSION "MERGE" and "SWITCH" FEATURES
Slated for 0.9 (M9)
1st draft writ by Karl & Ben,
after much discussion with CMike & Greg.
This is primarily a description of the semantics of merge and switch,
that is, Subversion's user-visible behavior in these operations. It
also discusses some implementation issues.
Definitions:
* Merging is like "cvs update -j -j". I.e., take the difference
between two trees in the repository, and apply it diffily to the
working copy.
* Switching means to switch the working copy from one line of
development over to another, like "cvs update -r <TAG|BRANCH>".
Of course, Subversion doesn't really have the concept of lines
of development, it just has copies. But if a working directory
is based on repository tree T, and you "switch" it to be based on
repository tree S, where T and S are similar (related) in some
way, that's effectively the same as what CVS does.
The General Theory of Updating, Merging, and Switching
======================================================
Updating, merging, and switching are all very similar operations; each
command is a request to have the server modify the working copy in
some way. Each of these subcommands begins with the client describing
the "state" of the working copy to the server, and ends with the
server comparing trees and sending back tree-delta(s) to the client.
Here's the easiest way to understand the three operations: assume that
X:PATH1 and Y:PATH2 are paths within two repository revisions X and Y,
which are possibly the same revision. The server compares X:PATH1
with Y:PATH2 and sends the difference to the client.
* In an update, PATH1 == PATH2 always, and after the tree-delta is
applied, the working copy metadata is changed (specifically,
revisions are bumped.)
* In a merge, PATH1 does not necessarily equal PATH2, and we don't
touch metadata (except maybe for "genetic" merging properties
someday). In other words, the applied changes end up looking like
local modifications.
* In a switch, PATH1 does not necessarily equal PATH2, and we *do*
rewrite the working copy metadata (specifically, revisions are
bumped and URLs are changed).
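To make the three cases easy to compare side by side, here is a small illustrative table in Python. The class and field names are ours, invented for this sketch; nothing like this exists in the Subversion source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """Hypothetical summary of one subcommand's user-visible semantics."""
    same_path: bool        # is PATH1 == PATH2 required?
    bumps_revisions: bool  # are working revisions bumped afterwards?
    rewrites_urls: bool    # are working copy URLs rewritten afterwards?


# One row per bullet in the list above.
OPERATIONS = {
    "update": Operation(same_path=True, bumps_revisions=True, rewrites_urls=False),
    "merge": Operation(same_path=False, bumps_revisions=False, rewrites_urls=False),
    "switch": Operation(same_path=False, bumps_revisions=True, rewrites_urls=True),
}


def touches_metadata(name):
    """True if the operation changes working copy metadata at all."""
    op = OPERATIONS[name]
    return op.bumps_revisions or op.rewrites_urls
```

Only merge leaves the metadata alone, which is why its changes show up as local modifications.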
When doing a merge or switch, the user needs to specify at least one
of the two paths. There's a risk that the requested path may be
completely unrelated to the path represented by the working copy --
and thus might result in seemingly random diffs and conflicts
everywhere (or in the worst case, a complete deletion and re-checkout
of the working copy!) Our plan is to add a heuristic to Subversion
that asks the question "are these two paths related in some way?" If
the test fails, the command aborts and the user receives a friendly
message: "PATH1 and PATH2 have no common ancestry. Are you *sure*
you want to apply this delta? If so, re-run the command with the
--force option."
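The heuristic itself is not designed yet. Purely as a sketch of one possible shape, assume we can ask the repository which path each path was copied from (the `copied_from` mapping below is a made-up stand-in for that lookup), and call two paths related when their copy chains share any member:

```python
def ancestors(path, copied_from):
    """Return path followed by every path it was (transitively) copied from."""
    chain = [path]
    while chain[-1] in copied_from:
        source = copied_from[chain[-1]]
        if source in chain:  # guard against a malformed, cyclic mapping
            break
        chain.append(source)
    return chain


def related(path1, path2, copied_from):
    """True if the two copy chains have at least one path in common."""
    return bool(set(ancestors(path1, copied_from)) &
                set(ancestors(path2, copied_from)))


def check_delta_allowed(path1, path2, copied_from, force=False):
    """Raise unless the paths are related or the user passed --force."""
    if force or related(path1, path2, copied_from):
        return
    raise ValueError(
        "%s and %s have no common ancestry. Are you *sure* you want to "
        "apply this delta? If so, re-run the command with the --force "
        "option." % (path1, path2))


# Hypothetical copy history: a branch made from trunk, a tag from the branch.
copy_history = {"/branches/blue": "/trunk", "/tags/0.8": "/branches/blue"}
```

With this history, `/tags/0.8` and `/trunk` are related, while `/trunk` and an unrelated `/vendor/zlib` are not, so the latter pair would need --force.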
Merging
=======
Merge is a special case of update, or rather, update is a special case
of merge. Simplifying things a bit: when we update, we take the
differences between path P at revision X versus P at revision Y, and
apply that difference to the working copy. Note that since X:P
reflects the working copy text bases exactly, the server can send
contextless diffs to bring the working copy to Y:P. (The
simplification here is that X:P is really a transaction reflecting the
working copy's revision mixture, and not necessarily corresponding
precisely to any single revision tree.)
When we merge, we take the differences between path P at revision X
(X:P) versus path Q at revision Y (Y:Q), and apply them to the working
copy.
Thus, what distinguishes a merge from an update is that P != Q (is
there a symbol for "need not equal"? Maybe "P ?= Q"...) For that
matter, X ?= Y.
X:P and Y:Q are two distinct trees, but in practice, they share a
common ancestor, so using the difference between them is not a
ridiculous idea. But note that svn_repos_dir_delta() is perfectly
content to express the difference between any two trees, related or
not.
It is possible, indeed likely, that neither X:P nor Y:Q is an exact
reflection of the working copy bases; therefore context diffs are used
to facilitate merging.
*** Implementation details ***
And as a rule, only the Subversion client generates context diffs and
applies them (right now by running 'diff' and 'patch' externally.)
Therefore, the objective is to create *two* sets of fulltext files in
some client-side temporary area. The first fulltext set represents
X:P, and the second fulltext set represents Y:Q. The client then
compares the two sets, generates context diffs, and applies the
context diffs to the working copy's working files.
The naive approach would be to just directly ask the server for both
sets of fulltexts. (We still consider this an option!)
A more complex approach (which we'll attempt) is a network
optimization -- it's a way of creating both sets of fulltexts on the
client using minimal network traffic:
* The client builds a transaction on the server that is a
"reflection" of the working-copy, mixed revisions and all.
* The server sends a tree-delta between the reflection and X:P; the
client then applies these binary diffs to copies of the
working-copy's text-bases in order to reconstruct the fulltexts
of X:P.
* The server sends a tree-delta between X:P and Y:Q; the client
then applies these binary diffs to copies of the X:P fulltexts in
order to reconstruct the fulltexts of Y:Q.
And that's it! We have both sets of fulltexts. The client generates
context diffs between them and patches the working copy.
As mentioned earlier, this process doesn't touch any working-copy
metadata in .svn/. Only the working files are patched, so the
differences appear as local modifications. At that point, the user
manually resolves any conflicts.
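The client-side flow above can be sketched roughly as follows. This is a toy model under loud assumptions: a tree is a dict from path to a list of lines, a "tree-delta" is just a dict of replacement contents (the real thing carries binary diffs), and Python's difflib stands in for the external 'diff'. All function names are ours.

```python
import difflib


def apply_tree_delta(base, delta):
    """Return a new tree; delta maps path -> new lines, or None to delete."""
    result = dict(base)  # work on a copy, never on the text-bases themselves
    for path, lines in delta.items():
        if lines is None:
            result.pop(path, None)
        else:
            result[path] = lines
    return result


def context_diffs(old_tree, new_tree, old_label="X:P", new_label="Y:Q"):
    """Return {path: context diff lines} for every path that differs."""
    diffs = {}
    for path in sorted(set(old_tree) | set(new_tree)):
        old = old_tree.get(path, [])
        new = new_tree.get(path, [])
        if old != new:
            diffs[path] = list(difflib.context_diff(
                old, new,
                fromfile=old_label + "/" + path,
                tofile=new_label + "/" + path))
    return diffs


# Copies of the working copy's text-bases (the "reflection").
text_bases = {"foo.c": ["a\n", "b\n"]}
# Step 2: delta from the reflection to X:P.
xp = apply_tree_delta(text_bases, {"foo.c": ["a\n", "b\n", "c\n"]})
# Step 3: delta from X:P to Y:Q.
yq = apply_tree_delta(xp, {"foo.c": ["a\n", "B\n", "c\n"]})
# Finally: context diffs between the two fulltext sets.
diffs = context_diffs(xp, yq)
```

The resulting diffs would then be handed to 'patch' against the working files; the text-bases and .svn/ metadata are never modified.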
Switching
=========
Switching is a more general case of update: instead of comparing the
working-copy "reflection" to an identical path in some revision, the
server compares the reflection to some *arbitrary* path in some
revision. The user specifies the new path.
The result of the operation is to effectively morph the working copy
into representing a different location in the tree. In theory, there
should be no way to tell the difference between a fresh checkout of
PATH2 and a working copy that was "switched" to PATH2.
*** Implementation details ***
As in update operations, the client begins by building a reflection of
working-copy state on the server. The client then specifies a new
path/revision pair as the target of the tree-delta.
After the client finishes applying the delta, it needs to do a little
more work than update: besides bumping all working revisions to some
uniform value, it needs to rewrite all of the metadata URL ancestry as
well.
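As a rough picture of that extra bookkeeping, here is a sketch that assumes each working copy entry is a dict with "url" and "revision" keys. That layout and the function name are invented for illustration and are not the real .svn/entries format.

```python
def switch_metadata(entries, old_base, new_base, new_rev):
    """Return entries with URLs rebased onto new_base and revisions bumped.

    entries maps a working copy path to {"url": ..., "revision": ...}.
    """
    switched = {}
    for wc_path, entry in entries.items():
        url = entry["url"]
        # Rewrite only URLs that really live under old_base.
        if url == old_base or url.startswith(old_base + "/"):
            url = new_base + url[len(old_base):]
        switched[wc_path] = {"url": url, "revision": new_rev}
    return switched


wc_entries = {
    "": {"url": "http://svn.collab.net/repos/trunk", "revision": 5},
    "README": {"url": "http://svn.collab.net/repos/trunk/README", "revision": 3},
}
after_switch = switch_metadata(
    wc_entries,
    "http://svn.collab.net/repos/trunk",
    "http://svn.collab.net/repos/branches/blue",
    10)
```

After the call every entry sits at the uniform revision 10 and points into branches/blue, which is what a fresh checkout of the new path would have recorded.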
-----------------------------------------------------------------------
Interactions: A Brave New World
================================
With the `svn switch' feature, we now have the potential to have
working copies with "disjoint" subdirs, that is, subdirs whose
repository url is not simply the subdir's parent's url plus the
subdir's entry name in the parent. For example:
$ svn checkout http://svn.collab.net/repos/trunk -d svn
A ...
A svn/subversion/libsvn_wc
A svn/subversion/libsvn_fs
A svn/subversion/libsvn_repos
A svn/subversion/libsvn_delta
A ...
$ cd svn/subversion/libsvn_fs
$ svn switch http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs
[...]
$
While svn/subversion/.svn/entries still has an entry for "libsvn_fs",
if you go into libsvn_fs and look at its own directory url, it is not
simply a child of the `subversion' directory url, but rather a
completely different url. We call this directory "disjoint".
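Stated as code, the definition of "disjoint" is a one-line comparison. A minimal sketch, with function names of our own choosing:

```python
def expected_child_url(parent_url, entry_name):
    """URL a subdir would have if it were an ordinary child of its parent."""
    return parent_url.rstrip("/") + "/" + entry_name


def is_disjoint(parent_url, entry_name, child_url):
    """True if the subdir's own URL is not parent URL plus entry name."""
    return child_url.rstrip("/") != expected_child_url(parent_url, entry_name)


# The example above: libsvn_fs was switched to the "blue" branch.
parent = "http://svn.collab.net/repos/trunk/subversion"
switched = "http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs"
unswitched = "http://svn.collab.net/repos/trunk/subversion/libsvn_wc"
```

Here `is_disjoint(parent, "libsvn_fs", switched)` is true, while the untouched libsvn_wc subdir is an ordinary child.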
Commits, updates, merges, and further switch commands all need to deal
sanely with this scenario.
We can assume that even disjoint urls are still all within the same
repository, because the parent of a disjoint child still has an entry
for that child, and all working copy walks are guided by entries. In
cases where there are wc subdirs from completely different
repositories, there is unlikely to be such entry linkage. [NOTE: We
will still be adding some extra information to the wc to make it
possible to check for the rare circumstance where the parent has an
entry for a subdir which (for whatever reason) is the result of a
checkout from a different repository. More on that later.]
Changes To The Commit Process:
==============================
Currently, the commit editor driver crawls the working copy, and sends
local modifications through the editor as it finds them. But we now
have to deal with disjoint urls in the working copy. Because editors
must be driven depth-first, we cannot send changes against these
disjoint urls as they are found -- instead, we must begin the edit
based on a common parent of all the urls involved in the commit. So
we must do a preliminary scan of the working copy, discovering all
local mods, collecting the urls for the mods, and then calculating the
common path prefix on which to base the edit.
[NOTE: this increases the memory usage of commits by a small amount.
We formerly interleaved the discovering and sending of local mods, but
now discovery will happen first and produce a list of changed paths,
and then sending the changes will happen entirely after that. The
benefit is that we preserve commit atomicity even when branches are
present in the working copy... which is very important!]
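Calculating the common path prefix has to work on whole path components, not characters (otherwise "trunk" and "trunk2" would appear to share a parent). A sketch using only the Python standard library; the function name is ours, and it assumes, as argued above, that all urls are within one repository:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def common_base_url(urls):
    """Return the deepest URL that is a parent of (or equal to) every url."""
    parts = [urlsplit(u) for u in urls]
    roots = {(p.scheme, p.netloc) for p in parts}
    if len(roots) != 1:
        raise ValueError("urls do not share a scheme and host")
    scheme, netloc = roots.pop()
    # commonpath compares component by component, not character by character.
    base_path = posixpath.commonpath([p.path or "/" for p in parts])
    return urlunsplit((scheme, netloc, base_path, "", ""))


# Local mods found in trunk and in a subdir switched to the "blue" branch.
changed = [
    "http://svn.collab.net/repos/trunk/subversion/libsvn_wc/foo.c",
    "http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs/bar.c",
]
base = common_base_url(changed)
```

For these two urls the edit would be based on http://svn.collab.net/repos, the first directory that contains both trunk and branches.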
Changes To The Update Process:
==============================
Currently, update builds a reflection of the working copy's state on
the server (the reflection is a Subversion transaction). Then the
server sends back a tree delta between the reflection and the desired
revision tree (usually the head revision, but whatever). The tree
delta is expressed by driving an svn_delta_edit_fns_t editor on the
client side.
If there are disjoint subdirs in the working copy, the reflection
must, uh, reflect this. That's pretty easy: that subtree of the
transaction will simply point to the appropriate revision subtree
(implementation note: we'll need to add a new function to
svn_ra_reporter_t, allowing us to link arbitrary path/rev pairs into
the transaction.)
But getting the reflection right isn't enough. The revision tree
we're comparing the reflection with doesn't have the special disjoint
subtree, so a lot of spurious differences would be sent to the client,
which the client would then have to ignore, presumably making a
separate connection later to update the disjoint subdir. This way
lies madness... or at least inefficiency.
So instead, we'll create a *second* transaction, representing the
target of the update. In the plain update case, this transaction is
an exact copy of the revision (and perhaps we'll optimize out the txn
and just use the revision tree after all). But in the disjoint subdir
case, this second txn will also reflect the disjointedness. In other
words, when a disjoint directory D is discovered, it will be linked
into both txn trees -- in the reflection txn, D will be at whatever
revision(s) it is in the working copy, and in the target txn, it will
be at the target revision of the update. This way, the delta between
the reflection and target txns will apply cleanly to the working copy
(i.e., svn_repos_dir_delta() will just Do The Right Thing when invoked
on the two txns). Voila.
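To see why the second txn helps, here is a toy model in which a txn is just a dict from working copy subdir to a (repository path, revision) pair. The representation and all function names are invented for this sketch; real txns are full filesystem trees.

```python
import posixpath


def plain_target(root_path, wc_dirs, target_rev):
    """Target that ignores disjointness: every dir lives under root_path."""
    return {d: (posixpath.join(root_path, d) if d else root_path, target_rev)
            for d in wc_dirs}


def linked_target(wc_dirs, target_rev):
    """Target txn with each dir linked to its own path at the target rev."""
    return {d: (path, target_rev) for d, (path, _rev) in wc_dirs.items()}


def path_mismatches(reflection, target):
    """Dirs where the two trees point at different repository paths."""
    return sorted(d for d in reflection if reflection[d][0] != target[d][0])


# Reflection txn: the working copy as it is, disjoint libsvn_fs included.
reflection = {
    "": ("/trunk", 5),
    "subversion/libsvn_wc": ("/trunk/subversion/libsvn_wc", 5),
    "subversion/libsvn_fs": ("/branches/blue/subversion/libsvn_fs", 7),
}
spurious = path_mismatches(reflection, plain_target("/trunk", reflection, 10))
clean = path_mismatches(reflection, linked_target(reflection, 10))
```

Against the plain revision tree, libsvn_fs is compared with the wrong path and produces spurious differences; against the linked target txn nothing is mismatched, so only genuine revision-to-revision changes remain in the delta.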
Changes to Switch and Merge Process:
====================================
The switch process still needs to build a working-copy reflection that
contains possible "disjointed" subtrees. However, the second
target-transaction isn't needed at all. The server can send a delta
between the reflection and a "pure" path in some revision (presumably
the path that we're switching to.)
If the disjointed subtree and the target path both happen to be part
of the same branch, then svn_repos_dir_delta() won't notice any
differences at all. Otherwise, the user should expect to have the
disjointed section of the working copy be "converted" to a new URL,
just like the rest of the working copy.
In the case of merges, we continue to build a reflection that contains
disjointed subtrees. Again, no need for a second transaction.
Remember that the reflection is only being built as a shortcut to
cheaply construct fulltexts of X:P in the client. The structure of
the reflection is irrelevant; *any* reflection can be used as a basis
for sending a tree-delta that constructs X:P, no matter what
disjointed sections it has. (Although some reflections may be more
useful than others! In the worst case, if the reflection is
completely unrelated to X:P, then svn_repos_dir_delta() regresses into
sending fulltexts anyway.)
---------------------------------------------------------------------
Received on Sat Oct 21 14:36:58 2006