Some mail exchanger was down temporarily, and a bunch of my mails
reached this list long (day or two) after they'd been sent. So if
you're confused at my sense of timing, that's why. :-)
-K
Karl Fogel <kfogel@newton.ch.collab.net> writes:
> I have an appointment and hence can't debug this error right now
>
> $ svn ci notes/merging-and-switching.txt
> svn_error: #21075 : <RA layer failed hostname lookup>
> Hostname not found: (null)
> $
>
> So here's the new version of the merging-and-switching.txt document
> that I was about to commit. Mainly it adds to the section on
> implementing merge:
>
> ------------------------------------------------------------------------
>
> SUBVERSION "MERGE" and "SWITCH" FEATURES
> Slated for 0.9 (M9)
>
> 1st draft writ by Karl & Ben,
> after much discussion with CMike & Greg.
>
>
> This is primarily a description of the semantics of merge and switch,
> that is, Subversion's user-visible behavior in these operations. It
> also discusses some implementation issues.
>
> Definitions:
>
> * Merging is like "cvs update -j -j". I.e., take the difference
> between two trees in the repository, and apply it diffily to the
> working copy.
>
> * Switching means to switch the working copy from one line of
> development over to another, like "cvs update -r <TAG|BRANCH>".
> Of course, Subversion doesn't really have the concept of lines
> of development, it just has copies. But if a working directory
> is based on repository tree T, and you "switch" it to be based on
> repository tree S, where T and S are similar (related) in some
> way, that's effectively the same as what CVS does.
>
>
> The General Theory of Updating, Merging, and Switching
> ======================================================
>
> Updating, merging, and switching are all very similar operations; each
> command is a request to have the server modify the working copy in
> some way. Each of these subcommands begins with the client describing
> the "state" of the working copy to the server, and ends with the
> server comparing trees and sending back tree-delta(s) to the client.
>
> Here's the easiest way to understand the three operations: assume that
> X:PATH1 and Y:PATH2 are paths within two repository revisions X and Y,
> which are possibly the same revision. The server compares the X:PATH1
> and Y:PATH2 and sends the difference to the client.
>
> * In an update, PATH1 == PATH2 always, and after the tree-delta is
> applied, the working copy metadata is changed (specifically,
> revisions are bumped.)
>
> * In a merge, PATH1 does not necessarily equal PATH2, and we don't
> touch metadata (except maybe for "genetic" merging properties
> someday). In other words, the applied changes end up looking like
> local modifications.
>
> [Actually, in a merge PATH1 usually does equal PATH2 -- in fact,
> that's how it always is in CVS, in a sense. So I think
> supporting the PATH1 != PATH2 case in merge should not be a high
> priority. -kff]
>
> * In a switch, PATH1 does not necessarily equal PATH2, and we *do*
> rewrite the working copy metadata (specifically, revisions are
> bumped and URLs are changed).
>
> When doing a merge or switch, the user needs to specify at least one
> of the two paths. There's a risk that the requested path may be
> completely unrelated to the path represented by the working copy --
> and thus might result in seemingly random diffs and conflicts
> everywhere (or in the worst case, a complete deletion and re-checkout
> of the working copy!) Our plan is to add a heuristic to Subversion
> that asks the question "are these two paths related in some way?" If
> the test fails, the command aborts and the user receives a friendly
> message: "PATH1 and PATH2 have no common ancestry. Are you *sure*
> you want to apply this delta? If so, re-run the command with the
> --force option."
>
>
> Merging
> =======
>
> Merge is a special case of update, or rather, update is a special case
> of merge. Simplifying things a bit: when we update, we take the
> differences between path P at revision X versus P at revision Y, and
> apply that difference to the working copy. Note that since P:X
> reflects the working copy text bases exactly, the server can send
> contextless diffs to bring the working copy to P:Y. (The
> simplification here is that P:X is really a transaction reflecting the
> working copy's revision mixture, and not necessarily corresponding
> precisely to any single revision tree).
>
> When we merge, we take the differences between path P at revision X
> (X:P) versus path Q at revision Y (Y:Q), and apply them to the working
> copy.
>
> Thus, what distinguishes a merge from an update is that P != Q (is
> there a symbol for "need not equal"? Maybe "P ?= Q"...) For that
> matter, X ?= Y.
>
> X:P and Y:Q are two distinct trees, but in practice, they share a
> common ancestor, so using the difference between them is not a
> ridiculous idea. But note that svn_repos_dir_delta() is perfectly
> content to express the difference between any two trees, related or
> not.
>
> It is possible, indeed likely, that neither P:X nor Q:Y are an exact
> reflection of the working copy bases, therefore context diffs are used
> to facilitate merging.
>
> *** Implementation details ***
>
> Heh, two completely different possibilities here:
>
> 1. Only the Subversion client generates context diffs and applies them
> (right now by running 'diff' and 'patch' externally.) Therefore,
> the objective is to create *two* sets of fulltext files in some
> client-side temporary area. The first fulltext set represents X:P,
> and the second fulltext set represents Y:Q. The client then
> compares the two sets, generates context diffs, and applies the
> context diffs to the working copy's working files.
>
> The naive approach would be to just directly ask the server for
> both sets of fulltexts. (We still consider this an option!)
>
> A more complex approach (which we'll attempt) is a network
> optimization -- it's a way of creating both sets of fulltexts on
> the client using minimal network traffic:
>
> * The client builds a transaction on the server that is a
> "reflection" of the working-copy, mixed revisions and all.
>
> * The server sends a tree-delta between the reflection and X:P;
> the client then applies these binary diffs to copies of the
> working-copy's text-bases in order to reconstruct the fulltexts
> of X:P.
>
> * The server sends a tree-delta between X:P and Y:Q; the client
> then applies these binary diffs to copies of the X:P fulltexts
> in order to reconstruct the fulltexts of Y:Q.
>
> And that's it! We have both sets of fulltexts. The client
> generates context diffs between them and patches the working copy.
>
> As mentioned earlier, this process doesn't touch any working-copy
> metadata in .svn/. Only the working files are patched, so the
> differences appear as local modifications. At that point, the user
> manually resolves any conflicts.
>
>
> 2. What is the difference between these two commands?
>
> svn merge -rX:Y <URL>
> svn diff -rX:Y <URL> | patch
>
> :-) ? If we have an extended patch format, supporting copies,
> renames, deletes, and properties (like we've been planning), then
> there isn't formally even any need for a "merge" command -- it's a
> trivial wrapper around "svn diff" and patch.
>
> In other words, much of the work described in Plan 1 above has
> already been done by Philip Martin in his diff editors. Maybe we
> should just take advantage of that? There's still the issue of
> recording metadata about the merge, but presumably that would come
> from the extended patch format.
>
> Random thoughts from Karl:
>
> I do wonder if it's always desirable to merge properties anyway.
> Most of the properties we have are subversion-specific, and when I
> think of the kinds of merges I've done in the past, I can't think
> of a case where having the property changes merge would be
> desirable. Ooooh, but when we use the properties to record what
> has been previously merged, then having them travel *with* the
> changes is useful. For example:
>
> $ svn merge -r18:20 http://svn.collab.net/repos/branches/rel_1
> $ svn ci
> ===> produces .../trunk/whatever/blah, revision 100
>
> Then the next week:
>
> $ svn switch http://svn.collab.net/repos/branches/rel_2
> $ svn merge -r97:153 http://svn.collab.net/repos/trunk/whatever/blah
>
> In a situation like that, you want the rel_1 branch merge into
> trunk to travel with the trunk changes you're now merging into
> rel_2.
>
>
> Switching
> =========
>
> Switching is a more general case of update: instead of comparing the
> working-copy "reflection" to an identical path in some revision, the
> server compares the reflection to some *arbitrary* path in some
> revision. The user specifies the new path.
>
> The result of the operation is to effectively morph the working copy
> into representing a different location in the tree. In theory, there
> should be no way to tell the difference between a fresh checkout of
> PATH2 and a working copy that was "switched" to PATH2.
>
> *** Implementation details ***
>
> As in update operations, the client begins by building a reflection of
> working-copy state on the server. The client then specifies a new
> path/revision pair as the target of the tree-delta.
>
> After the client finishes applying the delta, it needs to do a little
> more work than update: besides bumping all working revisions to some
> uniform value, it needs to rewrite all of the metadata URL ancestry as
> well.
>
> -----------------------------------------------------------------------
>
>
> Interactions: A Brave New World
> ================================
>
> With the `svn switch' feature, we now have the potential to have
> working copies with "disjoint" subdirs, that is, subdirs whose
> repository url is not simply the subdir's parent's url plus the
> subdir's entry name in the parent. For example:
>
> $ svn checkout http://svn.collab.net/repos/trunk -d svn
> A ...
> A svn/subversion/libsvn_wc
> A svn/subversion/libsvn_fs
> A svn/subversion/libsvn_repos
> A svn/subversion/libsvn_delta
> A ...
> $ cd svn/subversion/libsvn_fs
> $ svn switch http://svn.collab.net/repos/branches/blue/subversion/libsvn_fs
> [...]
> $
>
> While svn/subversion/.svn/entries still has an entry for "libsvn_fs",
> if you go into libsvn_fs and look at its own directory url, it is not
> simply a child of the `subversion' directory url, but rather a
> completely different url. We call this directory "disjoint".
>
> Commits, updates, merges, and further switch commands all need to deal
> sanely with this scenario.
>
> We can assume that even disjoint urls are still all within the same
> repository, because the parent of a disjoint child still has an entry
> for that child, and all working copy walks are guided by entries. In
> cases where there are wc subdirs from completely different
> repositories, there is unlikely to be such entry linkage. [NOTE: We
> will still be adding some extra information to the wc to make it
> possible to check for the rare circumstance where the parent has an
> entry for a subdir which (for whatever reason) is the result of a
> checkout from a different repository. More on that later.]
>
>
> Changes To The Commit Process:
> ==============================
>
> Currently, the commit editor driver crawls the working copy, and sends
> local modifications through the editor as it finds them. But we now
> have to deal with disjoint urls in the working copy. Because editors
> must be driven depth-first, we cannot send changes against these
> disjoint urls as they are found -- instead, we must begin the edit
> based on a common parent of all the urls involved in the commit. So
> we must do a preliminary scan of the working copy, discovering all
> local mods, collecting the urls for the mods, and then calculating the
> common path prefix on which to base the edit.
>
> [NOTE: this increases the memory usage of commits by a small amount.
> We formerly interleaved the discovering and sending of local mods, but
> now discovery will happen first and produce a list of changed paths,
> and then sending the changes will happen entirely after that. The
> benefit is that we preserve commit atomicity even when branches are
> present in the working copy... which is very important!]
>
>
> Changes To The Update Process:
> ==============================
>
> Currently, update builds a reflection of the working copy's state on
> the server (the reflection is a Subversion transaction). Then the
> server sends back a tree delta between the reflection and the desired
> revision tree (usually the head revision, but whatever). The tree
> delta is expressed by driving an svn_delta_edit_fns_t editor on the
> client side.
>
> If there are disjoint subdirs in the working copy, the reflection
> must, uh, reflect this. That's pretty easy: that subtree of the
> transaction will simply point to the appropriate revision subtree
> (implementation note: we'll need to add a new function to
> svn_ra_reporter_t, allowing us to link arbitrary path/rev pairs into
> the transaction.)
>
> But getting the reflection right isn't enough. The revision tree
> we're comparing the reflection with doesn't have the special disjoint
> subtree, so a lot of spurious differences would be sent to the client,
> which the client would then have to ignore, presumably making a
> separate connection later to update the disjoint subdir. This way
> lies madness... or at least inefficiency.
>
> So instead, we'll create a *second*seaoe transaction, representing the
> target of the update. In the plain update case, this transaction is
> an exact copy of the revision (and perhaps we'll optimize out the txn
> and just use the revision tree after all). But in the disjoint subdir
> case, this second txn will also reflect the disjointedness. In other
> words, when a disjoint directory D is discovered, it will be linked
> into both txn trees -- in the reflection txn, D will be at whatever
> revision(s) it is in the working copy, and in the target txn, it will
> be at the target revision of the update. This way, the delta between
> the reflection and target txns will apply cleanly to the working copy
> (i.e., svn_repos_dir_delta() will just Do The Right Thing when invoked
> on the two txns). Voila.
>
>
> Changes to Switch and Merge Process:
> ====================================
>
> The switch process still needs to build a working-copy reflection that
> contains possible "disjointed" subtrees. However, the second
> target-transaction isn't needed at all. The server can send a delta
> between the reflection and a "pure" path in some revision (presumably
> the path that we're switching to.)
>
> If the disjointed subtree and the target path both happen to be part
> of the same branch, then svn_repos_dir_delta() won't notice any
> differences at all. Otherwise, the user should expect to have the
> disjointed section of the working copy be "converted" to a new URL,
> just like the rest of the working copy.
>
> In the case of merges, we continue to build a reflection that contains
> disjointed subtrees. Again, no need for a second transaction.
> Remember that the reflection is only being built as a shortcut to
> cheaply construct fulltexts of X:P in the client. The structure of
> the reflection is irrelevant; *any* reflection can be used as a basis
> for sending a tree-delta that constructs X:P, no matter what
> disjointed sections it has. (Although some reflections may be more
> useful than others! In the worst case, if the reflection is
> completely unrelated to X:P, then svn_repos_dir_delta() regresses into
> sending fulltexts anyway.)
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Sat Oct 21 14:36:59 2006