Re: [PATCH] new feature: takeover

From: Jonathan Gilbert <o2w9gs702_at_sneakemail.com>
Date: 2005-07-08 00:06:12 CEST

At 10:16 PM 07/07/2005 +0100, Max Bowsher wrote:
>Jonathan Gilbert wrote:
>...
>>> Dunno what good this'll do, though.. Since at this point the patch is
>>> only a convenience thing and not yet a bandwidth saver, I doubt its
>>> ability to make it past your scrutiny. :-) Well, there it is, anyway.
>
>I'd like to register an enthusiastic +1 for this feature, even without any
>bandwidth optimization.
>
>I've wanted this personally at times, and would have liked to have
>recommended it to others on occasion too.
>
>The use case I remember is:
>
>Checking out your existing homedir-under-version-control onto a new machine,
>which potentially already has some content in the homedir.
>A simple checkout will run into obstructions. However, if you can 'underlay'
>the existing directory structure with .svn directories, you can then use
>update and diff to gradually bring the possibly disorganized home into
>agreement with your personal repository.

The problem with merely, as you put it, "underlaying" the source tree with
.svn dirs is that regardless of what is going on in your WC, the text-base
"pristine" copy must always be a complete, consistent revision from the
server. It doesn't matter which revision, but it must be exactly one, and
not a mix of two or only part of one. Putting files into the WC is actually
the *last* thing an "svn checkout" does. It proceeds roughly like
this:

1. Check for files which differ from the *current* text-base (for a
"checkout", the .svn subdir is initialized to revision 0 -- that is, the
empty revision that precedes the existence of source code). This check is
done because the code path is the same for "svn update"; it needs to know
which files it should simply overwrite with the copy from the server and
which it needs to merge changes into.

2. Tell the server "I have revision A. I want to have revision B." This
command (the name of the command in the protocol is "update", oddly enough
;-) causes the server to build a script for a so-called "editor". The
script consists of commands like "add this file", "delete that file",
"change this file's properties", "rename this file", "modify the contents
of this file according to this text-diff", etc. It is important to note
here that the command takes only two parameters: the revision that the client
already has in text-base, and the revision the client wants to end up with
after running the editor script.

3. As it is received from the server, the script is applied to the pristine
files in .svn/text-base. The files are not overwritten; instead, the
modified versions are written out into .svn/tmp/text-base. This is so that
a failed operation doesn't put the existing text-base into an inconsistent
state. As each file is completed, commands are written to a journal listing
changes to be made to the non-'tmp' part of .svn.

4. Once the server has finished its "update" output, the journal, which is
a script in its own sense too, is checked for consistency, to make sure it
isn't going to screw things up. Then, it is applied. It is at this point
that new versions of files are moved from .svn/tmp/text-base into
.svn/text-base, along with other related bookkeeping.
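
The four steps can be modelled in a few dozen lines. This is a toy sketch, not Subversion's actual libsvn_wc code: the `AdminArea` class, the function names and the command tuples are all hypothetical, and for brevity the editor "script" carries full file texts where the real protocol sends text-deltas. What it is meant to show is the shape of the process: the server only computes the difference between two whole revisions, new pristine texts are staged in a tmp area, the pristine copies change only when the journal is run, and the working files are touched last.

```python
from dataclasses import dataclass, field

# Hypothetical toy repository: revision number -> {path: file contents}.
# Revision 0 is the empty revision that precedes any source code.
Repo = dict[int, dict[str, bytes]]


@dataclass
class AdminArea:
    """Toy stand-in for one directory's .svn area (not Subversion's real format)."""
    revision: int = 0                                               # fresh checkout starts at r0
    text_base: dict[str, bytes] = field(default_factory=dict)      # pristine copies
    tmp_text_base: dict[str, bytes] = field(default_factory=dict)  # staged new pristines
    journal: list[tuple[str, str]] = field(default_factory=list)   # pending operations


def differing_files(area: AdminArea, working: dict[str, bytes]) -> set[str]:
    """Step 1: working files whose contents differ from the current text-base."""
    return {p for p, data in working.items() if area.text_base.get(p) != data}


def build_editor_script(repo: Repo, rev_a: int, rev_b: int) -> list[tuple]:
    """Step 2 (server side): commands that turn revision A into revision B.
    Full texts stand in for the text-deltas the real protocol would send."""
    old, new = repo[rev_a], repo[rev_b]
    script = [("delete", p, None) for p in sorted(old.keys() - new.keys())]
    for p in sorted(new):
        if p not in old:
            script.append(("add", p, new[p]))
        elif old[p] != new[p]:
            script.append(("modify", p, new[p]))
    return script


def receive_script(area: AdminArea, script: list[tuple]) -> None:
    """Step 3: stage new pristines in tmp and journal the changes.
    The real text_base is not modified here."""
    for op, path, content in script:
        if op == "delete":
            area.journal.append(("delete", path))
        else:
            area.tmp_text_base[path] = content
            area.journal.append(("install", path))


def run_journal(area: AdminArea, new_rev: int) -> None:
    """Step 4: validate the whole journal first, then apply it to text_base."""
    for op, path in area.journal:
        if op == "install" and path not in area.tmp_text_base:
            raise RuntimeError(f"journal installs {path} but nothing is staged")
        if op == "delete" and path not in area.text_base:
            raise RuntimeError(f"journal deletes {path} but it is not versioned")
    for op, path in area.journal:
        if op == "install":
            area.text_base[path] = area.tmp_text_base.pop(path)
        else:
            del area.text_base[path]
    area.journal.clear()
    area.revision = new_rev


def update(area: AdminArea, repo: Repo, target: int,
           working: dict[str, bytes]) -> set[str]:
    """Run steps 1-4, then update the working files last.
    Returns the files touched by the update that would need a merge."""
    differing = differing_files(area, working)
    script = build_editor_script(repo, area.revision, target)
    receive_script(area, script)
    run_journal(area, target)
    for op, path, _ in script:
        if path in differing:
            continue  # local edits or obstructions: leave the working file alone
        if op == "delete":
            working.pop(path, None)
        else:
            working[path] = area.text_base[path]
    return differing & {path for _, path, _ in script}


repo: Repo = {
    0: {},
    1: {"a.txt": b"one\n", "b.txt": b"two\n"},
    2: {"a.txt": b"ONE\n", "c.txt": b"three\n"},
}
area = AdminArea()
working = {"b.txt": b"my local b\n"}  # file already present, as in the homedir case
print(update(area, repo, 1, working))  # "checkout": r0 -> r1, prints {'b.txt'}
print(update(area, repo, 2, working))  # update: r1 -> r2, prints {'b.txt'}
```

Because `run_journal` checks every entry before changing anything, a malformed journal leaves `text_base` at exactly one revision. A real implementation must also survive a crash partway through applying the journal, which this sketch does not attempt.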

So, you can see how, given this structure of the working copy and its ".svn"
subdirectories, doing a proper checkout is essential to maintaining
consistency :-) You can also see how it would be difficult to prevent
redundant downloading with this structure. The server assumes the client
already has SOME revision of each file, even if it is revision zero (in
which the file does not exist, so the sequence of text-delta operations
must transfer the entire content of the file). For a while yesterday I
thought it might be possible to ask the server individually for each
already-existing file's checksum, and thereby get the full text only for
files that differed (or didn't exist yet). That would have been less
efficient, since it involves many synchronous RA calls in sequence, but it
would still have avoided the bulk of the data transfer. Unfortunately, it
turns out it isn't actually possible to get a file's checksum without
having the server send the file's content (or, more precisely, the last
diff to the file before the revision you want).
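
For illustration, the per-file checksum idea might look roughly like the following. It assumes a hypothetical RA call, passed in as `get_checksum(path, rev)`, that returns the server's MD5 of a file at a revision without sending its content. As explained above, no such call exists, so a fake in-memory server stands in for it. `plan_takeover` and `md5_hex` are illustrative names, not Subversion API.

```python
import hashlib
from typing import Callable, Optional

# Hypothetical RA call: (path, revision) -> hex MD5 of that file, or None if absent.
ChecksumFn = Callable[[str, int], Optional[str]]


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def plan_takeover(local: dict[str, bytes], server_paths: set[str], rev: int,
                  get_checksum: ChecksumFn) -> tuple[list[str], list[str], int]:
    """Split server files into those whose local copy can be reused and those
    whose text must still be downloaded. Returns (reuse, fetch, round_trips)."""
    reuse: list[str] = []
    fetch: list[str] = []
    round_trips = 0
    for path in sorted(server_paths):
        if path not in local:
            fetch.append(path)  # nothing local to compare against
            continue
        round_trips += 1        # one synchronous request per existing file
        if get_checksum(path, rev) == md5_hex(local[path]):
            reuse.append(path)  # identical: no need to download the text
        else:
            fetch.append(path)  # differs: the text must still be sent
    return reuse, fetch, round_trips


# Usage with a fake "server" standing in for the hypothetical RA call.
server = {"a.txt": b"same\n", "b.txt": b"new\n", "c.txt": b"only on server\n"}
local = {"a.txt": b"same\n", "b.txt": b"old\n", "notes.txt": b"unversioned\n"}


def fake_get_checksum(path: str, rev: int) -> Optional[str]:
    return md5_hex(server[path]) if path in server else None


reuse, fetch, trips = plan_takeover(local, set(server), 5, fake_get_checksum)
print(reuse, fetch, trips)  # ['a.txt'] ['b.txt', 'c.txt'] 2
```

Files such as `notes.txt` that exist only locally are left alone as unversioned. `round_trips` grows with the number of pre-existing files, which is where the extra cost of this approach comes from.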

So basically, in order to stop the redundant downloading, new functionality
needs to be added to the RA layer, including all of the underlying protocol
providers. I do have plans at the moment to implement such functionality,
but it will take a while :-)

Anyway, I'm happy to see someone else who agrees with me that redoing a
working copy from scratch, including manually copying over edited and
unversioned files, is annoying and that even if it means re-downloading
everything from the server, automating the process is worthwhile :-)

Jonathan Gilbert

Received on Fri Jul 8 00:05:22 2005
