
Re: Why rewrite the Subversion working copy?

From: David Glasser <glasser_at_davidglasser.net>
Date: Wed, 23 Apr 2008 10:29:05 -0700

On Wed, Apr 23, 2008 at 9:56 AM, John Peacock
<john.peacock_at_havurah-software.org> wrote:
> David Glasser wrote:
> > For the lowest layer, I've started implementing a prototype in Python
> > of a low-level Subversion metadata store, with full unit tests and all
> > that jazz. Currently I've designed the API for a refcounted blobstore
> > and am working on tree and treestore abstractions. I'll put it
> > somewhere (http://svn.collab.net/repos/svn/experimental/svnws? gvn
> > repository? whatever) sometime soon, once I've got a little more
> > implemented. The key goal here is to create a *non-brittle* working
> > copy, where we don't have to be scared that typing "svn switch" with
> > the wrong URL will corrupt the working copy irretrievably. Efficiency
> > is nice too. (Hopefully, while the code itself will probably need to
> > be backported to C, the tests might end up being executable against
> > the "real" code.)
> >
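[A minimal sketch, purely illustrative, of what a refcounted blobstore API along the lines described above might look like; the class and method names (`BlobStore`, `put`, `decref`) are hypothetical and not taken from the actual prototype:

```python
import hashlib


class BlobStore:
    """Content-addressed blob store with reference counting.

    Blobs are keyed by the SHA-1 of their contents; a blob is
    reclaimed only when its refcount drops to zero.  Hypothetical
    sketch, not the prototype's real API.
    """

    def __init__(self):
        self._blobs = {}      # key -> bytes
        self._refcounts = {}  # key -> int

    def put(self, data):
        """Store data, returning its key; bumps the refcount."""
        key = hashlib.sha1(data).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = data
            self._refcounts[key] = 0
        self._refcounts[key] += 1
        return key

    def get(self, key):
        return self._blobs[key]

    def decref(self, key):
        """Drop one reference; reclaim the blob at zero."""
        self._refcounts[key] -= 1
        if self._refcounts[key] == 0:
            del self._blobs[key]
            del self._refcounts[key]
```

Storing the same contents twice yields the same key with a refcount of two, so two working-copy files with identical pristine text share one blob.]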
> I've actually been wondering for a while why Subversion couldn't just use
> the existing server repo code for the client meta-data. A mini local repo
> of configurable depth (up to and including a full mirror). Updates would use
> svnsync-like operations to update the local repo before updating the working
> copy itself. Revs outside of the configured window would get purged
> automatically (or a config option could let them just build up over time).
> A zero-depth local repo would require server access for all operations and
> would still have just the properties and checksums locally (equivalent to
> all of the file contents forced to zero size), so enormous checkouts would
> not be 2x as large any longer.
> All operations would be performed against the local "repo" when possible
> (i.e. if you are disconnected, commits go against the local repo and are
> resolved when reconnected) and against the server when required (you only
> have the last 10 revs locally, so asking for a full history requires a
> server roundtrip, just like it does now).
> Each local metadata repo would effectively be a branch of the main repo (so
> multiple remote repos would have multiple local branch/mirrors), and the
> existing merge-tracking code can be reused to facilitate conflict resolution
> caused by offline operations.
> I've got some handwritten notes at home about some of the other
> implications of this, but my home machine is in between operating systems at
> the moment, so I've been very much out of the loop for a while. Is this
> something that I should try to flesh out into a more robust proposal?
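[One way to picture the "configurable depth" policy proposed above is a purge pass that drops every local revision outside a retention window. A toy sketch, under the (questionable) assumption that each revision's data is self-contained and safe to drop independently; `purge_old_revisions` is a hypothetical name:

```python
def purge_old_revisions(local_revs, head, window):
    """Keep only revisions within `window` of head; purge the rest.

    local_revs: dict mapping revision number -> revision data.
    Hypothetical sketch of the configurable-depth policy; assumes
    revisions can be discarded without breaking newer ones.
    """
    cutoff = head - window
    for rev in sorted(local_revs):
        if rev <= cutoff:
            del local_revs[rev]
    return local_revs
```

With `head=10` and `window=3`, only revisions 8 through 10 survive the pass.]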

There are good ideas there, and I'd be interested in reading a more
robust proposal. I don't think that just repurposing our repo code for
the wc is going to work, though; we'd need to make big enough changes
that it's unclear to me whether what is shared would be worth it. The
big difference is the "keep around only the last ten revs" requirement.
Both the DAG structure of the repository metadata and the delta
structure of text contents make it pretty hard to say "this bit of data
here is old and can be thrown out", because newer data may depend on it.
Also, the BDB FS isn't a great choice for the wc, because of the
filesystem support issues that prompted FSFS in the first place; and
the super-immutable nodes-are-named-by-file-offset property of FSFS
makes reorganizing old revisions difficult. Somebody else mentioned
offline log and blame; however, offline log means "syncing unversioned
revprops", which (while certainly feasible) is its own kettle of fish;
and offline blame really does require having the fulltexts.
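[To make the dependency problem concrete: with delta storage, reconstructing a recent fulltext can require walking back through arbitrarily old revisions, so "this rev is old" does not imply "this rev is unneeded". A toy illustration, with appended lines standing in for real binary deltas (`DeltaStore` is a made-up name, not Subversion code):

```python
class DeltaStore:
    """Toy delta chain: each revision is either a fulltext or a
    delta against its predecessor (here, simply appended text).
    Illustrates why old revisions can't be purged blindly:
    reconstructing a newer revision may walk back through them."""

    def __init__(self):
        # rev -> ("full", text) or ("delta", base_rev, extra)
        self._revs = {}

    def add_fulltext(self, rev, text):
        self._revs[rev] = ("full", text)

    def add_delta(self, rev, base_rev, extra):
        self._revs[rev] = ("delta", base_rev, extra)

    def reconstruct(self, rev):
        # Raises KeyError if a base revision was purged out from
        # under us -- exactly the failure mode being described.
        entry = self._revs[rev]
        if entry[0] == "full":
            return entry[1]
        _, base_rev, extra = entry
        return self.reconstruct(base_rev) + extra
```

Reconstructing rev 3 silently depends on rev 1's fulltext; delete rev 1 as "old" and rev 3 becomes unrecoverable.]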

Don't get me wrong; using FS code as a basis for the new workspace
could be a good idea. I've considered it too, and perhaps there are
good solutions to the problems I'm seeing. But I just think that the
purposes of the seemingly-similar structures are very different. Our
repository is based around storing a very large number of similar
immutable trees (multiple revisions and multiple branches). The
working copy only needs to store a smaller number of mutable trees; I
think optimizing for the latter scenario is going to be more of a win.
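[A rough sketch of what "a smaller number of mutable trees" buys you: a working-copy store can track local edits directly against a pristine base, making "what changed?" a cheap lookup rather than a whole-tree diff. The class and its methods are hypothetical illustrations, not any proposed design:

```python
class MutableTree:
    """A small mutable tree of path -> content, tracking local
    edits against a pristine base -- the kind of structure a
    working copy can optimize for, versus a repository's many
    immutable revision trees.  Illustrative sketch only."""

    def __init__(self, base):
        self._base = dict(base)  # pristine checkout
        self._edits = {}         # path -> new content, or None for delete

    def write(self, path, content):
        self._edits[path] = content

    def delete(self, path):
        self._edits[path] = None

    def read(self, path):
        if path in self._edits:
            content = self._edits[path]
            if content is None:
                raise KeyError(path)
            return content
        return self._base[path]

    def modified_paths(self):
        """Locally modified paths -- cheap, because mutation is
        recorded as it happens instead of recovered by diffing."""
        return sorted(self._edits)
```
]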


David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
Received on 2008-04-23 19:29:50 CEST

This is an archived mail posted to the Subversion Dev mailing list.
