Re: Changesets vs Wet Blankets

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2003-04-15 08:11:47 CEST

On Mon, 2003-04-14 at 18:11, Tom Lord wrote:
> I pointed out that it is not unreasonable to use project trees as the
> granularity of atomic commits (e.g., for locking), and that it might
> even be reasonable to tweak the UI semantics by asserting that
> atomic commits are not guaranteed except _within_ individual project
> trees (a tweak that would further simplify implementation)

If commits are only atomic within a project, and each project has its
own isolated changeset journal, and its own cache of skip-deltas, how
does that differ from having each project in its own repository?

> If every commit to a journal has
> to share a single lock just to begin to sort out conflicting
> concurrent commits, that's an added implementation complexity.

That's not added implementation complexity, because whether or not we
have project trees, we have to consider the correctness of conflicting
commits within a project. It doesn't help implementation complexity to
divide the problem into simple cases and hard cases; you still have to
solve the hard cases.

> Beyond locking, having the journal pre-sorted by project tree
> development line means that for any query confined to a single
> project tree, I can find all of the relevant journal entries in an
> O(1) operation (just "look at the right part of the journal") rather
> than having to search the journal or rely on an external index of
> it.

So, right now you can "svn log" a directory to find the commits which
affect that directory or any file within it. Yes, we had to implement
that, but we had to do that anyway, because you need to be able to "svn
log" a file, with or without project directories.
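To illustrate the point, here is a minimal sketch of why a per-directory log costs nothing extra once a per-file log exists: the same path test serves both. The `Commit` type, the `affects` and `log` helpers, and the sample journal are hypothetical and are not Subversion's actual data structures; a real implementation would follow node history rather than scan the whole journal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    revision: int
    changed_paths: tuple  # repository paths touched by this commit


def affects(changed: str, target: str) -> bool:
    """True if `changed` is `target` itself or lies somewhere beneath it."""
    target = target.rstrip("/")
    return changed == target or changed.startswith(target + "/")


def log(journal, target):
    """Revisions that touch `target` (a file or a directory), newest first."""
    return [c.revision for c in reversed(journal)
            if any(affects(p, target) for p in c.changed_paths)]


# Hypothetical journal, oldest commit first.
journal = [
    Commit(1, ("/trunk/README",)),
    Commit(2, ("/trunk/src/main.c", "/trunk/src/util.c")),
    Commit(3, ("/branches/1.0/src/main.c",)),
    Commit(4, ("/trunk/src/main.c",)),
]

file_log = log(journal, "/trunk/src/main.c")  # a single file
dir_log = log(journal, "/trunk/src")          # a directory, same code path
```

The file query and the directory query go through the same function; no notion of a "project tree" is needed to answer either.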

> An example of such a query is a client asking: "is my wc of
> this project tree up to date?" -- that translates into "give me the
> list of journal entries for this project tree".

Again, we have to implement "is this file up to date?", and answering
that question for a directory isn't any harder.

> The skip delta cache can be a nicely orthogonal
> component with a procedural API and an implementation whose storage
> management issues are separate from everything else and nicely
> modularized.

We've talked about moving various things out of the database, such as
the strings table, but we've never done it because having everything
stored the same way simplifies backups. (The skip-delta cache may be
"just a cache" logically, but if it takes several days to rebuild it,
you have to back it up along with the changeset log.)

> Bringing project trees back in: they simplify things by allowing you
> to compute skip deltas for project trees, rather than individual
> nodes. If you index a skip-delta cache by project tree, you'll have
> N compartments of deltas; if you index by node or node/copy_id,
> you'll have N * K where, typically, K is in O([100...5000]).

You'll only have 100 to 5000 times as many compartments if each
changeset affects 100 to 5000 nodes.
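One way to read this reply is to count stored deltas rather than nodes in the tree: a node-indexed cache only gains an entry for a node when a changeset actually touches it. The sketch below works under that assumption; the function name and the sample changesets are hypothetical.

```python
def delta_counts(changesets):
    """Count stored deltas under two indexing schemes.

    `changesets` is a list of sets, each holding the nodes one changeset
    touched. Tree-level indexing stores one delta per changeset; node-level
    indexing stores one delta per touched node per changeset.
    """
    tree_level = len(changesets)
    node_level = sum(len(touched) for touched in changesets)
    return tree_level, node_level


# Hypothetical history: small commits, as is common in practice.
history = [{"main.c"}, {"main.c", "util.c"}, {"README"}]
tree_level, node_level = delta_counts(history)
ratio = node_level / tree_level  # average nodes touched per changeset
```

Under this reading the multiplier is the average number of nodes touched per changeset, not the number of nodes in the tree.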

> If a query needs deltas that pertain to a particular file within a
> project tree, for example, a trivial linear search of some
> delta-chain for that project tree is a practical solution -- given
> first-class project trees.

I don't think that's really practical for a project like gcc. You'd be
greatly expanding the amount of data you have to look at to get at a
particular file at some revision. Having small compartments does have
constant-factor penalties, but having large compartments is worse
because it changes the performance curve.
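The difference in curves can be sketched as follows. This assumes one common skip-delta layout, in which change n of a node is stored as a delta against change n with its lowest set bit cleared, and change 0 is full text; that layout is an assumption for illustration, not a description of the repository format under discussion. The function names are hypothetical.

```python
def skip_delta_chain(n):
    """Changes visited to rebuild change n of a node under the assumed
    skip-delta layout (delta base = n with its lowest set bit cleared)."""
    chain = []
    while n > 0:
        chain.append(n)
        n &= n - 1  # clear the lowest set bit to find the delta base
    chain.append(0)  # change 0 is stored as full text
    return chain


def linear_chain(n):
    """Changes visited when every delta back to change 0 must be scanned."""
    return list(range(n, -1, -1))


skip = skip_delta_chain(1000)
linear = linear_chain(1000)
```

Here `skip` has 7 entries while `linear` has 1001: the first grows with the logarithm of the change count, the second in direct proportion to it. The gap widens further when the linear chain belongs to a whole project tree, since n then counts every changeset in the tree rather than only those touching the file.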

Also, if you did project tree skip-deltas, then for reasonable
efficiency, checking out a project tree would no longer be a
file-by-file operation, which would seem to complicate restartable
checkouts. And checking out a subdirectory of a project efficiently
sounds tricky.

> someone or other said (and I'm paraphrasing
> here, but just barely) "svn is _useless_ [for revision control]
> without that additional structure".

But at least one other person said they don't use that structure. And
there are people who want to use svn as a backing store for DAV
filesystems using auto-versioning, a use case which has no need for the
trunk/branches/tags structure. (The structure wouldn't badly hurt that
use case--you'd just put everything under "trunk"--but I think Aegis has
shown that your barrier to entry is higher if you have a lot of
structure which many users simply have to work around.)

> Beyond that -- while you may choose not to see svn's txnal fs
> scaling that way, I think you have to anticipate txnal fs' coming
> along that _do_ scale that way in the not too distant future.

Speaking as someone who has to make loads of free software work with a
network filesystem, I can tell you authoritatively that the world does
not give a damn about network filesystems, by and large.

> In reply (not just from Brane), I was told that I was "backsliding"
> (?!?) and accused of trying to (paraphrased) "Go over the heads of
> developers."

I didn't mean it as an accusation. (I've never quite understood why
"going over someone's head" is supposed to be a bad thing.) It just
didn't seem like it was going to work.

> To be slightly more explicit: on the one hand, I have a bunch of
> technology in arch that, as is starting to become clear in various
> branches of this discussion, is applicable to svn.

From my corner, that only seems to be becoming clearer to you.

(Not that you have to convince me. I don't work for CollabNet and I'm
not in a position to increase your income.)

Received on Tue Apr 15 08:12:39 2003
