
Re: Changesets vs Wet Blankets

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2003-04-15 20:59:31 CEST

On Tue, 2003-04-15 at 14:12, Tom Lord wrote:
> As a practical matter: it's one repo to admin instead of N, and one
> repo for users to point their clients at instead of N.

We make it pretty much invisible in the UI whether users are accessing
one repository or N. (And in all the ways it is visible, such as
independent revnums and non-atomic commits, project directories would do
the same thing.)

And it's not clear to me that administering N repositories needs to be
any more work than administering one repository with N independent
project directories.

> As a semantic matter: Right! The repository boundaries don't matter
> so much when you don't have global serialization.

So why should we implement project directories when we can just tell
people to put different projects in different repositories instead?

> Do we really have to solve the hard problem of detecting conflicting
> concurrent commits within a given project tree development line? Not
> obviously. [...] Suppose, then, that we impose a rule that commits
> can only take place when the project tree wc is up-to-date

Even with that rule, there will always be a race condition, requiring
some kind of lock. (What you could do, though, is discard the automerge
mechanism. But I don't think that would fly. So, moving on...)
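
To make the race concrete, here is a minimal sketch of the "commit only when up-to-date" rule. It is a toy model, not Subversion or arch code: the `Repo` class, its `head` counter, and the `commit` method are all hypothetical names invented for illustration. The point is only that the up-to-date check and the revision bump have to happen as one atomic step, which is what the lock provides.

```python
import threading


class Repo:
    """Toy repository (hypothetical): one head revision number plus a lock."""

    def __init__(self):
        self.head = 0
        self._lock = threading.Lock()

    def commit(self, base_rev):
        """Commit from a working copy at base_rev; return the new revision or None."""
        # The check and the update must be atomic. Without the lock, two
        # clients could both pass the check against the same head and both
        # believe they committed on top of an up-to-date tree.
        with self._lock:
            if base_rev != self.head:
                return None  # working copy is out of date: reject the commit
            self.head += 1
            return self.head
```

A client whose commit is rejected would update and retry; dropping automerge changes what "update" has to do, but not the need for the lock.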

> But let's assume that answer doesn't fly. [...] I can more often use
> "brute force" techniques and still get decent performance.

I don't think the scale really changes that much. I'm not sure what
kind of svn repositories you're imagining that are going to be so much
larger than individual projects. Do you think Sourceforge is planning
to have one huge svn repository holding every project? Do you think the
*BSDs would be happy with a separate project directory for each library
and command?
(I don't think they would be, because they want their libraries and
commands organized in a namespace of their determination, and because
there are a significant number of changes which would span projects and
still want to be atomic.)

> Svn log can tell you which commits affected the tree -- but to
> actually gather up the changes associated with those commits you have
> to do that pruned-tree-walk, issuing separate queries for each changed
> node in each commit.

But we have to implement that anyway, in order to be able to "svn log" a
subdirectory of a project. (I guess you're saying we would do that on
the client side, by getting all the project's changelogs and pruning
them down. See next comment.)
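
As a rough sketch of what that client-side pruning might look like, assume each changelog entry is a pair of a revision number and the list of paths it changed. That data shape and the name `log_for_path` are assumptions made for illustration; neither comes from svn or arch.

```python
def log_for_path(changelogs, prefix):
    """Return the revisions whose changed paths touch prefix or anything under it."""
    prefix = prefix.rstrip("/")
    hits = []
    for rev, paths in changelogs:
        # Match the directory itself or any path below it, but not
        # siblings that merely share a leading substring.
        if any(p == prefix or p.startswith(prefix + "/") for p in paths):
            hits.append(rev)
    return hits


# Hypothetical project history: (revision, changed paths)
changelogs = [
    (1, ["/trunk/lib/a.c"]),
    (2, ["/trunk/doc/README"]),
    (3, ["/trunk/lib/b.c", "/trunk/doc/README"]),
]
print(log_for_path(changelogs, "/trunk/lib"))  # prints [1, 3]
```

Note that the client has to fetch and scan every changelog for the project, so the cost grows with the project's whole history rather than with the history of the subdirectory being asked about.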

> I don't know why you'd want to backup or rebuild a cache -- you'd just
> populate it on-demand. Access patterns to historic revisions are going
> to vary wildly over time. There's no good reason to maintain and
> back-up a skip-delta record that optimizes all access equally. Just
> the opposite: space-bound the skip-delta cache and let it optimize in
> response to use.

This, and your previous suggestion, fail the "guaranteed acceptable
performance" property. (I just invented that name, but I've believed in
the idea for a long time.) It's not okay to say, "you want rev 100 of
foo.c? Uh, damn, you haven't asked for anything like that recently, I'm
going to have to sort through a few thousand project changelogs, and
I'll get back to you in a few hours."

> > You'll only have 100 to 5000 times as many compartments if each
> > changeset affects 100 to 5000 nodes.
> You're counting skip-delta datums, I'm counting skip-delta indexes.

I don't see why keeping the index space small is so important.

> GCC, a very busy project with O(100) authorized committers, gets less
> than 70 commits a day during the busy periods (quite a bit less --
> that's a very conservative estimate with a safety margin thrown in).
> Nearly all of those commits modify just a very few files.

Skip-deltas reduce access times by having some deltas which span a large
number of revisions. So, each commit may be very small, but some of the
deltas will span lots of commits, and would thus cover lots of files.

So I think it's really necessary to do skip-deltas on a file basis, not
on a tree basis.
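
A small sketch may help show why the spans matter. It assumes one common skip-delta rule, where revision n is stored as a delta against n with its lowest set bit cleared; treat that rule, the function names, and the one-file-per-commit history as illustrative assumptions rather than a description of the actual svn code.

```python
def skip_delta_base(n):
    """Base revision for revision n: n with its lowest set bit cleared."""
    return n & (n - 1)


def delta_chain(n):
    """Revisions visited when rebuilding n by following delta bases down to 0."""
    chain = [n]
    while n > 0:
        n = skip_delta_base(n)
        chain.append(n)
    return chain


def files_spanned(n, changed):
    """Files touched by any commit covered by the delta stored for revision n."""
    covered = set()
    for rev in range(skip_delta_base(n) + 1, n + 1):
        covered |= changed.get(rev, set())
    return covered


# Hypothetical history: each commit touches exactly one distinct file.
changed = {rev: {"file%d.c" % rev} for rev in range(1, 65)}

print(delta_chain(54))                  # prints [54, 52, 48, 32, 0]
print(len(files_spanned(64, changed)))  # prints 64
```

Rebuilding revision 54 takes only four deltas, but one of them runs from 0 to 32. Kept per tree, that delta has to describe every file changed in 32 commits even though each commit touched a single file; kept per file, each file's chain only ever covers that file's own changes.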

> Not really but, more to the point, simple client-side caching
> mechanisms would solve the same problem as restartable checkouts very
> simply.

Client-side caching only helps if the client has done something similar
in the past. We want to be able to check out the svn repository for the
very first time, lose network in the middle, and not have to start over
again. Anything else is embarrassing.

> Perhaps. Supporting that certainly adds code to the system and I'm
> not too sure about the cost/benefit trade-off there.

Well, you haven't been very forthcoming that you were planning to slash
this functionality in svn in the process of turning it into arch. I
think for us it's a required feature.

>> [ People want to use SVN as a backing store for a DAV FS.]

> Which is another example of why I think reconceptualizing the product
> as a txnal fs db and a layer over that to do revision control makes sense.

But these people want revision control on that backing store too!

> While a txnal fs db may be presented as a network filesystem, that's
> hardly its most interesting property. Rather, I think it's
> interesting as a data store with ACID txns and a data model that
> subsumes RDBMSs and OODBs into the more flexible and easier to work
> with model of a unix-ish fs.

Do you have any evidence that this is in the "not too distant future"
besides you personally finding it interesting?

> Ok. "Going over someone's head" is an invocation of authority, hence
> force. It is an attempt to override someone's volition. It's an
> I-win-you-lose scenario.

I'm not personally invested in whether you succeed or fail. I'm just
saying, CollabNet has deliberately left the inmates in control of the
asylum, so they don't exist as a fallback. You may consider that an
error in judgment on their part, but it's unlikely to change.

> Well, to cite a small example: I thought you did cop to getting some
> benefit from my original talking about app-level journalling instead
> of BDB-logs -- and that's lifted right out of arch and passed through
> some filters to express it as it applies with fewest contingencies to
> the svn world.

It's a good idea, but you certainly didn't invent app-level journaling
when you wrote arch. (Anyway, this line of argument isn't going leading

Received on Tue Apr 15 21:00:26 2003

This is an archived mail posted to the Subversion Dev mailing list.