
Re: [PROPOSAL] Merging Improved

From: Tom Lord <lord_at_emf.net>
Date: 2003-04-14 04:09:10 CEST

> From: Greg Hudson <ghudson@MIT.EDU>

> On Sun, 2003-04-13 at 12:29, Tom Lord wrote:
>> And, hmm.... just as an aside. I woke up this morning realizing that
>> if you had first class project trees in svn, suddenly a fast,
>> space-efficient, native-fs storage manager would be a lot more
>> practical (i.e., no need for BDB or an RDBMS). Not trivial -- but
>> tractable.

> Explain?

Heh -- I like that question.

Well, briefly:

User-defined project boundaries give you an excellent hint for the
scope of atomic commits (so good, in fact, that if you tweaked the
semantics to say atomic commits could not have larger scope, it
wouldn't be a big loss -- but that's not absolutely necessary).

That commit scope both (a) does a pretty good job of trivially
partitioning concurrent commits into non-interfering subsets and (b)
partitions the potentially huge fs namespace into tractably sized
subtrees.
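To make the partitioning claim concrete, here is a minimal Python sketch, assuming an explicit registry of project-tree roots. `PROJECT_ROOTS`, `project_of`, and `partition_commits` are hypothetical names, not Subversion APIs. Each pending commit is described by the set of paths it changes; commits confined to a single project tree fall into that tree's bucket, and anything else is flagged as spanning.

```python
from collections import defaultdict

# Hypothetical registry of explicit project-tree roots (illustration only).
PROJECT_ROOTS = ["/trunk/libfoo", "/trunk/bar"]


def project_of(path, roots=PROJECT_ROOTS):
    """Return the most specific project root containing `path`, or None."""
    matches = [r for r in roots if path == r or path.startswith(r.rstrip("/") + "/")]
    return max(matches, key=len, default=None)


def partition_commits(commits, roots=PROJECT_ROOTS):
    """Group pending commits (name -> set of changed paths) by project tree.

    A commit confined to one project tree lands in that tree's bucket; a
    commit spanning several trees (or touching paths outside every tree) is
    set aside, since only those need any cross-tree coordination.
    """
    by_project = defaultdict(list)
    spanning = []
    for name, paths in commits.items():
        projects = {project_of(p, roots) for p in paths}
        if len(projects) == 1 and None not in projects:
            by_project[projects.pop()].append(name)
        else:
            spanning.append(name)
    return dict(by_project), spanning


if __name__ == "__main__":
    pending = {
        "c1": {"/trunk/libfoo/src/a.c", "/trunk/libfoo/README"},
        "c2": {"/trunk/bar/main.c"},
        "c3": {"/trunk/libfoo/src/b.c", "/trunk/bar/Makefile"},
    }
    groups, spanning = partition_commits(pending)
    print(groups)    # {'/trunk/libfoo': ['c1'], '/trunk/bar': ['c2']}
    print(spanning)  # ['c3']
```

Only the commits in `spanning` would need any cross-tree coordination; every bucket can be checked and applied independently of the others.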

So now how does that translate into a simple native-fs storage manager
that satisfies the performance characteristics we ideally expect from an
svn server?

Well, for write transactions, the first observation is that distinct
project subtrees within the namespace can be handled by
non-communicating servers or server threads/processes -- communication
is only necessary for those rare commits that span project-tree
boundaries (if, in fact, you want to support those).
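One way to read this first observation is as lock granularity. The sketch below is hypothetical and works in-process rather than across separate servers: each project tree gets its own lock, a single-tree commit takes just that one lock, and the rare spanning commit takes all of its locks in sorted order so that two spanning commits cannot deadlock. `ProjectLocks` and its `commit` method are illustrative names only.

```python
import threading
from collections import defaultdict


class ProjectLocks:
    """Hypothetical per-project-tree commit serialization (illustration only).

    A commit confined to one project tree takes only that tree's lock, so
    commits to unrelated trees never wait on each other. A commit spanning
    trees takes every lock it needs in sorted order, so two spanning commits
    cannot deadlock.
    """

    def __init__(self):
        self._guard = threading.Lock()          # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def _lock_for(self, project):
        with self._guard:
            return self._locks[project]

    def commit(self, projects, apply_change):
        """Run `apply_change()` while holding the locks for all `projects`."""
        ordered = [self._lock_for(p) for p in sorted(set(projects))]
        for lock in ordered:
            lock.acquire()
        try:
            return apply_change()
        finally:
            for lock in reversed(ordered):
                lock.release()


if __name__ == "__main__":
    locks = ProjectLocks()
    applied = []
    jobs = [(["libfoo"], "c1"), (["bar"], "c2"), (["libfoo", "bar"], "c3")]
    threads = [
        threading.Thread(target=locks.commit, args=(projs, lambda n=name: applied.append(n)))
        for projs, name in jobs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(applied))  # ['c1', 'c2', 'c3']
```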

The second observation is that a commit consists of generating a
changeset client side, sending it to the server, checking for
up-to-dateness, and assigning a repository revision number. An
application-level log of such txns, suitable to ensure ACID
properties, is essentially just a per-project-tree list of those
changesets -- a data structure that's fairly easy to implement on a
native fs -- plus another list to assign the repository rev numbers.
(Here, you'd be better off without repository rev numbers, letting
project trees rev asynchronously; but either way, you're talking about
ACID management of append-only lists that just grow: pretty easy on a
native fs.) (The arch native repository format is pretty close to
this as it stands, though it lacks the global repository rev.)
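A minimal sketch of such an application-level log, assuming one JSON changeset per line in a per-project file plus a second file that hands out global repository revision numbers. The layout, the `ChangesetLog` class, and its method names are assumptions for illustration; arch's actual format is not modeled here. The sketch also assumes the caller already serializes commits and leaves crash recovery out.

```python
import json
import os
import tempfile


class ChangesetLog:
    """Hypothetical append-only changeset log kept on a native filesystem.

    Assumed layout: <root>/<project>/changesets.log holds one JSON changeset
    per line (line N is the project's revision N), and <root>/global-revs.log
    assigns global revision numbers. Both files only ever grow. Callers are
    assumed to serialize commits (e.g. with per-project locks), and crash
    recovery (discarding a torn final line) is not shown.
    """

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    @staticmethod
    def _count_lines(path):
        # O(n) for brevity; a real store would keep the head in a small index file.
        if not os.path.exists(path):
            return 0
        with open(path, "rb") as f:
            return sum(1 for _ in f)

    @staticmethod
    def _append_durably(path, record):
        # One whole line per record, flushed and fsync'ed before reporting success.
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def commit(self, project, base_rev, changeset):
        """Append `changeset` if the client's `base_rev` is the project's head."""
        project_dir = os.path.join(self.root, project)
        os.makedirs(project_dir, exist_ok=True)
        log_path = os.path.join(project_dir, "changesets.log")
        head = self._count_lines(log_path)
        if base_rev != head:  # the up-to-dateness check
            raise ValueError(f"out of date: client at r{base_rev}, {project} at r{head}")
        self._append_durably(log_path, changeset)
        global_path = os.path.join(self.root, "global-revs.log")
        global_rev = self._count_lines(global_path) + 1
        self._append_durably(global_path, {"project": project, "rev": head + 1})
        return head + 1, global_rev


if __name__ == "__main__":
    log = ChangesetLog(tempfile.mkdtemp())
    print(log.commit("libfoo", 0, {"add": "a.c"}))  # (1, 1)
    print(log.commit("bar", 0, {"add": "m.c"}))     # (1, 2)
    print(log.commit("libfoo", 1, {"mod": "a.c"}))  # (2, 3)
```

Dropping the second file (and the global number) leaves each project tree revving on its own, which is the asynchronous variant the paragraph prefers.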

But a "list of changesets" doesn't give you the access pattern
performance characteristics you want, so:

The third observation is that the various performance characteristics
we want can be built on top of that basic lists-of-changesets
structure by caching and memoization of data about various revs. For
example, we'll want some supplementary fs data structures that
(approximately) cache head revisions or that build various indexes.
But on what should we key those caches, indexes, and memos? The
project-tree boundaries, because of the tractable size of the trees
they contain and their relationship to the atomicity of commits, are
ideal. (This is a little different from the BDB server, which keys
whole-text head revisions on node-ids rather than paths -- but I
think path-based access is what dominates the performance
expectation.) (Here I'd again point to arch -- specifically to the
"revision library" mechanism -- not because it's exactly what you'd
want in an svn server, but because it's close enough that I suspect you
can do the in-betweening yourself.)
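As a sketch of memoization keyed on project trees, assume each project's history is a list of changesets mapping paths to new contents (or `None` for a deletion). `CHANGESETS`, `tree_at`, and `head_tree` are hypothetical names. The tree at revision N is derived from the cached tree at N-1, and because the underlying changeset list only grows, a memoized entry never goes stale.

```python
from functools import lru_cache

# Hypothetical in-memory stand-in for one project's changeset list. Each
# changeset maps a path (relative to the project root) to new contents, or to
# None when the file is deleted.
CHANGESETS = {
    "libfoo": [
        {"README": "v1", "src/a.c": "int a;"},
        {"src/a.c": "int a = 1;"},
        {"README": None, "src/b.c": "int b;"},
    ],
}


@lru_cache(maxsize=None)
def tree_at(project, rev):
    """Tree of `project` after changesets 1..rev, as a sorted tuple of (path, contents).

    Memoized on (project, rev): building rev N reuses the cached tree for
    rev N-1 instead of replaying the whole list. Returning a tuple keeps the
    cached value immutable.
    """
    if rev == 0:
        return ()
    tree = dict(tree_at(project, rev - 1))
    for path, contents in CHANGESETS[project][rev - 1].items():
        if contents is None:
            tree.pop(path, None)
        else:
            tree[path] = contents
    return tuple(sorted(tree.items()))


def head_tree(project):
    """Head-revision lookup keyed by project tree (a path), not by node id."""
    return dict(tree_at(project, len(CHANGESETS[project])))


if __name__ == "__main__":
    print(head_tree("libfoo"))  # {'src/a.c': 'int a = 1;', 'src/b.c': 'int b;'}
```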

Is that too brief?

Now let me add that, absent explicit project trees, you could fake
them. You could make up a heuristic like "all third-level directories
count as project trees" -- but I think you'll eventually run into
problems with heuristics like that, and that explicit project trees are
by far the cleaner solution.
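The difference between a faked and an explicit project boundary can be shown with two small lookup functions. Both are hypothetical: `heuristic_project` applies the "third-level directory" rule literally, and `explicit_project` does a longest-prefix match against a registered list of roots. Neither is an existing Subversion function.

```python
def heuristic_project(path, depth=3):
    """Faked project trees: the first `depth` path components name the project.

    depth=3 mirrors the "all third-level directories" rule. Paths with fewer
    components are treated as outside every project.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < depth:
        return None
    return "/" + "/".join(parts[:depth])


def explicit_project(path, roots):
    """Explicit project trees: the most specific registered root containing `path`."""
    matches = [r for r in roots if path == r or path.startswith(r.rstrip("/") + "/")]
    return max(matches, key=len, default=None)


if __name__ == "__main__":
    roots = ["/projects/libfoo/trunk", "/trunk/libfoo"]
    for p in ["/projects/libfoo/trunk/src/a.c",
              "/trunk/libfoo/src/a.c",
              "/trunk/libfoo/doc/x.txt"]:
        # In a /trunk/<project> layout the heuristic splits libfoo into
        # separate src/ and doc/ pseudo-projects; the explicit lookup does not.
        print(p, heuristic_project(p), explicit_project(p, roots))
```

The heuristic happens to work for a `/projects/<name>/trunk` layout and breaks for `/trunk/<name>`, which is the kind of problem it eventually runs into.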

And I'll also add that repository revision numbers are lame and y'all
should be thinking in terms of getting rid of them (at least at the UI
level). You have a transactional fs with cheap cloning and that's all
you need -- you don't need to serialize changes to unrelated sections
of the fs. It would have been much wiser, a few years back, to
implement commits in terms of tree-copies, not fs revision numbers.
And it's never too late to plan a migration towards that. The pivot
point is the recommended usage patterns, decorated with labels that say
"this part is going to change."
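A rough sketch of commits as tree copies rather than global revision numbers, assuming a content-addressed store of immutable snapshots with one head pointer per project. `SnapshotStore` and its methods are hypothetical, and the full dictionary copy stands in for the cheap, structure-sharing clone a real transactional fs would provide.

```python
import hashlib
import json
import threading


class SnapshotStore:
    """Hypothetical store where a commit is a new tree snapshot, not a global revision.

    Each commit records an immutable snapshot (its parent plus the change)
    and then moves only its own project's head pointer with a
    compare-and-swap. There is no global revision counter to serialize on.
    """

    def __init__(self):
        self.snapshots = {}   # snapshot id -> (parent id, sorted tuple of (path, contents))
        self.heads = {}       # project -> snapshot id
        # One lock guards only the brief pointer swap, for simplicity;
        # a per-project lock would remove even that contention.
        self._swap = threading.Lock()

    def _store(self, parent, tree):
        frozen = tuple(sorted(tree.items()))
        sid = hashlib.sha1(json.dumps([parent, frozen]).encode()).hexdigest()
        self.snapshots[sid] = (parent, frozen)
        return sid

    def commit(self, project, expected_head, changes):
        """Return the new snapshot id, or None if `expected_head` is stale."""
        # Full dict copy for brevity; a real store would share unchanged subtrees.
        tree = dict(self.snapshots[expected_head][1]) if expected_head else {}
        tree.update(changes)
        new_id = self._store(expected_head, tree)
        with self._swap:
            if self.heads.get(project) != expected_head:
                return None  # someone committed first: update and retry
            self.heads[project] = new_id
        return new_id


if __name__ == "__main__":
    store = SnapshotStore()
    r1 = store.commit("libfoo", None, {"a.c": "v1"})
    r2 = store.commit("libfoo", r1, {"a.c": "v2"})
    stale = store.commit("libfoo", r1, {"a.c": "v3"})  # based on r1, not the head
    print(stale is None, store.heads["libfoo"] == r2)  # True True
```

A rejected commit leaves an unreferenced snapshot behind; a real store would garbage-collect it.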

-t

Received on Mon Apr 14 03:59:01 2003

This is an archived mail posted to the Subversion Dev mailing list.