On Sun, 2003-04-13 at 22:09, Tom Lord wrote:
> Well, for write transactions, the first observation is that distinct
> project subtrees within the namespace can be handled by
> non-communicating servers or server threads/processes
They could, but that only affects maximum throughput; it doesn't help to
make the design simpler than what we have now, since we don't support
hosting different parts of a repository's namespace on different
> The second observation is that a commit consists of generating a
> changeset client side, sending it to the server, checking for
> up-to-dateness, and assigning a repository revision number. An
> application-level log of such txns, suitable to ensure ACID
> properties, is essentially just a per-project-tree list of those
> changesets -- a data structure that's fairly easy to implement on a
> native-fs -- plus another list to assign the repository rev numbers.
(Why would you need a separate list to assign the repository rev
numbers? Presumably the list of changesets has an order, and that could
correspond to the repository revisions.)
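As a rough illustration of that parenthetical, here is a minimal sketch of an append-only changeset list in which a changeset's position in the list is its repository revision number, so no second list is needed. The names (`ChangesetJournal`, `OutOfDateError`, `commit`) and the per-path up-to-dateness check are assumptions made for the example; this is neither Subversion's implementation nor the design proposed in the quoted text.

```python
class OutOfDateError(Exception):
    """Raised when a commit touches a path that changed after its base revision."""


class ChangesetJournal:
    """Hypothetical append-only journal: list position doubles as revision number."""

    def __init__(self):
        self._changesets = []    # index i holds the changeset that created revision i + 1
        self._last_changed = {}  # path -> revision in which the path last changed

    @property
    def head(self):
        # Revision 0 is the empty repository; each commit adds one revision.
        return len(self._changesets)

    def commit(self, base_rev, changeset):
        """Append a changeset (a dict of path -> new content) made against base_rev."""
        # Up-to-dateness check: reject if any touched path changed after base_rev.
        for path in changeset:
            if self._last_changed.get(path, 0) > base_rev:
                raise OutOfDateError(path)
        self._changesets.append(dict(changeset))
        rev = self.head  # the new revision number is simply the list length
        for path in changeset:
            self._last_changed[path] = rev
        return rev
```

In this sketch a commit against an older base revision still succeeds as long as none of the paths it touches have changed since that revision; only a genuinely stale path is rejected.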
That's a useful observation which could help us implement more efficient
journaling than BDB gives us, as you've discussed in the past, but
there's no reason we couldn't do that without project directories.
> The third observation is that the various performance characteristics
> we want can be built on-top of that basic lists-of-changesets
> structure by caching and memoization of data about various revs.
I don't think you can get the theoretical performance curve of
skip-deltas simply by wrapping a cache around a changeset journal.
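To make the skip-delta performance curve concrete, here is a minimal sketch of one common way to choose delta bases: each revision of a file is stored as a delta against the revision whose number is the same with the lowest set bit cleared. That particular rule, and the function names, are assumptions for illustration and may not match the direction or details of the current repository backend. The point is only that the number of deltas needed to reconstruct a revision grows logarithmically with the revision count.

```python
def skip_delta_base(n: int) -> int:
    """Base revision for file revision n: n with its lowest set bit cleared."""
    return n & (n - 1)


def delta_chain(n: int) -> list[int]:
    """Revisions whose deltas must be combined to reconstruct revision n.

    Revision 0 stands for the one stored fulltext and is not in the chain.
    """
    chain = []
    while n > 0:
        chain.append(n)
        n = skip_delta_base(n)
    return chain


# Example: revision 7 is rebuilt from the deltas for 7, 6 and 4.
print(delta_chain(7))
# The chain length equals the number of set bits in the revision number.
print(len(delta_chain(1_000_000)))
```

Under this rule the chain length is the number of set bits in the revision number, so a file with about a million revisions never needs more than 20 deltas to reach any of them.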
> But on what should we key those caches, indexes, and memos? The
> project-tree boundaries, because of the tractable size of the trees
> they contain and their relationship to the atomicity of commits, are
> Is that too brief?
If you're suggesting keeping a fulltext cache of every N revisions of
the repository, with hard-links between identical revs of files, then
you're probably not aiming for the same performance characteristics as I
am. With the repository structure we have now, you can have millions of
revs of a file and can get to any of them by combining a double-digit
number of deltas and applying the result to the one plaintext stored in
I also disagree that units of atomicity are always of tractable size.
gcc and Linux and Mozilla all require units of atomicity which are
pretty damn big, assuming you can split them up at all. And if you do
split them up, you'll probably want atomic commits across the units with
> It would have been much wiser, a few years back, to
> implement commits in terms of tree-copies, not fs revision numbers.
I guess the basic idea here is that the repository would only serve the
head revision, you'd commit by copying the trunk (or project) directory
and modifying it, and an update is like a switch. But that's not a
complete vision: how does "svn update" know what to switch to? What URL
would correspond to "the head of the trunk of the Subversion project"
when the path of the head changes with each commit? What restrictions
does the repository enforce to prevent history from disappearing from the
space of what clients can access?
Eliminating the revision number by making it effectively part of the
path gains elegance in some areas, but loses it in others. And the only
objective gain I've seen you describe has to do with the theoretical
maximum commit throughput of a repository distributed across many
servers with different servers taking synchronization responsibility for
different parts of the namespace. That's just not a compelling
Received on Mon Apr 14 05:36:40 2003