Here are my design ideas for a native-FS-backed filesystem. Feel free
to ask me for elaboration where I've been too concise; I didn't want
to write a book.
I don't think it's possible for one FS design to satisfy everyone.
You can't have a single design with low overhead, good scalability,
zippy speed, simplicity, flexibility with regards to access
permissions and underlying file store, and robustness. So, my goal is
to come up with something that satisfies a different (and still
reasonable) set of priorities from the one we currently have.
Specifically, I want to provide:
* Low overhead. If you have three 10K files in a repository, the
repository shouldn't be much larger than 30K.
* Low administration. No need to worry about running out of locks,
for instance.
* Scalability in the face of high read throughput and moderate write
throughput.
* Flexibility. You should be able to host the repository on a
network filesystem; all you need is the ability to read all the
files in the repository and (for commits) the ability to insert a
new file.
* Simplicity and robustness.
The sacrifices will be:
* It might not be as zippy as the current back end. Because of my
plan to reverse the direction of delta arrows (see below),
checkouts of recent revisions may have to do more than they do in
BDB, although we won't know for sure unless we implement and
measure. We could stick a big plaintext cache in front of the
repository to speed up checkouts, but that adds complexity and
can't be a universal cure.
* It might not scale as well in the face of high write throughput.
Okay, enough about requirements. Here is my current design plan, from
a thousand feet up:
* Finished revisions are represented as a single file.
* Node-revisions are located by a revision number and an offset in
the revision file. (However, the revision file format will be
plain text.)
* All delta arrows point backwards in time, as opposed to our
current ones which point forwards. Thus, a revision file never
needs to be changed once it has been put into place. Skip-deltas
and the delta-combiner ensure that we only need to read O(lg(n))
deltas and combine them together in order to regenerate any given
plaintext.
* Transactions which have yet to be committed are represented as one
file for the changed-file data (appended to as changed files come
in) and a directory tree mirroring the changed part of the
transaction.
* To prepare for a commit, we perform the auto-merge, grab a write
lock on the repository, marshal the changed-directory data onto
the end of the changed-file file, and move the new file into
place. (Alternatively, we could grab the write lock only at the
very end. In this case, if someone else has slid in a commit
while we were busy marshalling the directory data, we'll have to
truncate the changed-file file, re-perform the auto-merge, and
redo the marshalling. That would penalize large commits while
benefitting small ones. It would also be more complicated, though
not tremendously so. Perhaps it could be an option, though it
would be such an arcane one that I wouldn't expect anyone who
needs it to ever find it.)
I have, of course, thought about some alternative designs. Here is my
thinking about two of them:
* We could have one file per node-rev. This would make it easier to
do redeltification, but only at the cost of using lots and lots of
inodes to represent a repository. The overhead would be
particularly brutal for repositories full of many small files.
* We could have one file per node, or per node-copy. That's sort of
CVS-like. This also allows redeltification, but the process of
committing becomes much more complicated, because you have to
atomically replace a whole bunch of different node files. That
means implementing transactions, which is a lot of complexity.
Anyway, comments appreciated. I'm also interested in comments about
how we ought to structure this work so that it doesn't disrupt the
main thrust of Subversion development.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Tue Mar 30 05:52:03 2004