I have some blue-sky thoughts on this topic. Back in the day, when I
thought about how to optimize directories, I came up with the idea of a
multi-headed B-tree, where each head is a version of the directory.
So if the first version of a directory has a thousand entries, those
would be written out as a B-tree. If you change one entry in the next
version, you wind up writing out about ten nodes (lg 1000 ≈ 10),
including a new head, but you reuse all of the unchanged nodes.
In principle, single-entry lookups are O(lg(n)) and single-entry changes
add O(lg(n)) space. The fiddly design challenges are:
* Keeping the tree reasonably balanced for each revision
* Making efficient use of block reads so that you don't actually have
to seek and read six times to look up a single entry in a 32-file
directory.
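The path-copying idea above can be sketched in a few lines, using a plain
(unbalanced) binary search tree as a stand-in for a real B-tree; the names
and structure here are purely illustrative, not from any Subversion code:

```python
import random

class Node:
    """One immutable tree node; a directory entry maps key -> value."""
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def insert(root, key, value):
    """Return a new head. Only nodes on the search path are copied;
    everything off the path is shared with the old version."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)   # replace entry in place

def lookup(root, key):
    """O(depth) single-entry lookup from any head."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right

def nodes(root):
    """All nodes reachable from a head."""
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        if n is not None:
            out.append(n)
            stack += [n.left, n.right]
    return out

# Version 1: a directory with a thousand entries.
random.seed(0)
keys = list(range(1000))
random.shuffle(keys)
v1 = None
for k in keys:
    v1 = insert(v1, k, "rev1")

# Version 2: change one entry; v1 is untouched and still readable.
v2 = insert(v1, 500, "rev2")

old_ids = {id(n) for n in nodes(v1)}
fresh = sum(1 for n in nodes(v2) if id(n) not in old_ids)
print(lookup(v1, 500), lookup(v2, 500))   # old head still says rev1
print(fresh)   # only the copied path is new: on the order of lg 1000 nodes
```

A real implementation would use a balanced, wide-fanout B-tree rather than
a binary tree, so the copied path per change is a handful of block-sized
nodes instead of ~lg₂(n) pointer-sized ones; the sharing argument is the
same either way.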
Efficiently implementing this design probably requires keeping all of
the data about a directory together, much as a weave-based file storage
design would require keeping all of the data about a file together. BDB
and FSFS do not think in those terms, so this is really just something
to keep in mind if anyone ever designs a more performance-oriented FS
back end. (FSFS is designed to be easily deployable in a variety of
storage environments, not fast or small.)
Received on 2008-02-20 20:50:56 CET