I have some blue-sky thoughts on this topic. Back in the day, when I
thought about how to optimize directories, I came up with the idea of a
multi-headed B-tree, where each head is a version of the directory.
So if the first version of a directory has a thousand entries, those
would be written out as a B-tree. If you change one entry in the next
version, you wind up writing out about ten nodes (lg 1000 ≈ 10),
including a new head, but you reuse all of the unchanged nodes.
In principle, single-entry lookups are O(lg(n)) and single-entry changes
add O(lg(n)) space. The fiddly design challenges are:
* Keeping the tree reasonably balanced for each revision
* Making efficient use of block reads so that you don't actually have
to seek and read six times to look up a single entry in a 32-file
directory.
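The path-copying idea above can be sketched in a few lines, using a plain
(unbalanced) binary search tree as a stand-in for a real B-tree; the names
and structure here are purely illustrative, not from any Subversion code:

```python
import random

class Node:
    """One immutable tree node; a directory entry maps key -> value."""
    __slots__ = ("key", "value", "left", "right")
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right

def insert(root, key, value):
    """Return a new head. Only nodes on the search path are copied;
    everything off the path is shared with the old version."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)   # replace entry in place

def lookup(root, key):
    """O(depth) single-entry lookup from any head."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right

def nodes(root):
    """All nodes reachable from a head."""
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        if n is not None:
            out.append(n)
            stack += [n.left, n.right]
    return out

# Version 1: a directory with a thousand entries.
random.seed(0)
keys = list(range(1000))
random.shuffle(keys)
v1 = None
for k in keys:
    v1 = insert(v1, k, "rev1")

# Version 2: change one entry; v1 is untouched and still readable.
v2 = insert(v1, 500, "rev2")

old_ids = {id(n) for n in nodes(v1)}
fresh = sum(1 for n in nodes(v2) if id(n) not in old_ids)
print(lookup(v1, 500), lookup(v2, 500))   # old head still says rev1
print(fresh)   # only the copied path is new: on the order of lg 1000 nodes
```

A real implementation would use a balanced, wide-fanout B-tree rather than
a binary tree, so the copied path per change is a handful of block-sized
nodes instead of ~lg₂(n) pointer-sized ones; the sharing argument is the
same either way.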
Efficiently implementing this design probably requires keeping all of
the data about a directory together, much as a weave-based file storage
design would require keeping all of the data about a file together. BDB
and FSFS do not think in those terms, so this is really just something
to keep in mind if anyone ever designs a more performance-oriented FS
back end. (FSFS is designed to be easily deployable in a variety of
storage environments, not fast or small.)
Received on 2008-02-20 20:50:56 CET