[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Making the repository smaller

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: Wed, 20 Feb 2008 14:50:33 -0500

I have some blue-sky thoughts on this topic. Back in the day, when I
thought about how to optimize directories, I came up with the idea of a
multi-headed B-tree, where each head is a version of the directory.

So if the first version of a directory has a thousand entries, those
would be written out as a B-tree. If you change one entry in the next
version, you wind up having to write out about ten nodes (lg 1000)
including a new head, but you reuse all of the unchanged nodes.

In principle, single-entry lookups are O(lg(n)) and single-entry changes
add O(lg(n)) space. The fiddly design challenges are:

  * Keeping the tree reasonably balanced for each revision
  * Making efficient use of block reads so that you don't actually have
to seek and read six times to look up a single entry in a 32-file
directory.

Efficiently implementing this design probably requires keeping all of
the data about a directory together, much as a weave-based file storage
design would require keeping all of the data about a file together. BDB
and FSFS do not think in those terms, so this is really just something
to keep in mind if anyone ever designs a more performance-oriented FS
back end. (FSFS is designed to be easily deployable in a variety of
storage environments, not fast or small.)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-02-20 20:50:56 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.