
Re: Minimizing repository growth when large files change....

From: Justin Erenkrantz <justin_at_erenkrantz.com>
Date: 2005-01-06 18:55:40 CET

--On Thursday, December 30, 2004 1:32 AM -0600 Ben Collins-Sussman
<sussman@collab.net> wrote:

> This sounds really weird to me. I mean, we're all aware that fsfs uses
> *some* less space than bdb... like 20% less, I thought, was the rule of
> thumb.
>
> But 90% less space? Is something really fishy going on here? If the
> script below really reproduces this, should we investigate?

Well, BDB may not be as efficient. From their docs:

<http://www.sleepycat.com/docs/ref/am_misc/diskspace.html>

"Space freed by deleting key/data pairs from a Btree or Hash database is
never returned to the filesystem, although it is reused where possible.
This means that the Btree and Hash databases are grow-only. If enough keys
are deleted from a database that shrinking the underlying file is
desirable, you should create a new database and copy the records from the
old one into it."

Here's a data point from one repository whose dump has ~120k revisions:

BDB on a straight load: 6.3GB
FSFS on a straight load: 3.5GB
BDB after a db_dump/db_load cycle: 4.7GB

So, after a BDB dump/load, yes, it's roughly in line with that ~20% rule of
thumb (FSFS comes out about 25% smaller here). However, I bet large BDB
temporary transactions (such as we do for a commit) cause a spike in the
size that is never really recouped... (BDB 4.2.52, FWIW.)
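
For anyone who wants to check the arithmetic, here is a minimal sketch that works out the relative differences from the three sizes quoted above. The variable and function names are only illustrative; nothing here touches a real repository.

```python
# Sizes in GB, copied from the data point above.
bdb_straight = 6.3   # BDB on a straight load
fsfs_straight = 3.5  # FSFS on a straight load
bdb_reloaded = 4.7   # BDB after a db_dump/db_load cycle


def percent_smaller(smaller, larger):
    """Return how much smaller `smaller` is than `larger`, as a percent of `larger`."""
    return (larger - smaller) / larger * 100.0


# FSFS versus BDB as loaded, before any compaction.
fsfs_vs_bdb_straight = percent_smaller(fsfs_straight, bdb_straight)

# FSFS versus BDB once the tables have been dumped and reloaded.
fsfs_vs_bdb_reloaded = percent_smaller(fsfs_straight, bdb_reloaded)

# Space given back by the dump/load cycle alone.
reclaimed_by_reload = percent_smaller(bdb_reloaded, bdb_straight)

print(f"FSFS vs. BDB (straight load): {fsfs_vs_bdb_straight:.1f}% smaller")
print(f"FSFS vs. BDB (after reload):  {fsfs_vs_bdb_reloaded:.1f}% smaller")
print(f"BDB space reclaimed by reload: {reclaimed_by_reload:.1f}%")
```

On these numbers the gap is about 44% on a straight load and about 25% after the reload, so roughly a quarter of the BDB size was space the database was holding on to.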

HTH. -- justin

Received on Thu Jan 6 18:57:20 2005

This is an archived mail posted to the Subversion Dev mailing list.
