
Re: Comparison testing: { FSFS, BDB } x { 1.5.4, trunk }

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Thu, 30 Oct 2008 19:16:29 -0400

C. Michael Pilato wrote:
> I did a little bit of simple comparison testing between FSFS and BDB in
> 1.5.4 and trunk. My testing involved loading a dumpfile of 5000 revisions
> (taken from our own repository), then doing some single-revision dumps at a
> few time-slices across the loaded repository.
>
> Here are the highlights (percentages are ballpark estimates).
>
> In trunk, FSFS:
>
> is significantly slower (30%) for write operations. I have no idea why.
>
> is a bit faster for reads (20%).
>
> showed no meaningful disk usage changes. But I'm pretty sure this is
> an artifact of the testing dataset, which isn't as up-to-date-branch-heavy
> as more recent revision ranges in our source tree are.
>
> In trunk, Berkeley DB:
>
> is significantly faster (50%) for write operations. This is almost
> certainly because post-commit deltification is doing a single
> deltification instead of touching a chain of files.
>
> is significantly slower (300%) for read operations. Distance to
> nearest fulltext?
>
> showed significant improvement in disk usage (20% savings).
> For the same reasons that FSFS didn't show much improvement here, I
> must assume rep-sharing wasn't the real win. More likely, the
> minimization of fulltexts (one per line of history) is the win here.
>
> In all things except disk usage (now in trunk), FSFS remains a clear winner
> over BDB in this testing.
>
> Attached are the script I used and a spreadsheet with the actual findings.
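The guess about the BDB read slowdown ("Distance to nearest fulltext?") can be made concrete with a toy model. This is a hypothetical sketch, not the Berkeley DB backend's code or storage format. It assumes each stored representation is either a fulltext or a delta against another representation, so reading one means walking back to the nearest fulltext and re-applying every delta along the way. The names `Rep`, `apply_delta` and `read_rep`, and the copy/insert delta ops, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Rep:
    """Toy representation: either a fulltext, or a delta against `base`."""
    fulltext: Optional[str] = None
    base: Optional[str] = None      # key of the rep this delta applies to
    delta: Optional[list] = None    # ("copy", start, length) / ("insert", text)


def apply_delta(source: str, ops: list) -> str:
    """Build a new string from `source` using copy/insert ops."""
    out: List[str] = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out.append(source[start:start + length])
        else:  # ("insert", text)
            out.append(op[1])
    return "".join(out)


def read_rep(store: Dict[str, Rep], key: str) -> Tuple[str, int]:
    """Return (contents, number of deltas applied) for rep `key`."""
    chain = []
    rep = store[key]
    while rep.fulltext is None:     # walk back to the nearest fulltext
        chain.append(rep.delta)
        rep = store[rep.base]
    text = rep.fulltext
    for ops in reversed(chain):     # apply the delta nearest the fulltext first
        text = apply_delta(text, ops)
    return text, len(chain)


# Older revisions stored as deltas against their successors; r3 is a fulltext.
store = {
    "r3": Rep(fulltext="hello brave new world"),
    "r2": Rep(base="r3", delta=[("copy", 0, 6), ("insert", "new world")]),
    "r1": Rep(base="r2", delta=[("copy", 0, 6), ("insert", "world")]),
}

print(read_rep(store, "r1"))  # ('hello world', 2)
```

The second element of the returned tuple counts the deltas applied: the farther a revision sits from its fulltext, the more work each read costs, which is one plausible reading of the 300% figure.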

I've got an uncommitted patch which causes Berkeley DB to store *both* MD5
and SHA1 checksums, and to be able to cough up the one required by callers.
I re-ran the numbers with this patch, and have attached an updated
spreadsheet. What I find is that the space and performance cost for
calculating and storing both checksums is minimal (atop what the trunk code
was already doing). But the read costs dropped by half! I suspect this is
because svn_fs_file_md5_checksum() forces a walk over the file contents if
the MD5 checksum isn't readily available in the database, which is the case
in the current trunk code.
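The patch itself isn't shown here. As a hypothetical sketch of the effect being described, the Python below models a file record that may or may not have an MD5 digest stored alongside its SHA-1. When the requested digest is missing, the only way to produce it is to read and hash the full contents, which is the extra walk suspected of slowing reads. `FileNode`, `store_both` and `content_reads` are invented names; this is not Subversion's `svn_fs_file_md5_checksum()` API.

```python
import hashlib
from typing import Dict, Optional


class FileNode:
    """Hypothetical file record that may carry stored hex digests."""

    def __init__(self, contents: bytes, stored: Optional[Dict[str, str]] = None):
        self.contents = contents
        self.stored = dict(stored or {})  # algorithm name -> hex digest
        self.content_reads = 0            # full passes over the contents

    def read_contents(self) -> bytes:
        self.content_reads += 1
        return self.contents

    def checksum(self, kind: str) -> str:
        """Return the hex digest for `kind` ("md5" or "sha1")."""
        cached = self.stored.get(kind)
        if cached is not None:
            return cached                 # cheap: no pass over the contents
        # Missing digest: hash the whole file (not cached, so every call pays).
        return hashlib.new(kind, self.read_contents()).hexdigest()


def store_both(contents: bytes) -> FileNode:
    """Compute MD5 and SHA-1 once at write time and keep both."""
    return FileNode(contents, {
        "md5": hashlib.md5(contents).hexdigest(),
        "sha1": hashlib.sha1(contents).hexdigest(),
    })


data = b"some file contents\n"
sha1_only = FileNode(data, {"sha1": hashlib.sha1(data).hexdigest()})
both = store_both(data)

md5_a = sha1_only.checksum("md5")  # forces a pass over the contents
md5_b = both.checksum("md5")       # served from the stored digest
print(sha1_only.content_reads, both.content_reads)  # 1 0
```

The trade-off mirrors the numbers reported: storing both digests costs one extra hash at write time, while a caller asking for the "other" checksum on read no longer triggers a full pass over the file.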

-- 
C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2008-10-31 00:16:44 CET

This is an archived mail posted to the Subversion Dev mailing list.
