
Re: Comparison testing: { FSFS, BDB } x { 1.5.4, trunk }

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: Thu, 30 Oct 2008 21:57:06 -0400

Hyrum K. Wright wrote:
> C. Michael Pilato wrote:
>> C. Michael Pilato wrote:
>>> I did a little bit of simple comparison testing between FSFS and BDB in
>>> 1.5.4 and trunk. My testing involved loading a dumpfile of 5000 revisions
>>> (taken from our own repository), then doing some single-revision dumps at a
>>> few time-slices across the loaded repository.
>>> Here are the highlights (percentages are ballpark estimates).
>>> In trunk, FSFS:
>>>  * is significantly slower (30%) for write operations. I have no idea why.
>>>  * is a bit faster for reads (20%).
>>>  * showed no meaningful disk usage changes. But I'm pretty sure this is
>>>    an artifact of the testing dataset, which isn't as up-to-date-branch-heavy
>>>    as more recent revision ranges in our source tree are.
>>> In trunk, Berkeley DB:
>>>  * is significantly faster (50%) for write operations. This is almost
>>>    certainly because post-commit deltification is doing a single
>>>    deltification instead of touching a chain of files.
>>>  * is significantly slower (300%) for read operations. Distance to
>>>    nearest fulltext?
>>>  * showed significant improvement in disk usage (20% savings) in trunk.
>>>    For the same reasons that FSFS didn't show much improvement here, I
>>>    must assume rep-sharing wasn't the real win here. More likely the
>>>    minimization of fulltexts (one per line of history) is the win here.
>>> In all things except disk usage (now in trunk), FSFS remains a clear winner
>>> over BDB in this testing.
>>> Attached are the script I used and a spreadsheet with the actual findings.
>> I've got an uncommitted patch which causes Berkeley DB to store *both* MD5
>> and SHA1 checksums, and to be able to cough up the one required by callers.
>> I re-ran the numbers with this patch, and have attached an updated
>> spreadsheet. What I find is that the space and performance cost for
>> calculating and storing both checksums is minimal (atop what the trunk code
>> was already doing). But the read costs dropped by half! I suspect this is
>> because svn_fs_file_md5_checksum() forces a walk over the file contents if
>> the MD5 checksum isn't readily available in the database, which is the case
>> in the current trunk code.
> Where do we still use svn_fs_file_md5_checksum() explicitly? I thought most of
> those calls had been switched to svn_fs_file_checksum(), which can have the same
> behavior you describe, but isn't forced to.

Turns out the calls I was thinking of *are* using svn_fs_file_checksum().
But they also pass TRUE for force. (These are in libsvn_repos/dump.c.) Six
of one, a half-dozen of the other...
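
For readers following along, the behavior under discussion can be modeled roughly as below. This is a minimal Python sketch, not Subversion code: `FileNode`, `file_checksum`, and the `bytes_read` counter are hypothetical names invented for illustration. It assumes, as the thread describes, that a forced lookup falls back to reading the whole file when the requested checksum kind is not stored.

```python
import hashlib


class FileNode:
    """Hypothetical stand-in for a versioned file (not a Subversion type)."""

    def __init__(self, contents, stored=None):
        self.contents = contents
        # Checksums already recorded in the backend, keyed by kind name.
        self.stored = dict(stored or {})
        # Counts bytes walked so the cost of a forced lookup is visible.
        self.bytes_read = 0


def file_checksum(node, kind, force):
    """Return the stored digest if present; otherwise compute it only if forced."""
    digest = node.stored.get(kind)
    if digest is not None:
        return digest              # cheap path: no content walk
    if not force:
        return None                # caller accepts "not available"
    # Expensive path: walk the full contents to compute the digest.
    node.bytes_read += len(node.contents)
    return hashlib.new(kind, node.contents).hexdigest()
```

In this model, a node that stores only a SHA1 digest pays a full content read on every forced MD5 request, while a node that stores both digests answers from the stored value. That matches the suspicion above about why storing both checksums cut the read cost.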

C. Michael Pilato <cmpilato_at_collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Received on 2008-10-31 02:57:37 CET

This is an archived mail posted to the Subversion Dev mailing list.