
Re: Comparison testing: { FSFS, BDB } x { 1.5.4, trunk }

From: Hyrum K. Wright <hyrum_wright_at_mail.utexas.edu>
Date: Thu, 30 Oct 2008 20:53:58 -0500

C. Michael Pilato wrote:
> C. Michael Pilato wrote:
>> I did a little bit of simple comparison testing between FSFS and BDB in
>> 1.5.4 and trunk. My testing involved loading a dumpfile of 5000 revisions
>> (taken from our own repository), then doing some single-revision dumps at a
>> few time-slices across the loaded repository.
>>
>> Here are the highlights (percentages are ballpark estimates).
>>
>> In trunk, FSFS:
>>
>> is significantly slower (30%) for write operations. I have no idea why.
>>
>> is a bit faster for reads (20%).
>>
>> showed no meaningful change in disk usage, but I'm pretty sure this is
>> an artifact of the test dataset, which isn't as heavy on up-to-date
>> branches as the more recent revision ranges in our source tree are.
>>
>> In trunk, Berkeley DB:
>>
>> is significantly faster (50%) for write operations. This is almost
>> certainly because post-commit deltification is doing a single
>> deltification instead of touching a chain of files.
>>
>> is significantly slower (300%) for read operations. Distance to
>> nearest fulltext?
>>
>> showed a significant improvement in disk usage (20% savings).
>> For the same reasons that FSFS didn't show much improvement here, I
>> must assume rep-sharing wasn't the real win. More likely the win
>> comes from minimizing fulltexts (one per line of history).
>>
>> In all things except disk usage (now in trunk), FSFS remains a clear winner
>> over BDB in this testing.
>>
>> Attached are the script I used and a spreadsheet with the actual findings.
>
> I've got an uncommitted patch which causes Berkeley DB to store *both* MD5
> and SHA1 checksums, and to be able to cough up the one required by callers.
> I re-ran the numbers with this patch, and have attached an updated
> spreadsheet. What I find is that the space and performance cost for
> calculating and storing both checksums is minimal (atop what the trunk code
> was already doing). But the read costs dropped by half! I suspect this is
> because svn_fs_file_md5_checksum() forces a walk over the file contents if
> the MD5 checksum isn't readily available in the database, which is the case
> in the current trunk code.

Where do we still use svn_fs_file_md5_checksum() explicitly? I thought most of
those calls had been switched to svn_fs_file_checksum(), which can have the same
behavior you describe, but isn't forced to.

-Hyrum

Received on 2008-10-31 02:54:18 CET

This is an archived mail posted to the Subversion Dev mailing list.