
Re: Comparison testing: { FSFS, BDB } x { 1.5.4, trunk }

From: David Glasser <glasser_at_davidglasser.net>
Date: Fri, 31 Oct 2008 18:34:25 -0700

On Thu, Oct 30, 2008 at 6:59 PM, Hyrum K. Wright
<hyrum_wright_at_mail.utexas.edu> wrote:
> C. Michael Pilato wrote:
>> Hyrum K. Wright wrote:
>>> C. Michael Pilato wrote:
>>>> C. Michael Pilato wrote:
>>>>> I did a little bit of simple comparison testing between FSFS and BDB in
>>>>> 1.5.4 and trunk. My testing involved loading a dumpfile of 5000 revisions
>>>>> (taken from our own repository), then doing some single-revision dumps at a
>>>>> few time-slices across the loaded repository.
>>>>>
>>>>> Here are the highlights (percentages are ballpark estimates).
>>>>>
>>>>> In trunk, FSFS:
>>>>>
>>>>> is significantly slower (30%) for write operations. I have no idea why.
>>>>>
>>>>> is a bit faster for reads (20%).
>>>>>
>>>>> showed no meaningful disk usage changes. But I'm pretty sure this is
>>>>> an artifact of the testing dataset, which isn't as up-to-date-branch-heavy
>>>>> as more recent revision ranges in our source tree are.
>>>>>
>>>>> In trunk, Berkeley DB:
>>>>>
>>>>> is significantly faster (50%) for write operations. This is almost
>>>>> certainly because post-commit deltification is doing a single
>>>>> deltification instead of touching a chain of files.
>>>>>
>>>>> is significantly slower (300%) for read operations. Distance to
>>>>> nearest fulltext?
>>>>>
>>>>> showed significant improvement in disk usage (20% savings) in trunk.
>>>>> For the same reasons that FSFS didn't show much improvement here, I
>>>>> must assume rep-sharing wasn't the real win here. More likely the
>>>>> minimization of fulltexts (one per line of history) is the win here.
>>>>>
>>>>> In all things except disk usage (now in trunk), FSFS remains a clear winner
>>>>> over BDB in this testing.
>>>>>
>>>>> Attached are the script I used and a spreadsheet with the actual findings.
>>>> I've got an uncommitted patch which causes Berkeley DB to store *both* MD5
>>>> and SHA1 checksums, and to be able to cough up the one required by callers.
>>>> I re-ran the numbers with this patch, and have attached an updated
>>>> spreadsheet. What I find is that the space and performance cost for
>>>> calculating and storing both checksums is minimal (atop what the trunk code
>>>> was already doing). But the read costs dropped by half! I suspect this is
>>>> because svn_fs_file_md5_checksum() forces a walk over the file contents if
>>>> the MD5 checksum isn't readily available in the database, which is the case
>>>> in the current trunk code.
>>> Where do we still use svn_fs_file_md5_checksum() explicitly? I thought most of
>>> those calls had been switched to svn_fs_file_checksum(), which can have the same
>>> behavior you describe, but isn't forced to.
>>
>> Turns out the calls I was thinking of *are* using svn_fs_file_checksum().
>> But they also pass TRUE for force. (These are in libsvn_repos/dump.c.) Six
>> of one, a half-dozen of the other...
>
> That's what I figured. I *think* that's the only place we force a checksum
> calculation, and that's because we want the md5 to be there for older clients if
> somebody's doing a dump-load from 1.6 to pre-1.6. Otherwise, we could just put
> whatever checksum we had, sha1 or md5, and then let the loader put the same kind
> of checksum in the target repo. That would also save a few of our "ignore this
> checksum 'cause it ain't the right kind" conditionals.
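
To make the `force` behavior discussed in the quoted messages concrete, here is a minimal Python model. It is only a sketch under stated assumptions: `StoredFile`, `file_checksum`, and `content_walks` are hypothetical names invented for illustration and are not Subversion's C API. The idea it shows is the one described above: a stored checksum of the requested kind is returned cheaply, while a missing one costs a full pass over the file contents when the caller forces it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredFile:
    """Hypothetical stand-in for a file node in a repository backend."""
    contents: bytes
    # Checksums the backend already recorded, keyed by kind ("md5", "sha1").
    stored: Dict[str, str] = field(default_factory=dict)
    # Number of full passes made over the contents (the expensive part).
    content_walks: int = 0


def file_checksum(node: StoredFile, kind: str, force: bool) -> Optional[str]:
    """Model of a checksum lookup with a 'force' flag (illustrative only)."""
    # Cheap path: the requested kind is already stored.
    if kind in node.stored:
        return node.stored[kind]
    # Not stored and the caller does not insist: report "not available".
    if not force:
        return None
    # Forced: read the whole file to compute the missing checksum.
    node.content_walks += 1
    return hashlib.new(kind, node.contents).hexdigest()


if __name__ == "__main__":
    data = b"hello\n"
    # A backend that recorded only SHA1 for this file.
    sha1_only = StoredFile(data, {"sha1": hashlib.sha1(data).hexdigest()})
    print(file_checksum(sha1_only, "md5", force=False))
    print(file_checksum(sha1_only, "md5", force=True))
    print("content walks:", sha1_only.content_walks)
```

In this model, a dump-style caller that always asks for MD5 with `force=True` pays one content walk per file unless the backend stored the MD5 as well, which is consistent with the read-cost drop reported for the patch that stores both checksums.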

So I'm confused. The editor still only talks in md5s, right? Does
that mean that, as long as the editor is unrevved, our current
"migrate stuff from md5 to sha1" strategy has us dropping most
md5s in updates/commits?

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/
Received on 2008-11-01 02:35:05 CET

This is an archived mail posted to the Subversion Dev mailing list.
