Re: Subversion's use of Berkeley DB [#11511]

From: Keith Bostic <bostic_at_abyssinian.sleepycat.com>
Date: 2004-12-08 21:21:49 CET

> From: Justin Erenkrantz <justin@erenkrantz.com>
>
> --On Wednesday, December 8, 2004 12:38 PM -0500 Keith Bostic
> <bostic@abyssinian.sleepycat.com> wrote:
>
>> 1. The Subversion code is not setting the Berkeley DB cache size.
>> Given Berkeley DB's small default cache size (256KB), and the
>> expected good locality of reference for Subversion queries,
>> I think Subversion will be able to increase performance by
>> setting the cache size.
>>
>> You can set the cache in the DB_CONFIG file, or by using
>> the DbEnv::set_cachesize method:
>>
>> http://www.sleepycat.com/docs/api_c/env_set_cachesize.html
>>
>> For more information, see the "Selecting a cache size"
>> section of the Berkeley DB Reference Guide, included in your
>> download package and also available at:
>>
>> http://www.sleepycat.com/docs/ref/am_conf/cachesize.html
>>
>> Action Items:
>> Investigate the efficiency of the current Subversion cache
>> (using the Berkeley DB db_stat utility), and see if there's
>> benefit to be had by increasing the cache size.
>>
>> Change Subversion to specify a cache size whenever creating
>> a Berkeley DB database environment.
>
> The question I have is what's an appropriate cache size? 1M?
> 2M? 8M? 128M?

The appropriate size will depend on the database environment:
how big are your databases, what are the data access patterns,
what does the data look like. There are many different things
to consider when looking at cache tuning.

Regardless, 256KB is too small for any server application. If
you make it even 1MB, things will likely be happier.

> Can we change the cache size by just tweaking DB_CONFIG and restarting the
> processes? Or, do we need to rebuild the database? (The docs I can find
> aren't very helpful on this.)

You can change the cache size by tweaking DB_CONFIG and
restarting the processes.
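
As a rough illustration, the DB_CONFIG entry for a given cache size can be generated as below. This is a sketch, not anything Subversion ships: the helper name cachesize_line is made up for the example. The line format (gigabytes, additional bytes, number of caches) follows the set_cachesize documentation linked above.

```python
GIGABYTE = 1024 ** 3


def cachesize_line(total_bytes, ncache=1):
    """Build a DB_CONFIG 'set_cachesize' line (hypothetical helper).

    The entry takes three whitespace-separated numbers: whole gigabytes,
    the additional bytes, and the number of caches to split it across.
    """
    if total_bytes <= 0 or ncache < 1:
        raise ValueError("cache size and cache count must be positive")
    gbytes, extra_bytes = divmod(total_bytes, GIGABYTE)
    return f"set_cachesize {gbytes} {extra_bytes} {ncache}"


# A 1MB cache, the modest bump suggested above.
print(cachesize_line(1024 * 1024))  # prints: set_cachesize 0 1048576 1
```

The resulting line goes in the environment's DB_CONFIG file; the new size takes effect once the processes using the environment have been restarted.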

> If it helps, here's the db_stat -m output from our (svn.apache.org) install
> using BDB 4.2:
>
> <http://www.apache.org/~jerenkrantz/bdb-db-stat.txt>

We can improve on this. Overall, you're getting an 87% hit rate
(which I find pretty amazing given you're using a tiny cache).

Some comments:

        259KB 476B Total cache size.

This is a 256KB cache (plus 25%, which Berkeley DB adds on for
small caches to cover its overhead).

        3579M Requested pages found in the cache (87%).

Overall effectiveness of the cache.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: uuids
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        7598 Requested pages found in the cache (2%).

Yikes! Subversion is doing I/O on almost every request from
this database. On the other hand, there aren't many requests.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: changes
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        80M Requested pages found in the cache (89%).

This database is doing pretty well given the cache size; it's
exhibiting strong locality of reference.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: copies
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        3039315 Requested pages found in the cache (24%).

Not so good. Only 3 million page requests were satisfied from the
cache, and we're doing I/O for roughly 75% of the requests.
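
Since db_stat prints the number of cache hits next to the hit percentage, the total request count and the number of misses can be estimated from those two figures. A minimal sketch using the copies line above; the function name is invented for the example, and the result is only approximate because the printed percentage is rounded:

```python
def estimate_requests(hits, hit_pct):
    """Estimate (total requests, misses) from one db_stat -m line.

    hits    -- the "Requested pages found in the cache" count
    hit_pct -- the percentage printed in parentheses (already rounded)
    """
    if not 0 < hit_pct <= 100:
        raise ValueError("hit_pct must be in (0, 100]")
    total = round(hits * 100 / hit_pct)
    return total, total - hits


# The "copies" pool file: 3039315 hits at a 24% hit rate.
total, misses = estimate_requests(3039315, 24)
print(f"copies: ~{total:,} requests, ~{misses:,} needing I/O")
```

On these numbers the copies database saw on the order of 12 to 13 million requests, most of which went to disk.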

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: revisions
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        48M Requested pages found in the cache (67%).

Another not-so-good one. 48 million page requests were satisfied
from the cache, but we're doing I/O for about a third of the requests.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: strings
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        1128M Requested pages found in the cache (92%).

Good locality of reference here -- 90% plus on a billion page
requests.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: representations
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        830M Requested pages found in the cache (92%).

Ditto.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: transactions
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        85M Requested pages found in the cache (67%).

Not so good.

        =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
        Pool File: nodes
        16384 Page size.
        0 Requested pages mapped into the process' address space.
        330M Requested pages found in the cache (71%).

Not so good.

Given the numbers you're already seeing, I think you might see
a significant improvement for a modest investment of additional
cache.
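
To repeat this kind of review after changing the cache size, the per-file hit rates can be pulled out of the db_stat -m output and sorted. The sketch below assumes output lines shaped like the ones quoted above; the helper names and the 75% threshold are arbitrary choices for illustration, not anything defined by Berkeley DB.

```python
import re

# Excerpt of db_stat -m output, copied from the figures quoted above.
SAMPLE = """\
        Pool File: uuids
        7598 Requested pages found in the cache (2%).
        Pool File: changes
        80M Requested pages found in the cache (89%).
        Pool File: copies
        3039315 Requested pages found in the cache (24%).
        Pool File: revisions
        48M Requested pages found in the cache (67%).
        Pool File: strings
        1128M Requested pages found in the cache (92%).
        Pool File: representations
        830M Requested pages found in the cache (92%).
        Pool File: transactions
        85M Requested pages found in the cache (67%).
        Pool File: nodes
        330M Requested pages found in the cache (71%).
"""

POOL_RE = re.compile(r"^\s*Pool File:\s*(\S+)")
HIT_RE = re.compile(r"^\s*\S+ Requested pages found in the cache \((\d+)%\)")


def parse_hit_rates(text):
    """Map each pool file name to its reported cache hit percentage."""
    rates = {}
    current = None
    for line in text.splitlines():
        pool = POOL_RE.match(line)
        if pool:
            current = pool.group(1)
            continue
        hit = HIT_RE.match(line)
        # Lines seen before the first "Pool File:" are environment-wide
        # totals, so they are skipped here.
        if hit and current is not None:
            rates[current] = int(hit.group(1))
    return rates


def below_threshold(rates, threshold_pct):
    """Return pool files under the threshold, worst hit rate first."""
    low = (name for name, pct in rates.items() if pct < threshold_pct)
    return sorted(low, key=rates.get)


rates = parse_hit_rates(SAMPLE)
for name in below_threshold(rates, 75):
    print(f"{name}: {rates[name]}%")
```

Hit rate alone does not say how much I/O a database causes, so a low percentage on a rarely used file (uuids here) matters less than a middling one on a busy file.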

There is additional information in the "Selecting a cache size"
section of the Berkeley DB Reference Guide, included in your
download package and also available at:

        http://www.sleepycat.com/docs/ref/am_conf/cachesize.html

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic bostic@sleepycat.com
Sleepycat Software Inc. keithbosticim (ymsgid)
118 Tower Rd. +1-781-259-3139
Lincoln, MA 01773 http://www.sleepycat.com

Received on Wed Dec 8 21:23:10 2004

This is an archived mail posted to the Subversion Dev mailing list.
