[PATCH]: Increase size of FSFS dir cache

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2005-10-30 03:57:41 CET

So I've been doing a bunch of profiling on the svnserve side for gcc, to
try to speed up diffs between branches, and what keeps popping up in the
profiles (besides md5 calculation) is get_dir_entries (and the
underlying calls to svn_stream_readline, etc.).

We use an external diff (GNU diff) client side, so the client profiles
don't show much time in Subversion. (Note: GNU diff is significantly
faster than Subversion's :P)

It turns out our single-dir directory cache doesn't do so well.

In fact, we miss almost all the time.
Yet statistics show we end up asking for the dirents of the same
directory 40 or 50 times in some cases, just not immediately again and
again. The obvious way to attack this is to increase the number of
directories whose dirents we cache, turning those misses into hits.
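
To make the miss pattern concrete, here is a minimal sketch (not the
patch itself) that replays a request sequence against a direct-mapped
cache with a configurable number of slots. Directories are modelled as
small non-negative integers, and count_hits and MAX_SLOTS are
hypothetical names made up for this illustration; the real cache is
keyed on noderev ids.

```c
#include <stddef.h>

/* Hypothetical model: each directory is a small non-negative integer.
   The real FSFS cache is keyed on noderev ids, not integers. */
#define MAX_SLOTS 128

/* Replay N directory requests against a direct-mapped cache with
   SLOTS entries (1 <= SLOTS <= MAX_SLOTS) and return the hit count. */
static size_t count_hits(const int *requests, size_t n, size_t slots)
{
  int cached[MAX_SLOTS];
  int valid[MAX_SLOTS] = { 0 };
  size_t hits = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      size_t slot = (size_t) requests[i] % slots;

      if (valid[slot] && cached[slot] == requests[i])
        hits++;                         /* same directory still cached */
      else
        {
          cached[slot] = requests[i];   /* miss: (re)fill the slot */
          valid[slot] = 1;
        }
    }
  return hits;
}
```

With an interleaved sequence such as 1, 2, 3, 1, 2, 3, 1, 2, 3, a
one-slot cache never hits, because each request evicts the directory
that will be asked for again later. With 128 slots, only the first
request for each directory misses.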

With the attached patch, the diff time goes from 55 seconds to 42, and
the server time is a bunch less.

I'm not sure whether 128 dirs is too small or not. For GCC, the memory
increase due to this is completely negligible, but the win, as shown
above, is about 24% in diff time.

1. Anybody who doesn't have enough memory to hold 128 dirs of their repo
in memory is probably in trouble anyway. Assuming 100k of info per dir
(which takes probably about 1000 files per dir to generate), that's only
12.8 meg of memory, and only *if they hit all the dirs*.
This seems reasonable to me.

2. The internal rev of the id makes a perfectly fine hash. We just use
it as an index into the table, not as the actual id. We still compare
the ids, of course. The rev was chosen over other possible keys because
the others are strings, and hashing strings is more expensive.

3. The number is certainly magic, but it's not easy to make this user
configurable without adding files to fsfs. I also take the view that
gcc is probably the size of the average "been using subversion for a
couple years to store projects" repository. I don't imagine people will
want the number significantly smaller; however, they may want it bigger.
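
As a rough illustration of point 2, the lookup could be sketched as
below. This is a simplified model under stated assumptions, not the
code from the patch: sketch_id_t, sketch_cache_t and the cache_*
helpers are hypothetical names, real FSFS ids and dirents are richer
structures, and ownership of the cached pointers is left to the caller.
Only NUM_DIR_CACHE_ENTRIES and its value of 128 come from the log
message.

```c
#include <string.h>

#define NUM_DIR_CACHE_ENTRIES 128   /* value from the patch */

/* Hypothetical, simplified id; assumes rev is non-negative. */
typedef struct sketch_id_t
{
  const char *node_id;
  const char *copy_id;
  long rev;
} sketch_id_t;

typedef struct sketch_cache_t
{
  const sketch_id_t *ids[NUM_DIR_CACHE_ENTRIES];   /* NULL = empty slot */
  const void *dirents[NUM_DIR_CACHE_ENTRIES];
} sketch_cache_t;

/* Full comparison, string parts included; only done on the one slot. */
static int ids_equal(const sketch_id_t *a, const sketch_id_t *b)
{
  return a->rev == b->rev
         && strcmp(a->node_id, b->node_id) == 0
         && strcmp(a->copy_id, b->copy_id) == 0;
}

/* The rev only picks the slot; it is never treated as the identity. */
static unsigned slot_for(const sketch_id_t *id)
{
  return (unsigned) ((unsigned long) id->rev % NUM_DIR_CACHE_ENTRIES);
}

/* Return the cached dirents for ID, or NULL on a miss. */
static const void *cache_get(const sketch_cache_t *c, const sketch_id_t *id)
{
  unsigned slot = slot_for(id);

  if (c->ids[slot] && ids_equal(c->ids[slot], id))
    return c->dirents[slot];
  return NULL;
}

/* Store DIRENTS for ID, evicting whatever occupied the slot. */
static void cache_put(sketch_cache_t *c, const sketch_id_t *id,
                      const void *dirents)
{
  unsigned slot = slot_for(id);

  c->ids[slot] = id;
  c->dirents[slot] = dirents;
}

/* Empty every slot at once (assumes all-bits-zero is a null pointer). */
static void cache_clear(sketch_cache_t *c)
{
  memset(c, 0, sizeof(*c));
}
```

Two ids whose revs differ by a multiple of 128 land in the same slot,
and the second one stored evicts the first. The full id comparison is
what keeps such a collision a plain miss rather than a wrong answer.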

Maybe we should explore a "config" file to tune these parameters.

(Sorry the diff looks uglier than it should. When you are doing
line-by-line replacements like this, unidiff tends to look worse than
context diff :( )

[[[

  Increase size of fs_fs cache of per-directory dirents from 1 directory
  worth of dirents to 128 directories worth of dirents.

  * subversion/libsvn_fs_fs/fs_fs.c
    (svn_fs_fs__rep_contents_dir): Use noderev id to calculate
    an index into the now-larger dirents cache.
    (svn_fs_fs__set_entry): Ditto.
    (svn_fs_fs__abort_txn): Destroy the now-larger dirents cache using
    memset, since it contains multiple entries.

  * subversion/libsvn_fs_fs/fs.h
    (NUM_DIR_CACHE_ENTRIES): New macro, default to 128.
    (struct fs_fs_data_t): Turn dir_cache members into tables of size
    NUM_DIR_CACHE_ENTRIES.

]]]

Received on Sun Oct 30 03:58:31 2005

This is an archived mail posted to the Subversion Dev mailing list.