
Re: [PATCH]: Increase size of FSFS dir cache

From: Greg Stein <gstein_at_lyra.org>
Date: 2005-11-07 04:51:08 CET

On Sun, Nov 06, 2005 at 10:15:24PM -0500, Daniel Berlin wrote:
>...
> Would that satisfy you?

I wasn't ever *un*satisfied. I was responding to the notion of "trade
off memory for speed" where nobody appeared to consider concurrency on
the server which can cause a little bump to become overly large.

In this concrete case, yeah: not a big deal. As you point out, a lot
of "ifs". But that 12 meg number did stick out :-)

>...
> I believe we have more scalability issues you need to worry about with
> serving 100 concurrent sessions than the size of the dirent cache.
> Honestly, if you want to play the "oh, we need to serve 100 concurrent
> sessions at once" game, here's some real data:

Fair enough :-) ... and yeah, I think it is a concern, when you look
at orgs like the ASF and KDE which have big SVN repositories.

> oprofiling on the server with 20 concurrent sessions going shows that
> most of the server time is being spent:
>
> 1. md5'ing every single darn thing read from fsfs, every single time
> it's read. I've since hardcoded this off in our fsfs, instead having it
> do it in svnadmin verify. This was eating about 40% of the CPU time
> spent per session on the server

Interesting. With a good policy of running verify, this is a decent
tradeoff. Maybe for the admins who know they're doing a good job of
verify, they could disable the checksums? It's a bit worrisome,
though, because of all the potential failure modes that an admin could
find themselves in (they aren't as smart as they thought, cron broke,
forgot to verify that new repo, etc).
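
To make the tradeoff concrete, here is a minimal sketch in Python (not Subversion's C code) of checksumming on every read versus leaving it to an offline verify pass. The names `read_rep`, `verify_all`, and the `verify` switch are hypothetical and exist only for illustration; they are not FSFS APIs.

```python
import hashlib


def read_rep(data, expected_md5, verify=True):
    """Return data, checking its MD5 first unless verify is False.

    Hypothetical stand-in for a read path; `verify` models an
    admin-controlled switch that skips the per-read checksum.
    """
    if verify:
        actual = hashlib.md5(data).hexdigest()
        if actual != expected_md5:
            raise ValueError("checksum mismatch: %s != %s" % (actual, expected_md5))
    return data


def verify_all(reps):
    """Offline pass: return the names of entries whose checksum does not match.

    `reps` maps a name to a (data, expected_md5) pair.
    """
    return [
        name
        for name, (data, expected) in reps.items()
        if hashlib.md5(data).hexdigest() != expected
    ]
```

With `verify=False` a corrupted entry is returned silently, and only the offline pass reports it. That is exactly the failure mode to worry about if the verify run never happens.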

> 2. svn_stream_readline to read the hash tables for directories and
> revprops, again and again and again and again (still, because every
> svnserve instance does it). On gcc.gnu.org, this is actually cpu bound, but
> still wastes about >20% of the cpu time per session, which is quite
> high. When it *does* hit the disk, it's ridiculously slow, because it
> uses a 1 byte buffer.

Ouch. Time for that APR_UNBUFFERED patch, maybe? Or switch to a binary
format for these instead? (e.g. len+value)
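
As a rough illustration of the len+value idea, here is a minimal sketch in Python (not Subversion's C code, and not a proposed on-disk format). The 4-byte big-endian length fields, the leading entry count, and the names `write_hash` and `read_hash` are all assumptions made for the example. The point is only that the reader always knows how many bytes to fetch next, so it never scans for a newline.

```python
import io
import struct


def write_hash(stream, table):
    """Write a dict of bytes -> bytes as: count, then len+key, len+value."""
    stream.write(struct.pack(">I", len(table)))
    for key, value in table.items():
        for field in (key, value):
            stream.write(struct.pack(">I", len(field)))
            stream.write(field)


def _read_exact(stream, n):
    """Read exactly n bytes or fail; no byte-at-a-time scanning."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated hash table")
    return data


def read_hash(stream):
    """Inverse of write_hash: every read is of a known size."""
    (count,) = struct.unpack(">I", _read_exact(stream, 4))
    table = {}
    for _entry in range(count):
        fields = []
        for _field in range(2):
            (length,) = struct.unpack(">I", _read_exact(stream, 4))
            fields.append(_read_exact(stream, length))
        table[fields[0]] = fields[1]
    return table
```

Values containing newlines round-trip without any escaping, since nothing in the reader treats a newline as special.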

> If you want to get the memory footprint down, and get better
> performance, make it so we don't *have* to cache so much data to get any
> kind of performance. Personally, i think it's somewhat funny that we
> are spending most of the time processing revisions *hunting for newline
> characters* in serialized hash tables

hehe... agreed :-)

>...
> For 1.4, I think we need to do something about the serialized
> hashtables.

Sure sounds that way. There would be a positive impact on the client,
too, but concurrency doesn't really matter as much there :-)

>...
> Any of this will completely destroy any compatibility with the current
> fsfs format, however.
> Sadly, i'm not sure there *is* any way to keep compatibility while still
> giving it the information to avoid svn_stream_readline.
>
> (This was another reason i'm in favor of featurizing the fs. We'd just
> set "hash3" so that we knew how the hashtables should be stored)

Yup, it would break it, but I don't think that you can "featurize". An
older SVN wouldn't know how to deal with the new/changed features, so
it would be just as broken as if you rev'd the whole repository.
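
A minimal sketch of that argument, in Python with hypothetical names (`KNOWN_FEATURES`, `MAX_FORMAT`, and "hash2" are made up for illustration; only "hash3" comes from the quoted text). An older reader has to refuse a repository either way, whether it sees an unknown feature name or a format number that is too new.

```python
# What a hypothetical older reader understands.
KNOWN_FEATURES = frozenset({"hash2"})
MAX_FORMAT = 1


def can_open_by_features(repo_features, known=KNOWN_FEATURES):
    """Feature-flag scheme: refuse if the repository uses any unknown feature."""
    return set(repo_features) <= known


def can_open_by_format(repo_format, max_format=MAX_FORMAT):
    """Format-number scheme: refuse if the repository format is too new."""
    return repo_format <= max_format
```

A repository tagged with "hash3" and one bumped to format 2 are both rejected by this older reader; the two schemes differ in bookkeeping, not in whether old code can read new data.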

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
Received on Mon Nov 7 04:48:49 2005

This is an archived mail posted to the Subversion Dev mailing list.
