
Re: [PATCH]: Increase size of FSFS dir cache

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2005-11-07 05:09:09 CET

On Sun, 2005-11-06 at 19:51 -0800, Greg Stein wrote:
> On Sun, Nov 06, 2005 at 10:15:24PM -0500, Daniel Berlin wrote:
> >...
> > Would that satisfy you?
> I wasn't ever *un*satisfied. I was responding to the notion of "trade
> off memory for speed" where nobody appeared to consider concurrency on
> the server which can cause a little bump to become overly large.
> In this concrete case, yeah: not a big deal. As you point out, a lot
> of "ifs". But that 12 meg number did stick out :-)
> >...
> > I believe we have more scalability issues you need to worry about with
> > serving 100 concurrent sessions than the size of the dirent cache.
> > Honestly, if you want to play the "oh, we need to serve 100 concurrent
> > sessions at once" game, here's some real data:
> Fair enough :-) ... and yeah, I think it is a concern, when you look
> at orgs like the ASF and KDE which have big SVN repositories.
> > oprofiling on the server with 20 concurrent sessions going shows that
> > most of the server time is being spent:
> >
> > 1. md5'ing every single darn thing read from fsfs, every single time
> > it's read. I've since hardcoded this off in our fsfs, instead having it
> > do it in svnadmin verify. This was eating about 40% of the CPU time
> > spent per session on the server
> Interesting. With a good policy of running verify, this is a decent
> tradeoff. Maybe for the admins who know they're doing a good job of
> verify, they could disable the checksums? It's a bit worrisome,
> though, because of all the potential failure modes that an admin could
> find themselves in (they aren't as smart as they thought, cron broke,
> forgot to verify that new repo, etc).

At least on gcc.gnu.org, we have rdiff backups for these directories
going back a while.

Plus offsite backups, which verify that no prior revision file ever
changes (i.e., that rsync doesn't transfer anything for them). :)

This actually does somewhat better than the checksums, because the
checksums only work when you read the revisions, so you'd need to be
vigilant about svnadmin verify anyway.
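
The invariant described above (a revision file, once written, never changes) can be sketched in a few lines. This is only an illustration, not what rsync or the gcc.gnu.org backups actually run: the flat directory of revision files and the names file_digest and changed_revision_files are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_revision_files(backup_dir, live_dir):
    """List names of backed-up files that are missing or different in the live copy.

    Files that exist only in live_dir are ignored: those are new revisions,
    which are expected. (Hypothetical layout: one file per revision.)
    """
    changed = []
    for backup_file in sorted(Path(backup_dir).iterdir()):
        if not backup_file.is_file():
            continue
        live_file = Path(live_dir) / backup_file.name
        if not live_file.is_file() or file_digest(live_file) != file_digest(backup_file):
            changed.append(backup_file.name)
    return changed
```

An empty result means every previously backed-up revision file is still byte-identical, and unlike the read-time checksums this covers revisions nobody has read recently.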

> > 2. svn_stream_readline to read the hash tables for directories and
> > revprops, again and again and again and again (still, because every
> > svnserve instance does it). On gcc.gnu.org, this is actually cpu bound, but
> > still wastes more than 20% of the cpu time per session, which is quite
> > high. When it *does* hit the disk, it's ridiculously slow, because it
> > uses a 1 byte buffer.
> Ouch. Time for that APR_UNBUFFERED patch, maybe? Or switch to a binary
> format for these instead? (e.g. len+value)
> > If you want to get the memory footprint down, and get better
> > performance, make it so we don't *have* to cache so much data to get any
> > kind of performance. Personally, I think it's somewhat funny that we
> > are spending most of the time processing revisions *hunting for newline
> > characters* in serialized hash tables
> hehe... agreed :-)
> >...
> > For 1.4, I think we need to do something about the serialized
> > hashtables.
> Sure sounds that way. There would be a positive impact on the client,
> too, but concurrency doesn't really matter as much there :-)

Yeah. In addition, a lot of gcc people want to use svk (because it's
common to have 5-20 branches and trees checked out), and they are
running up against the repo side issues like this, too :)
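
To make the "len+value" idea from the quoted text concrete, here is a minimal sketch of a length-prefixed hash table. It is not the FSFS on-disk format and not a proposed patch; the 4-byte big-endian length prefix and the names write_hash and read_hash are assumptions chosen for the example. The point is only that the reader always knows how many bytes to ask for, so it never has to scan for newlines.

```python
import struct


def write_hash(stream, table):
    """Write an entry count, then each key and value as length + raw bytes."""
    stream.write(struct.pack(">I", len(table)))
    for key, value in table.items():
        for blob in (key, value):
            stream.write(struct.pack(">I", len(blob)))
            stream.write(blob)


def _read_exact(stream, size):
    """Read exactly `size` bytes or raise EOFError on a truncated stream."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("truncated hash table")
    return data


def _read_blob(stream):
    """Read one length-prefixed byte string."""
    (length,) = struct.unpack(">I", _read_exact(stream, 4))
    return _read_exact(stream, length)


def read_hash(stream):
    """Read back a table written by write_hash, with no newline hunting."""
    (count,) = struct.unpack(">I", _read_exact(stream, 4))
    table = {}
    for _ in range(count):
        key = _read_blob(stream)
        table[key] = _read_blob(stream)
    return table
```

Values may contain newlines freely, since nothing in the reader treats them as delimiters.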

> >...
> > Any of this will completely destroy any compatibility with the current
> > fsfs format, however.
> > Sadly, I'm not sure there *is* any way to keep compatibility while still
> > giving it the information to avoid svn_stream_readline.
> >
> > (This was another reason I'm in favor of featurizing the fs. We'd just
> > set "hash3" so that we knew how the hashtables should be stored)
> Yup, it would break it, but I don't think that you can "featurize". An
> older SVN wouldn't know how to deal with the new/changed features, so
> it would be just as broken as if you rev'd the whole repository.

Yes, but once featurized, future svn's could keep that format without
too much trouble until you turn it off and enable "hash4" or "hash5".

It's more or less a way of specifying what has bumped, instead of just
"format" :)
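
A rough sketch of what "featurizing" could look like, assuming a repository records one feature name per line instead of a single format number. Nothing like this exists in Subversion as described in this thread; the feature names beyond "hash3", the one-name-per-line layout, and the function names are all hypothetical.

```python
# Features this (hypothetical) client build knows how to read.
SUPPORTED_FEATURES = frozenset({"hash3"})


def parse_features(text):
    """Parse a feature list: one feature name per non-blank line."""
    return frozenset(line.strip() for line in text.splitlines() if line.strip())


def unsupported_features(repo_features, supported=SUPPORTED_FEATURES):
    """Return, sorted, the repository features this client cannot handle."""
    return sorted(set(repo_features) - set(supported))
```

An older client still cannot open a repository that uses a feature it lacks, but it can report exactly which feature is missing rather than just seeing a bumped "format".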

Received on Mon Nov 7 05:09:55 2005

This is an archived mail posted to the Subversion Dev mailing list.