I have interesting memory leak data to share with these two lists
(crossposting to both svn and apr dev lists).
Ever since we launched svn-on-bigtable over at Google (about 2 years
ago), we've been struggling with mysterious memory leaks in apache --
very similar to what users are complaining about in Subversion issue
3084.
After lots of analysis, here's what we've figured out so far.
Symptom:
When you have a process that runs for a very long time while making
use of APR pools, the global pool tends to fragment into tiny pieces,
and APR just keeps on malloc()ing without ever calling free(). In
other words, a guaranteed long-and-slow leak.
Most people don't notice this problem with httpd, because they run
httpd in prefork mode: a bunch of httpd processes that only serve 1000
requests, then die and get re-spawned. They never live long enough
to exhibit the leak. But if you run apache in threaded mode, and let the
same apache run for days and weeks, it leaks a *lot*.
Cause:
If you look at APR's pool code, you can see the main reason for
fragmentation. In a nutshell, it never recombines recycled memory.
For example, suppose over an hour I create 20 subpools each 5k in
size, then apr_pool_destroy() them in turn. APR then places these
blocks into a 'free memory' list for future recycling. If I then
create a new subpool that requires 3k, no problem -- APR gives me back
one of the existing 5k blocks to use. But if I create a subpool that
requires 20k, whoops, it just goes and malloc()s 20k from the OS,
rather than combining four adjacent blocks from the 'free' list.
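To make the scenario above concrete, here is a minimal toy model of a recycling allocator that never coalesces. It is only a sketch of the behavior described in this mail, not APR's actual allocator code, and every name in it (toy_block, toy_allocator, toy_alloc, toy_free, toy_scenario) is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: every name below is hypothetical, not APR's API. */
typedef struct toy_block {
    struct toy_block *next;
    size_t size;            /* usable size of this block */
} toy_block;

typedef struct {
    toy_block *free_list;   /* recycled blocks; never merged together */
    size_t malloc_calls;    /* number of trips to the system allocator */
    size_t bytes_held;      /* bytes parked on the free list */
} toy_allocator;

/* Hand out a recycled block if one is large enough, else malloc(). */
toy_block *toy_alloc(toy_allocator *a, size_t size)
{
    toy_block **link;
    toy_block *b;

    for (link = &a->free_list; *link != NULL; link = &(*link)->next) {
        if ((*link)->size >= size) {        /* first fit */
            b = *link;
            *link = b->next;
            a->bytes_held -= b->size;
            return b;
        }
    }

    /* No single block fits, and blocks are never combined. */
    b = malloc(sizeof(*b) + size);
    if (b == NULL)
        return NULL;
    b->next = NULL;
    b->size = size;
    a->malloc_calls++;
    return b;
}

/* "Freeing" only parks the block for reuse; free() is never called. */
void toy_free(toy_allocator *a, toy_block *b)
{
    b->next = a->free_list;
    a->free_list = b;
    a->bytes_held += b->size;
}

/* Replays the scenario and returns the total number of malloc() calls. */
size_t toy_scenario(void)
{
    toy_allocator a = { NULL, 0, 0 };
    toy_block *blocks[20];
    toy_block *b;
    size_t i, calls;

    for (i = 0; i < 20; i++)
        blocks[i] = toy_alloc(&a, 5 * 1024);
    for (i = 0; i < 20; i++)
        if (blocks[i] != NULL)
            toy_free(&a, blocks[i]);

    b = toy_alloc(&a, 3 * 1024);    /* served by a recycled 5k block */
    if (b != NULL)
        toy_free(&a, b);
    b = toy_alloc(&a, 20 * 1024);   /* nothing fits: fresh malloc() */
    if (b != NULL)
        toy_free(&a, b);

    calls = a.malloc_calls;
    /* Tear the model down so the demo itself does not leak. */
    while (a.free_list != NULL) {
        b = a.free_list;
        a.free_list = b->next;
        free(b);
    }
    return calls;
}
```

In this model the 3k request costs no new malloc(), while the 20k request does, even though 100k is sitting idle on the free list.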
Our solution:
Over at Google, we simply hacked APR to *never* hold on to blocks for
recycling. Essentially, this makes apr_pool_destroy() always free()
the block, and makes apr_pool_create() always call malloc().
Poof, all the memory leaks went away instantly.
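For readers who want to picture the "no recycling" approach, the sketch below shows a pool-like interface backed by nothing but malloc() and free(). It is not our actual patch to APR. The names plain_pool, plain_pool_create, plain_palloc and plain_pool_destroy are invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool with no recycling; names are illustrative only. */
typedef union plain_chunk {
    union plain_chunk *next;
    max_align_t align;      /* keeps the payload suitably aligned */
} plain_chunk;

typedef struct {
    plain_chunk *chunks;    /* every allocation made from this pool */
    size_t live_chunks;     /* chunks not yet handed back to free() */
} plain_pool;

/* Creating a pool always goes straight to malloc(). */
plain_pool *plain_pool_create(void)
{
    plain_pool *p = malloc(sizeof(*p));
    if (p != NULL) {
        p->chunks = NULL;
        p->live_chunks = 0;
    }
    return p;
}

/* Each allocation is its own malloc(), linked so destroy can find it. */
void *plain_palloc(plain_pool *p, size_t size)
{
    plain_chunk *c = malloc(sizeof(*c) + size);
    if (c == NULL)
        return NULL;
    c->next = p->chunks;
    p->chunks = c;
    p->live_chunks++;
    return c + 1;           /* caller's memory starts after the header */
}

/* Destroying the pool free()s everything; returns chunks released. */
size_t plain_pool_destroy(plain_pool *p)
{
    size_t released = 0;

    while (p->chunks != NULL) {
        plain_chunk *c = p->chunks;
        p->chunks = c->next;
        free(c);
        released++;
    }
    free(p);
    return released;
}
```

The pool semantics stay the same for callers (allocate freely, destroy once), but nothing is ever kept back for reuse, so fragmentation becomes the system allocator's problem.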
What's more troubling is that the MaxMemFree directive --
which is supposed to limit the total size of the 'free memory'
recycling list -- didn't seem to work for us. What we need to do is
go back and debug this more carefully, and see if it's a bug in APR,
apache, or just in our testing methodology.
But I think there's still got to be something wrong with MaxMemFree,
since users are claiming it's not working for them in issue 3084.
Something is fishy. We plan to look into it more, but since users are
screaming, maybe someone else can beat us to it...
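For reference, this is the behavior we expected from a cap on the recycling list, written as a toy model. It is an assumption about the intent, not a description of what APR does internally. All names (capped_block, capped_allocator, capped_new_block, capped_release, capped_drain) are hypothetical. As far as I know, the real knob behind MaxMemFree is apr_allocator_max_free_set().

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy free list with a cap; names are hypothetical, not APR's code. */
typedef struct capped_block {
    struct capped_block *next;
    size_t size;
} capped_block;

typedef struct {
    capped_block *free_list;
    size_t bytes_held;      /* bytes currently parked for reuse */
    size_t max_free;        /* cap on bytes_held; 0 means no cap here */
    size_t free_calls;      /* blocks actually handed back via free() */
} capped_allocator;

/* Get a fresh block from the system allocator. */
capped_block *capped_new_block(size_t size)
{
    capped_block *b = malloc(sizeof(*b) + size);
    if (b != NULL) {
        b->next = NULL;
        b->size = size;
    }
    return b;
}

/* Park the block if it fits under the cap, otherwise really free it. */
void capped_release(capped_allocator *a, capped_block *b)
{
    if (a->max_free != 0 && a->bytes_held + b->size > a->max_free) {
        free(b);
        a->free_calls++;
        return;
    }
    b->next = a->free_list;
    a->free_list = b;
    a->bytes_held += b->size;
}

/* Hand everything still parked on the free list back to the system. */
void capped_drain(capped_allocator *a)
{
    while (a->free_list != NULL) {
        capped_block *b = a->free_list;
        a->free_list = b->next;
        a->bytes_held -= b->size;
        free(b);
        a->free_calls++;
    }
}
```

With a 16k cap and twenty 5k blocks released in turn, this model parks three blocks and free()s the other seventeen, so the held memory never exceeds the cap. That bounded behavior is what we did not observe in practice.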
In the long term, I think we need to question the utility of having
APR do memory recycling at all. Back in the early 90's, malloc() was
insanely slow and worth avoiding. In 2008, now that we're running
apache with nothing but malloc/free, we're unable to measure any
performance hit. The whole pool interface is really nice, but I
wonder if pool recycling may just be unnecessary on modern hardware
and OSes.
Received on 2008-10-01 20:11:37 CEST