[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: result_pool and scratch_pool

From: Julian Foad <julianfoad_at_btopenworld.com>
Date: Tue, 07 Oct 2008 15:48:41 +0100

On Mon, 2008-10-06 at 15:50 -0700, David Glasser wrote:
> +1, this is a good memory management strategy.

Greg, thanks for the write-up. I would summarize the overall case as:

"If an algorithm can pass only a single pool to subroutines, it is
forced to choose either a space penalty (accumulating scratch data along
with results) or a time penalty (by using a temporary sub-pool and
having to copy the results out of it). If it can pass two pools, it can
avoid both penalties."

When I first saw examples of this and when I first read your message, I
couldn't help thinking that passing two pools around everywhere must be
too ugly to be right, and maybe there would be a neat way to encapsulate
the two pool arguments in a single argument or macro. (I expect many
functions will only pass their pool arguments along in a trivial way.)

But, the argument for doing this makes sense overall [1], and sticking
two pool arguments on each call doesn't seem any less neat than some of
the other things we do in the name of structured programming.

Of course, I realise now that only functions that return non-trivial
results need to take a result pool, so we won't be passing two pools to
every function.

Note that the distinction between "result pool" and "scratch pool" is
not quite as hard-cut as it appears. Where a function returns more than
one result, which is not uncommon, the caller may want to keep one
result persistently and use another only temporarily. However, extending
the paradigm to be "one result pool per result" would surely be
over-kill.

Though it feels like this is adding complexity, I suppose it is actually
only making the existing complexity more visible, which should be a good
thing.

So, cautiously, +1 from me too.

- Julian

[1] I could take issue with some details of the arguments, but don't
need to as the overall point is valid.

> (The "don't keep a pool in a baton" thing is already in HACKING;

FWIW, neither Greg nor HACKING says that. Greg says something about
"objects, as opposed to batons" and HACKING says something about a
particular per-object memory management technique.

> "use
> two pools for new code or code you can adjust without compatibility
> issues" could definitely go there too.)
>
> --dave
>
> On Mon, Oct 6, 2008 at 12:02 AM, Greg Stein <gstein_at_gmail.com> wrote:
> > Hey all,
> >
> > On IRC, danielsh asked where I got the convention for
> > result_pool/scratch_pool that I've been using in wc_db.h and when I've
> > been adding more interfaces elsewhere. I wasn't on IRC at the time to
> > answer, and figured the mailing list would be a good place to do that
> > so others can read, too.
> >
> > Way, WAY back when, I recall an FS function taking two pool arguments.
> > One for results, and one for temporary stuff. I don't remember why or
> > which function, but I think that sparked the idea. Over time, there
> > have been situations where it has become obvious that two conceptual
> > pools should be presented to a function: one for the results because
> > they will need to stick around beyond the lifetime of the function
> > call, and a second for any scratch/temporary work that the function
> > needs to perform. This is *especially* noticeable if the function does
> > a LOT of temporary allocations (i.e. calling into subsystems).
> >
> > You can find the latter in our codebase: a function will allocate a
> > subpool, do its work, place results in the provided pool, then destroy
> > the subpool on exit. This is bad.
> >
> > Why is it bad? The caller knows *much* more about what is going on
> > that the function itself does. The caller may be in an iterative loop
> > and already has a subpool for scratch allocations, so creating a new
> > pool is just stupid overhead. Or the caller knows that it is "done",
> > so any allocation can just go into a specific pool because there is no
> > need for destroying a pool to "make room" for something else --
> > everything will be destroyed in just a moment. The caller may be
> > provided a scratch pool and pass that into a sub-call because sharing
> > the same scratch pool keeps the same "high water" mark on memory
> > allocation(*).
> >
> > In short, whenever you see a pool creation for something *other* than
> > an iterative pool, then it is bad. Some function signature(s)
> > somewhere need to be adjusted.
> >
> > Along different lines, if an "object" stores a pool into itself for
> > later use (we're talking an object, not a baton which is just a
> > manifestation of control flow), then it is most probably also bad. Any
> > allocations should be under the control of callers. In general, the
> > caller should provide a result_pool with the same/longer lifetime as
> > the object, rather than allow the object to hold a pool for future
> > allocations.
> >
> > Note that these patterns of pool use were essentially found through
> > the development of Subversion. The Apache httpd server never developed
> > a rigorous pattern of pool use because it pretty much uses just a few
> > pools during the processing of any given request. In a few key areas,
> > an iteration pool is used, but nobody recognized that as a formal
> > pattern of pool use. Within Subversion, we "discovered" that proper
> > use of pools across a complex set of libraries and objects was a
> > requirement, and developed these patterns to meet that need. When I
> > started the wc_db.h interface, I used the result_pool/scratch_pool
> > convention with an intent to add that to our standard pattern
> > throughout our system. It is a convention that improves on our use of
> > pools. If nobody has any object to the pattern, then i'd like to
> > document it in hacking.html under pool use and also mention the
> > convention for function signatures.
> >
> > Thoughts?
> >
> > Cheers,
> > -g
> >
> > (*) there is one case where these self-allocated pools keep the
> > high-water mark low: if you call two functions in series, each using a
> > self-pool, then the high water mark is the max of each function,
> > rather than the sum. avoiding the sum would be important if a function
> > knows it uses a crazy amount of memory. I would maintain the need for
> > a self-pool is rare, and we just state the convention is to avoid, but
> > document/comment on a case-by-case basis when it is needed/used.
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe_at_subversion.tigris.org
For additional commands, e-mail: dev-help_at_subversion.tigris.org
Received on 2008-10-07 16:49:06 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.