We (as the Apache Subversion developers) have discovered that mod_dav can
trigger an unbounded memory usage when used in conjunction with a module that
inserts an output filter — such as mod_headers or mod_deflate. Below is the
configuration that can be used to reproduce the issue:
MaxMemFree 128
Header always append Foo Bar
<Location />
DAV svn
SVNParentPath C:/Repositories
</Location>
With this configuration, responding to a GET request for a 500 MB file results
in the server consuming a large amount (~3.2 GB) of memory. The memory
footprint is caused by heap allocations. Serving larger files could quickly
exhaust all available memory for the machine.
I would be glad to provide a patch for this issue, but I think it's certainly
worth discussing the solution beforehand.
The problem is caused by how mod_dav passes the output filter list to its
providers. Please see the deliver() hook definition in mod_dav.h:1948 and
its usage in mod_dav.c:888:
/* Repository provider hooks */
struct dav_hooks_repository
{
...
dav_error * (*deliver)(const dav_resource *resource,
ap_filter_t *output);
The hook receives the current head of the output filter list for a particular
request. Certain filters, such as the one in mod_headers, are designed
to perform the work only once. When the work is done, a filter removes
itself from the list. If a filter is the first in the list, this updates the
head of the linked list in request_rec (r->output_filters).
So, after a particular DAV provider calls ap_pass_brigade(), the actual head
of the filter list may now be different from what was passed via the argument.
But the hook is unaware of that, since the previous head of the linked list
was passed via `output` argument. Subsequent calls to the ap_pass_brigade()
are going to invoke these (removed) filters again, and this is what happens
in the example. The mod_headers filter is called per every ap_pass_brigade()
call, it sets the headers multiple times, their values are allocated in the
request pool, and this causes unbounded memory usage. Apart from the memory
consumption, it could also cause various issues due to the filter being
unexpectedly called several times.
One way of solving the problem that I can think of is:
1) Introduce dav_hooks_repository2 with the deliver(), deliver_report() and
merge() hooks accepting a `request_rec` instead of an `ap_filter_t`
A DAV provider would be expected to use r->output_filters, and that's
going to handle the case when the head of the filter list is updated.
New structure is required to keep compatibility, as dav_hooks_repository
is a part of the mod_dav's public API. This would require a couple of
compatibility shims for the new API (and the old versions would become
deprecated).
2) Implement a workaround for existing DAV providers
Do that by inserting a special output filter to the beginning of the list
before calling the particular DAV provider hooks. The filter would be a
no-op, but it would never remove itself from the list, thus guaranteeing
that the head of the filter list doesn't change during the execution of
the hook.
The downside of this approach is that it doesn't guard against other mistakes
of the same kind. It's easy to miss that a head of the filter list can change
between invocations, and the consequences are fairly severe. For instance,
r1748047 [1] adds another public API, dav_send_one_response(), that receives
an `ap_filter_t`, that might change between calls to ap_pass_brigade(), so it
has the same kind of problem.
Maybe this can solved by introducing something like an `ap_filter_chain_t`
that keeps track of the actual head of the list, and starting to use it where
a function needs to push data to the filter stack?
I am ready to prepare a patch for this issue, but perhaps there's a completely
different and a better way to solve the problem? What do you think?
[1] https://svn.apache.org/r1748047
Regards,
Evgeny Kotkov
Received on 2016-08-19 15:44:50 CEST