On Sun, Mar 13, 2011 at 7:57 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> On Sun, Mar 13, 2011 at 18:38, <lgo_at_apache.org> wrote:
> > Author: lgo
> > Date: Sun Mar 13 15:38:01 2011
> > New Revision: 1081141
> >
> > URL: http://svn.apache.org/viewvc?rev=1081141&view=rev
> > Log:
> > ra_serf: Drastically limit memory usage on large checkouts. Stop reading the
> > update report response when too many fetches/propfinds are already active.
> >
> > * subversion/libsvn_ra_serf/update.c
> > (MAX_OUTSTANDING_REQS): New define.
> > (throttled_handle_xml_parser): Wraps a normal xml parser, stops reading when
> > too many active requests.
> > (finish_report): use the new throttled_handle_xml_parser.
> >
> >
> Hi Lieven,
>
> I understand the problem, but on my Windows box your change has three
> problems:
> 1. svn export uses 100% CPU, because serf calls
> throttled_handle_xml_parser all the time, since APR_POLLIN is still
> enabled in the pollset.
>
I haven't seen this behavior on my Mac and Linux boxes, but I see where it's
coming from: I assumed that returning EAGAIN from handle_response would make
serf 'wait a bit' before invoking handle_response again, but because the
connection stays registered for APR_POLLIN in the pollset, that doesn't happen.
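For illustration, here's a small standalone demo of that effect (plain APR on
a POSIX box, not serf itself, and with error checking omitted): as long as a
descriptor stays registered for APR_POLLIN and its pending data is never
consumed, the poll returns immediately every time instead of waiting.

[[[
/* Illustration only: plain APR, no serf, error checking omitted.
 * A descriptor with unread data that stays registered for APR_POLLIN
 * makes apr_pollset_poll() return immediately, over and over. */
#include <stdio.h>
#include <apr_general.h>
#include <apr_file_io.h>
#include <apr_poll.h>
#include <apr_time.h>

int main(void)
{
  apr_pool_t *pool;
  apr_file_t *rd, *wr;
  apr_pollset_t *pollset;
  apr_pollfd_t pfd = { 0 };
  int i;

  apr_initialize();
  apr_pool_create(&pool, NULL);

  /* A pipe with one unread byte stands in for the report connection. */
  apr_file_pipe_create(&rd, &wr, pool);
  apr_file_putc('x', wr);

  apr_pollset_create(&pollset, 1, pool, 0);
  pfd.p = pool;
  pfd.desc_type = APR_POLL_FILE;
  pfd.reqevents = APR_POLLIN;
  pfd.desc.f = rd;
  apr_pollset_add(pollset, &pfd);

  for (i = 0; i < 5; i++)
    {
      apr_int32_t num;
      const apr_pollfd_t *hot;

      /* One second timeout, but this returns right away on every
         iteration because the byte is never read -- the same way a
         handler that returns EAGAIN without consuming the response
         keeps getting called back in a tight loop. */
      apr_pollset_poll(pollset, apr_time_from_sec(1), &num, &hot);
      printf("poll woke up, %d descriptor(s) ready\n", (int) num);
    }

  apr_terminate();
  return 0;
}
]]]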
> 2. Memory usage didn't change on my tests, svn export of svn-trunk
> still uses 50MB of memory
>
The improvement is only visible when running an export/checkout of a large
folder, like subversion/branches. From our discussion on IRC I see that you've
also noticed the memory usage improvements this patch brings for large exports.
> 3. Time of svn export increased from 50s to 120s.
>
I guess that's a consequence of problem 1, but that still needs to be confirmed.
Because of problem 1 I have reverted the change in r1081201.
The cause of the memory usage problem is that ra_serf reads the whole update
report response and then starts preparing requests (update.c:push_state) for
each file to fetch, even if those requests are only to be sent much later.
With r1081141 I made ra_serf stop reading the response once a certain limit of
outstanding fetches/propfinds was reached. This has the 100% CPU usage problem
you mentioned, plus the risk of the report response connection timing out.
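To make that concrete, this is roughly the shape of the throttling wrapper,
written here against serf's public response-handler callback rather than the
committed ra_serf code; the baton, its field names and the limit value are
made up for the example.

[[[
/* Simplified sketch of the r1081141 approach; names and the baton are
 * illustrative, not the actual code from update.c. */
#include <serf.h>
#include <apr_errno.h>

#define MAX_OUTSTANDING_REQS 50   /* illustrative limit */

/* Hypothetical baton: just what the example needs. */
typedef struct throttle_baton_t {
  int active_fetches;                      /* GETs in flight */
  int active_propfinds;                    /* PROPFINDs in flight */
  serf_response_handler_t wrapped_handler; /* the normal XML parser handler */
  void *wrapped_baton;
} throttle_baton_t;

static apr_status_t
throttled_handle_xml_parser(serf_request_t *request,
                            serf_bucket_t *response,
                            void *handler_baton,
                            apr_pool_t *pool)
{
  throttle_baton_t *tb = handler_baton;

  /* Too many fetches/propfinds in flight: stop reading the report
     response for now.  APR_EAGAIN tells serf we didn't consume the
     pending data, but the connection stays registered for APR_POLLIN,
     so serf calls us straight back -- the busy loop from problem 1. */
  if (tb->active_fetches + tb->active_propfinds > MAX_OUTSTANDING_REQS)
    return APR_EAGAIN;

  /* Below the limit: let the normal XML parser consume the response. */
  return tb->wrapped_handler(request, response, tb->wrapped_baton, pool);
}
]]]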
An alternative solution is to keep reading the full report but store it in
memory or on disk, so that it can serve as a queue once the maximum number of
outstanding fetches is reached. I have attached a work-in-progress patch with
an in-memory cache. Note that it's WIP, so it isn't clean: it aborts on cleanup
and still leaks memory; it's only meant to demonstrate the idea. A disadvantage
of the memory cache is that the part of the report that has been read but not
yet parsed is temporarily stored in memory twice.
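To sketch the direction (hypothetical helper name, no cleanup or error
handling, so this is not the attached patch itself): the report response is
always drained into a buffer so the connection can't time out, and the XML
parser is only fed from that buffer while the number of active requests is
below the limit.

[[[
/* Sketch of the drain-and-spool idea (hypothetical helper, not the
 * attached patch): always read the report response and park the
 * unparsed bytes in a buffer. */
#include <serf.h>
#include <apr_errno.h>
#include "svn_string.h"

static apr_status_t
spool_report_response(serf_bucket_t *response,
                      svn_stringbuf_t *spool)  /* pending, unparsed bytes */
{
  apr_status_t status;

  do
    {
      const char *data;
      apr_size_t len;

      status = serf_bucket_read(response, SERF_READ_ALL_AVAIL, &data, &len);
      if (SERF_BUCKET_READ_ERROR(status))
        return status;

      /* The bucket's data is only valid until the next read, so copy
         it into the spool buffer right away. */
      if (len > 0)
        svn_stringbuf_appendbytes(spool, data, len);
    }
  while (status == APR_SUCCESS);

  /* EAGAIN or EOF from the bucket is fine here.  The caller then feeds
     spool->data to the XML parser in chunks, but only while the number
     of active fetches/propfinds is below the limit, so the spool acts
     as the queue described above. */
  return status;
}
]]]

A disk-backed spool instead of an in-memory buffer would avoid keeping those
unparsed bytes in memory at all, at the cost of extra I/O.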
Thanks for the review.
Lieven