[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Random serf checkout failures

From: Lieven Govaerts <lgo_at_apache.org>
Date: Wed, 7 Nov 2012 22:28:20 +0100

Hi,

we seem to be raising and discussing a few different issues in this
thread. I've tried to summarize current status with some added
observations.
I propose to start up separate mail threads when discussing one of
these specific issues.

1. When a very low Timeout is set in the apache server configuration,
the server truncates response bodies in some situations:
   1a. I see this happening with responses to GET requests, but it's
not clear yet why this happens, and if we need to worry about it, as a
3 second timeout where the default is 300 secs is really low. Philip
has reported this happening with a REPORT response
(http://svn.haxx.se/dev/archive-2012-11/0097.shtml).
   1b. If the server aborts a connection, ra_serf doesn't interpret
this as an error. This results in many different types of error
messages, like invalid checksums etc. A check has been added to serf
trunk r1678 that will raise an error if the response body is shorter
than expected.

2. ra_serf consumes way too much memory while doing large checkout's
(issue #4196). This is caused (at least partially) by ra_serf not
closing empty directories until the very end of the editor run,
thereby keeping info objects in memory. Mike & Justin have tried to
fix this in r1406366.
In my opinion, the '! dir->ref_count in close_all_dirs' asserts that
Philip keeps seeing (but I can't reproduce) are related to this
specific issue, and have not been fixed by r1406366.

3. Ivan added a potential issue because ra_serf+serf are not handling
events on each open connection regularly. This can result in timeouts,
where the server closes a connection because it thinks the client
isn't responding anymore.
If this happens on one of the connections where the GET/PROPFIND
requests are sent, I don't think this will break the update/checkout:
the server will close the connection and serf will resend the requests
on a new connection. However, if such a timeout occurs on the
connection where ra_serf is reading a large REPORT response, the
response will be truncated and svn will stop with an error.
Two observations:
   3a. When processing connections, serf will read as much data from
one connection as the network allows it to. Only when it can't read
more from the connection, it will go on with the next connection. Ivan
has provided a patch to limit the amount of responses read per
serf_context_run call per connection to 4.
(http://svn.haxx.se/dev/archive-2012-11/0193.shtml)
   3b. The same observation as 3a, but then applied to the connection
where we receive the REPORT response: ra_serf will keep reading as
much data as possible, and create new GET/PROPFIND requests as a
result, without giving serf the time to send these new requests on
other connections.
I don't think this can result in svn stopping, but might result in
stalls as we don't get enough requests out to fetch files.

4. Ivan further observed that the spillbuf mechanism, used to keep
ra_serf reading from the REPORT connection while waiting for slow disc
i/o, has two drawbacks:
    4a. the amount of data stored in the spillbuf can get very large
when processing the REPORT response is paused. When resuming, all data
in the spillbuf is read and processed in one go. As this can be 1MB+
it can take a while, potentially timing out the REPORT response
connection -> svn will stop.
    4b. The GET and PROPFIND requests resulting from parsing that big
chunk of data will all be sent on the same connection, where we have 4
available to send more requests in parallel.

5. In one of Philip error-reporting mails, there was mention of a "408
Request Time-out" response.
(http://svn.haxx.se/dev/archive-2012-11/0076.shtml).
No idea if the server sends this response because one of the above
issues, or if this has a different root caus. What I do know is that
serf does not expect this response as it's not related to an
outstanding request. In that case, serf will return and APR_GENERAL
error.

I think 3+4+5 are relatively easy to solve, but probably difficult to
reproduce to validate the implemented solutions.
1a and 2 require more analysis, I'll try to focus on these two.

Lieven

On Thu, Nov 1, 2012 at 1:29 PM, Philip Martin <philip_at_codematters.co.uk> wrote:
> I've been experimenting checking out subversion/tags from my local
> mirror of the Subversion repository using a client and mod_dav_svn from
> trunk. I'm getting several different errors seemingly at random, I
> suspect they may be related to some common underlying problem.
>
> Sometimes I see a checksum error such as this:
>
> ../src/subversion/svn/checkout-cmd.c:168: (apr_err=200014)
> ../src/subversion/libsvn_client/checkout.c:179: (apr_err=200014)
> ../src/subversion/libsvn_client/update.c:579: (apr_err=200014)
> ../src/subversion/libsvn_client/update.c:440: (apr_err=200014)
> ../src/subversion/libsvn_wc/adm_crawler.c:858: (apr_err=200014)
> ../src/subversion/libsvn_ra_serf/update.c:2515: (apr_err=200014)
> ../src/subversion/libsvn_ra_serf/util.c:2023: (apr_err=200014)
> ../src/subversion/libsvn_ra_serf/util.c:2004: (apr_err=200014)
> ../src/subversion/libsvn_ra_serf/update.c:1065: (apr_err=200014)
> ../src/subversion/libsvn_ra_serf/update.c:895: (apr_err=200014)
> ../src/subversion/libsvn_wc/update_editor.c:3919: (apr_err=200014)
> ../src/subversion/libsvn_subr/checksum.c:469: (apr_err=200014)
> svn: E200014: Checksum mismatch for '/home/pm/sw/subversion/obj/wc/0.25.0/subversion/libsvn_fs/tree.c':
> expected: c38c3e8df15f36a45b8a626e4d82a25d
> actual: 2532722e781ef9afe673c89fd467d5f0
>
> The expected checksum is the checksum in the repository. It's a
> different file each time this error occurs.
>
> Another error is an assert:
>
> Program received signal SIGABRT, Aborted.
> 0x00007ffff6530425 in __GI_raise (sig=<optimised out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0 0x00007ffff6530425 in __GI_raise (sig=<optimised out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x00007ffff6533b8b in __GI_abort () at abort.c:91
> #2 0x00007ffff6d41b78 in svn_error_abort_on_malfunction (can_return=1,
> file=0x7ffff5a97058 "../src/subversion/libsvn_ra_serf/update.c", line=673,
> expr=0x7ffff5a970fe "! dir->ref_count")
> at ../src/subversion/libsvn_subr/error.c:678
> #3 0x00007ffff6d41bcc in svn_error__malfunction (can_return=1,
> file=0x7ffff5a97058 "../src/subversion/libsvn_ra_serf/update.c", line=673,
> expr=0x7ffff5a970fe "! dir->ref_count")
> at ../src/subversion/libsvn_subr/error.c:703
> #4 0x00007ffff5a85d50 in close_all_dirs (dir=0x7fffe885f0c8)
> at ../src/subversion/libsvn_ra_serf/update.c:673
> #5 0x00007ffff5a85cda in close_all_dirs (dir=0x7fffe88d00c8)
> at ../src/subversion/libsvn_ra_serf/update.c:669
> #6 0x00007ffff5a85cda in close_all_dirs (dir=0x7fffe89860c8)
> at ../src/subversion/libsvn_ra_serf/update.c:669
> #7 0x00007ffff5a85cda in close_all_dirs (dir=0x7ffff7f2a0c8)
> at ../src/subversion/libsvn_ra_serf/update.c:669
> #8 0x00007ffff5a8b01b in finish_report (report_baton=0x7ffff7fb8060,
> pool=0x7ffff7fb5028) at ../src/subversion/libsvn_ra_serf/update.c:2665
> #9 0x00007ffff76945cc in svn_wc_crawl_revisions5 (wc_ctx=0x7ffff7f670e0,
> local_abspath=0x7ffff7fb5168 "/home/pm/sw/subversion/obj/wc",
> reporter=0x7ffff5ca07c0, report_baton=0x7ffff7fb8060, restore_files=1,
> depth=svn_depth_unknown, honor_depth_exclude=1,
> depth_compatibility_trick=0, use_commit_times=0,
> cancel_func=0x416b1b <svn_cl__check_cancel>, cancel_baton=0x0,
> notify_func=0x41c703 <notify>, notify_baton=0x7ffff7f673d0,
> scratch_pool=0x7ffff7fb5028)
> at ../src/subversion/libsvn_wc/adm_crawler.c:858
>
> This appears to be in the end of the checkout code but the working copy
> is far from complete.
>
> I have also seen this error:
>
> ../src/subversion/svn/checkout-cmd.c:168: (apr_err=175009)
> ../src/subversion/libsvn_client/checkout.c:179: (apr_err=175009)
> ../src/subversion/libsvn_client/update.c:579: (apr_err=175009)
> ../src/subversion/libsvn_client/update.c:440: (apr_err=175009)
> ../src/subversion/libsvn_wc/adm_crawler.c:858: (apr_err=175009)
> ../src/subversion/libsvn_ra_serf/update.c:2515: (apr_err=175009)
> ../src/subversion/libsvn_ra_serf/util.c:2028: (apr_err=175009)
> ../src/subversion/libsvn_ra_serf/util.c:1850: (apr_err=175009)
> svn: E175009: Premature EOF seen from server (http status=200)
> ../src/subversion/libsvn_ra_serf/util.c:1850: (apr_err=70014)
> svn: E070014: End of file found
>
> All of these look like some problem with the data transfer, serf losing
> data or mod_dav_svn not sending it, but I suppose they could also be
> caused by memory corruption. These errors generally occur when there
> are thousands of files in the pristine store and tens of thousands of
> files in the working copy.
>
> I've seen these error while checking out across my LAN and on localhost
> so I don't think it can be network problem. I'm using serf 1.1.x_at_1674
> and subversion/mod_dav_svn from trunk (currently r1404529 but I was
> seeing these errors a couple of weeks ago as well). I'm seeing these
> errors on Debian/stable and Ubuntu/12.04.
>
> --
> Philip
Received on 2012-11-07 22:29:13 CET

This is an archived mail posted to the Subversion Dev mailing list.