svnserve under Linux inetd hangs, burning CPU cycles, under too low TCP-sendbuffer.

From: Dr. Andreas Krüger <andreas.krueger_at_dv-ratio.com>
Date: Wed, 24 Feb 2010 17:20:20 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I run svnserve under Linux under inetd, with low memory allowances for
TCP send buffers.

Checkout on the client side causes svnserve to burn CPU cycles,
apparently endlessly without making any progress.

Here are the gory details:

My Setup
========

I'm running a Debian Lenny box, an i386 installation,
with inetd out of the openbsd-inetd Debian package 0.20080125-2.
This inetd starts svnserve from Debian package subversion 1.5.1dfsg1-4,
listening on a non-standard port (I don't think it matters much;
I have another repository on the standard port), through the
inetd.conf line

> 3691 stream tcp nowait svn /usr/bin/svnserve svnserve -i -r /path-to-repository

The repository is plain fsfs; /path-to-repository/format has a single
"5", followed by a line feed.

I'm the only person who ever uses this particular repository. (It's
mostly for backup and transferring certain files from work to home and
back.) While the problem occurs, there is no other concurrent access
to the repository, besides the failing svn co.

There is virtualization at work here. The entire system is one slice
of an OpenVZ installation. I try to get by with fairly little memory
devoted to this particular slice. Thus far, I've been describing the
guest. The host also runs Debian Lenny, namely, the current
linux-image-2.6.26-2-openvz-686 kernel.

I don't think this matters much, but, for the record (and to make
slightly more interesting reading), there is another level of
virtualization at work here: My OpenVZ host happens to be a Xen guest.
So virtualization is stacked on top of virtualization. I do not
control and have no access to the Xen host. I'm using Debian's
libc6-xen version 2.7-18lenny2 on both my vz host and my vz slice.

The problem
===========

I'm doing a plain vanilla "svn co" on the client. It starts to show
some checkout activity, adding some files, then stops and seems to
hang. No reaction for as long as I care to wait.

In the meantime, I obtain a shell prompt at the vz server slice, look
around and see: There is this svnserve process that uses what CPU
cycles it can get. I can kill it, then, back on the client, remove
the directory with the half-baked checkout, and reproduce the problem
at will. For good luck, I try "svnadmin verify" on the server. Oops -
shouldn't have done that as root. After fixing the permissions, the
problem reappears. Killing the svn process on the client seems to need
"kill -9" and does not help: the CPU waste on the server continues.

Looking at /proc/user_beancounters on the server, I see failure counts
at both tcpsndbuf and (probably unrelated) at tcprcvbuf. The failure
count on tcpsndbuf increases each time I exercise the problem, not by
1, but by 3. In this particular situation, the barrier on tcpsndbuf
is set to 319488 and the limit to 524288, the maxheld seen is 320320.
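For anyone who wants to watch these counters while reproducing the problem, here is a minimal sketch of a parser for the /proc/user_beancounters table. It assumes the usual column layout (resource, held, maxheld, barrier, limit, failcnt, with a "uid:" prefix on the first row of each container). The names parse_beancounters, BeanCounter and SAMPLE are made up for illustration, and in SAMPLE only the maxheld, barrier and limit values come from the report above; the uid, held and failcnt values are placeholders.

```python
from typing import Dict, NamedTuple


class BeanCounter(NamedTuple):
    held: int
    maxheld: int
    barrier: int
    limit: int
    failcnt: int


def parse_beancounters(text: str) -> Dict[str, BeanCounter]:
    """Parse the text of /proc/user_beancounters into {resource: BeanCounter}."""
    counters: Dict[str, BeanCounter] = {}
    for line in text.splitlines():
        fields = line.split()
        # The first row of each container carries a "uid:" prefix; drop it.
        if fields and fields[0].endswith(":"):
            fields = fields[1:]
        # A data row is a resource name followed by five integers.
        if len(fields) != 6:
            continue
        name, *values = fields
        try:
            counters[name] = BeanCounter(*(int(v) for v in values))
        except ValueError:
            continue  # header or otherwise non-numeric row
    return counters


# Assumed sample: uid, held and failcnt are placeholders, not measured values.
SAMPLE = """\
Version: 2.5
       uid  resource        held   maxheld   barrier     limit   failcnt
      101:  tcpsndbuf          0    320320    319488    524288         3
"""

if __name__ == "__main__":
    snd = parse_beancounters(SAMPLE)["tcpsndbuf"]
    print("tcpsndbuf maxheld:", snd.maxheld, "barrier:", snd.barrier,
          "failcnt:", snd.failcnt)
```

On a real OpenVZ slice one would pass the contents of /proc/user_beancounters (readable by root) instead of SAMPLE, and compare failcnt before and after a checkout attempt.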

Becoming curious, I take an strace of the svnserve. I see a more or
less endless loop:

First, svnserve keeps reopening

/path-to-repository/db/revs/0/3

For the record: That entire repository boasts 31 revisions, but /0/3,
with an 8162358-byte file, seems to be the largest. Most revisions add
new files. There are relatively few changes of existing material in
this particular repository. Revision 3 was adding a whole lot of new
files, with no other changes. The largest individual file added in
that revision was 807993 bytes long.

This revision file .../db/revs/0/3 gets opened and closed several
times (weird in itself). I see some seek and read activity of that
file, and also some writes to file descriptor 1. All writes to file
descriptor 1 write the full number of bytes intended (up to the last
one, which was interrupted by me killing the process). I also see
several poll timeouts

poll([{fd=0, events=POLLIN}], 1, 0) = 0 (Timeout)

and several successful brk calls, before it finally all starts over again.

More or less over again, that is. There seems to be an _llseek into
that revision file that seeks differently each time. The precise
numbers do not grow, they fluctuate.
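The loop described above can be summarized mechanically from a saved strace log. The following is a rough sketch, assuming the log uses strace's usual i386 text format for open and _llseek calls. The function name summarize and the SAMPLE_TRACE contents are hypothetical: the file descriptor numbers, flags and offsets are invented for illustration, and only the repository path and the poll line are taken from the report.

```python
import re
from collections import Counter
from typing import List, Tuple

# An optional leading PID column appears when strace follows several processes.
OPEN_RE = re.compile(r'^(?:\d+\s+)?open\("([^"]+)"')
SEEK_RE = re.compile(r'^(?:\d+\s+)?_llseek\((\d+), (\d+),')


def summarize(trace: str) -> Tuple[Counter, List[int]]:
    """Count open() calls per path and collect requested _llseek offsets."""
    opens: Counter = Counter()
    offsets: List[int] = []
    for line in trace.splitlines():
        m = OPEN_RE.match(line)
        if m:
            opens[m.group(1)] += 1
            continue
        m = SEEK_RE.match(line)
        if m:
            offsets.append(int(m.group(2)))
    return opens, offsets


# Hypothetical excerpt: descriptors, flags and offsets are invented.
SAMPLE_TRACE = '''\
open("/path-to-repository/db/revs/0/3", O_RDONLY|O_LARGEFILE) = 5
_llseek(5, 7340032, [7340032], SEEK_SET) = 0
read(5, "..."..., 4096) = 4096
write(1, "..."..., 4096) = 4096
close(5) = 0
poll([{fd=0, events=POLLIN}], 1, 0) = 0 (Timeout)
open("/path-to-repository/db/revs/0/3", O_RDONLY|O_LARGEFILE) = 5
_llseek(5, 5242880, [5242880], SEEK_SET) = 0
close(5) = 0
'''

if __name__ == "__main__":
    opens, offsets = summarize(SAMPLE_TRACE)
    for path, count in opens.most_common():
        print(count, path)
    print("seek offsets:", offsets)
```

A list of offsets that fluctuates rather than grows, together with a high open count for a single revision file, would match the behavior described in this report.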

I finally increase this vz slice's tcpsndbuf allowance to
614400:921600 and, voila, that solves the problem.

Removing the client side partial checkout directory one more time and
starting over again, the checkout goes through this time. For the
record: Afterwards, on the server, the tcpsndbuf shows a new maxheld
of 416640.
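As a small worked check of the numbers reported here, the sketch below compares the peak tcpsndbuf usage seen during the successful checkout with the old and new settings. The constant and function names are made up for illustration; the values are the ones quoted in this message. It only does the arithmetic and makes no claim about why svnserve spins.

```python
# Values quoted in this report (bytes).
OLD_BARRIER, OLD_LIMIT = 319488, 524288
NEW_BARRIER, NEW_LIMIT = 614400, 921600
PEAK_SUCCESSFUL_CHECKOUT = 416640  # maxheld after the successful checkout


def shortfall(threshold: int, peak: int) -> int:
    """Bytes by which the peak usage exceeds a threshold (0 if it fits)."""
    return max(0, peak - threshold)


if __name__ == "__main__":
    print("over old barrier by:", shortfall(OLD_BARRIER, PEAK_SUCCESSFUL_CHECKOUT))
    print("over old limit by:  ", shortfall(OLD_LIMIT, PEAK_SUCCESSFUL_CHECKOUT))
    print("over new barrier by:", shortfall(NEW_BARRIER, PEAK_SUCCESSFUL_CHECKOUT))
```

The peak lies above the old barrier but below the old limit, and fits under the new barrier, which is consistent with the failures having been counted against the barrier rather than the limit.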

I set the tcpsndbuf barrier and limit back to the old, smaller values
and replace openbsd-inetd with rlinetd and then with inetutils-inetd.
With these inetd implementations, I can also reproduce the CPU waste
behavior of svnserve. So it doesn't seem to be a bug in the particular
inetd implementation.

What's next?
============

I'd be happy to open a bug over at

http://subversion.tigris.org/issue-tracker.html

if that's what would help solve this issue.

I could of course also reproduce the problem and investigate further,
if that'd help.

Regards, and thank you all for providing fine software,

Andreas
- --

Dr. Andreas Krüger, Berater, DV-RATIO NORDWEST GmbH
andreas.krueger_at_dv-ratio.com
GPG/PGP Fingerprint 8063 4A9B 362D 4220 A546 14C1 EA19 AADC FD44 5EB7

DV-RATIO NORDWEST GmbH
Tel: +49 (0)211 / 577 996-0
Fax: +49 (0)211 / 577 996-26
http://www.dv-ratio.com
Sitz der Gesellschaft Habsburgerstraße 12, 40547 Düsseldorf
Registergericht Düsseldorf HRB 34330
USt-IdNr.: DE811321837
Steuer-Nr.: 809/44031
Geschäftsführung: Günter Gerstmann
Prokura: Trudbert Vetter, Uwe Wolfram

DV-RATIO - "Kompetenz und Zuverlässigkeit seit 1980"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkuFUbgACgkQ6hmq3P1EXrccagCgmlvhg8av4eFEjmx0EtzSYtbZ
+iEAnj46GlmRvAK3W3IXgx9/2/J/Neu6
=ix+a
-----END PGP SIGNATURE-----
Received on 2010-02-24 17:20:59 CET
