File descriptor leak of rep-cache.db in 1.6.x
From: Dan Villiom Podlaski Christiansen <danchr_at_gmail.com>
Date: Thu, 3 Jun 2010 16:11:47 +0200
I'm experiencing a file descriptor leak in the Subversion 1.6.x branch. I'm hit by it in the 1.6.1 build included with Mac OS X 10.6, a 1.6.11 build from MacPorts and a build of the 1.6.x branch. The bug is not present in 1.5.7, nor in trunk.
The bug is somewhat subtle, and the circumstances causing it are fairly complex. In the course of running a test suite, we open repositories repeatedly using the ‘file’ protocol, log their history and fetch the contents of all revisions. This fails after about a hundred tests, having exhausted file descriptors. Inspecting the output of ‘lsof’ on the process shows 216 open references to ‘rep-cache.db’ files.
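For anyone wanting to repeat that count, here is a minimal sketch of how it could be scripted. It assumes the default ‘lsof’ output layout, where the file name is the last column; the function names are hypothetical and not part of hgsubversion or Subvertpy.

```python
import subprocess


def lsof_for_pid(pid):
    """Return the text output of `lsof -p PID` (requires lsof on PATH)."""
    result = subprocess.run(
        ["lsof", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=False,  # lsof may exit non-zero even when it prints useful output
    )
    return result.stdout


def count_open_references(lsof_output, filename="rep-cache.db"):
    """Count lsof lines whose NAME column ends with the given file name."""
    suffix = "/" + filename
    return sum(
        1
        for line in lsof_output.splitlines()
        if line.rstrip().endswith(suffix)
    )
```

Calling `count_open_references(lsof_for_pid(pid))` with the PID of the test process should give a number comparable to the one quoted above, provided the layout assumption holds.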
As mentioned, the circumstances causing the bug are fairly complex. I see this in hgsubversion, a Mercurial plugin for Subversion interoperability. Both Mercurial and hgsubversion are written in Python, and historically, hgsubversion has used the SWIG bindings for Subversion. Unfortunately, we have found the SWIG bindings to leak like a sieve; it is not uncommon for the conversion of a large repository to use several gigabytes of memory, or even exhaust the address space in a 32-bit environment. In an effort to fix this, I've been writing a Subvertpy backend for hgsubversion. Subvertpy is a set of alternate Python bindings for Subversion that exposes a much less complicated API and, most importantly, deals with memory allocation internally rather than exposing it to the Python environment.
So far, the results are good; converting a test repository (cvs2svn) using the HTTP or svn protocols is slower, but with significantly lower memory use. (Some of that overhead just might be the cost of deallocating more; you never know…) The file protocol, however, appears to leak somewhat: it uses 27% less memory than with the SWIG bindings, but twice as much CPU time. (Please note that I haven't tested this with anything other than 1.6.) The file protocol is the main protocol used by the hgsubversion test suite. Whereas leaking one file descriptor per repository is insignificant during common use, our test suite opens hundreds of repositories in a single process.
Considering the many packages involved, it's not easy to determine which one might be buggy. A few observations:
* While hgsubversion leaks using the SWIG bindings, it doesn't leak file descriptors. This suggests that it isn't the cause of *this* leak.
I haven't been able to reproduce this outside our test suite. Opening a repository directly doesn't cause ‘rep-cache.db’ to be opened, nor does obtaining a log of all revisions. hgsubversion has two modes for fetching revisions: replay and diff-based. Hacking the tests to use only one of them merely changes how many repositories are processed before descriptors are exhausted. For reference, I've attached the output of ‘lsof’ on a process running our test suite, both unfiltered and filtered for readability.
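One way to narrow down which operation leaks would be to compare the set of open descriptors before and after running a suspect step repeatedly. The following is only a sketch under stated assumptions: it is Unix-only, it probes descriptor numbers with `os.fstat` up to an arbitrary cap of 4096, and `open_fds` and `leaked_fds` are hypothetical helper names, not anything provided by Subversion, Subvertpy or hgsubversion.

```python
import os
import resource


def open_fds(max_fd=4096):
    """Return the set of descriptor numbers currently open in this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY:
        max_fd = min(max_fd, soft)  # no point probing past the soft limit
    fds = set()
    for fd in range(max_fd):
        try:
            os.fstat(fd)  # raises OSError (EBADF) if fd is not open
        except OSError:
            continue
        fds.add(fd)
    return fds


def leaked_fds(action, repeat=1):
    """Run action() `repeat` times; return descriptors left open afterwards."""
    before = open_fds()
    for _ in range(repeat):
        action()
    return open_fds() - before
```

Passing a callable that opens a repository, logs it and fetches its revisions as `action` would show whether the number of leftover descriptors grows with `repeat`. Note that objects awaiting garbage collection may still hold descriptors, so a non-empty result is a hint rather than proof of a leak.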
So, what to do now? I've discussed this with the Subvertpy author, Jelmer Vernooij, and he's at a loss as to what might cause this other than a bug in Subversion. I'd like to be able to diagnose this further, but so far, I haven't been able to get Subversion to open the ‘rep-cache.db’ file. So I ask you guys: Do you think this is a bug in Subversion, or somewhere else? Do you have any hints on what I can do to diagnose this further? If it is a bug in Subversion, could it be fixed in the 1.6.x branch?
(I've Cc'ed this mail to Augie Fackler and Jelmer Vernooij, the maintainers of hgsubversion and Subvertpy, respectively.)
-- Dan Villiom Podlaski Christiansen danchr_at_gmail.com
This is an archived mail posted to the Subversion Dev mailing list.