
File descriptor leak of rep-cache.db in 1.6.x

From: Dan Villiom Podlaski Christiansen <danchr_at_gmail.com>
Date: Thu, 3 Jun 2010 16:11:47 +0200

Hi,

I'm experiencing a file descriptor leak in the Subversion 1.6.x branch. I'm hit by it in the 1.6.1 build included with Mac OS X 10.6, a 1.6.11 build from MacPorts and a build of the 1.6.x branch. The bug is not present in 1.5.7, nor in trunk.

The bug is somewhat subtle, and the circumstances causing it are fairly complex. In the course of running a test suite, we open repositories repeatedly using the ‘file’ protocol, log their history and fetch the contents of all revisions. This fails after about a hundred tests, having exhausted file descriptors. Inspecting the output of ‘lsof’ on the process, there are 216 open references to ‘rep-cache.db’ files.
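One way to reproduce a count like the one above is to run `lsof -p <pid> -F n` against the test process and tally the name fields ending in `rep-cache.db`. This is a minimal sketch. It assumes `lsof` is on the PATH and that its field output puts each file name on its own line prefixed with `n`. The helper names are hypothetical and are not part of hgsubversion or Subvertpy.

```python
import os
import subprocess


def count_open_files(lsof_field_output, suffix):
    """Count descriptors whose name field ends with `suffix`.

    `lsof -F n` emits one field per line: lines starting with 'n'
    carry a file name, while 'p' (pid) and 'f' (fd) lines are skipped.
    """
    count = 0
    for line in lsof_field_output.splitlines():
        if line.startswith("n") and line[1:].endswith(suffix):
            count += 1
    return count


def count_rep_cache_descriptors(pid=None):
    """Run lsof on `pid` (default: this process) and count rep-cache.db handles."""
    pid = os.getpid() if pid is None else pid
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-F", "n"],
        capture_output=True,
        text=True,
        check=False,  # lsof may exit non-zero even when it prints results
    )
    return count_open_files(result.stdout, "rep-cache.db")
```

Calling `count_rep_cache_descriptors()` between tests would show whether the number of ‘rep-cache.db’ handles keeps climbing.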

As mentioned, the circumstances causing the bug are fairly complex. I see this in hgsubversion,[1] a Mercurial plugin for Subversion interoperability. Both Mercurial and hgsubversion are written in Python, and historically, hgsubversion has used the SWIG bindings for Subversion. Unfortunately, we have found the SWIG bindings to leak like a sieve; it is not uncommon for the conversion of a large repository to use several gigabytes of memory or even exhaust the address space in a 32-bit environment. As an effort to fix this, I've been writing a Subvertpy backend for hgsubversion. Subvertpy[2] is a set of alternate Python bindings for Subversion which expose a much less complicated API and, most importantly, deal with memory allocation internally rather than exposing it to the Python environment.

So far, the results are good; converting a test repository (cvs2svn) using the HTTP or svn protocols is slower, but with significantly lower memory use. (Some of that overhead just might be the cost of deallocating more; you never know…) The file protocol, however, appears to leak somewhat, using 27% less memory than with the SWIG bindings but twice as much CPU time. (Please note that I haven't tested this with anything other than 1.6.) The file protocol is the main protocol used by the hgsubversion test suite. Whereas leaking one file descriptor per repository is insignificant during common use, our test suite opens hundreds of repositories in a single process.

Considering the many packages involved, it's not easy to determine which one might be buggy. A few observations:

* While hgsubversion leaks memory when using the SWIG bindings, it doesn't leak file descriptors. This suggests that it isn't the cause of *this* leak.
* It is quite possible that the source of the leak is in Subvertpy. In order to get hgsubversion working using it, I had to add a few missing wrapper APIs. However…
* Neither hgsubversion nor Subvertpy contains any logic related to the various repository access methods. It seems likely that if either were the cause of the bug, it would affect all access methods and not just one.
* From a brief inspection of the Subversion source code, it appears that the ‘rep-cache.db’ is an implementation detail deep in Subversion. It's odd that this file remains open throughout the lifetime of the process. If the source of the leak were higher up in the chain, wouldn't other files in the repository remain open as well?
* Finally, there's the point that the leak isn't present when using Subversion 1.5.7 or 1.7.x. Subvertpy uses slightly different code paths for 1.5.7, but for 1.7.x, the code used is exactly the same. hgsubversion requires Subversion 1.5, and uses the same paths regardless of the underlying version of Subversion.

I haven't been able to reproduce this outside our test suite. Opening a repository directly doesn't cause ‘rep-cache.db’ to be opened, nor does obtaining a log of all revisions. hgsubversion has two modes for fetching revisions: replay and diff-based. Hacking the tests to use only one of them merely changes how many repositories are processed before the descriptors are exhausted. For reference, I've attached the output of ‘lsof’ on a process running our test suite, both unfiltered and filtered for readability.
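Since the leak only becomes visible after many repositories have been processed, a test can expose it sooner by repeating a single operation and checking whether the descriptor count grows with each run. This is a minimal sketch. `action` is a hypothetical callable (for example, opening a `file://` repository, logging it and fetching every revision), and `count_fn` is any function returning the current number of relevant open descriptors, such as an lsof-based counter.

```python
from typing import Callable, List


def descriptor_growth(action: Callable[[], None],
                      count_fn: Callable[[], int],
                      iterations: int = 10) -> List[int]:
    """Run `action` repeatedly, recording the descriptor count before and after each run.

    A series that rises by a constant step per iteration suggests that
    each call leaks that many descriptors; a flat series suggests none.
    """
    counts = [count_fn()]
    for _ in range(iterations):
        action()
        counts.append(count_fn())
    return counts


def leaks_per_iteration(counts: List[int]) -> float:
    """Average growth between consecutive samples (0.0 if fewer than two)."""
    if len(counts) < 2:
        return 0.0
    return (counts[-1] - counts[0]) / (len(counts) - 1)
```

Running this separately against the replay and diff-based fetch paths, and with only the open or only the log step, would help narrow down which operation leaves ‘rep-cache.db’ open.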

So, what to do now? I've discussed this with the Subvertpy author, Jelmer Vernooij, and he's at a loss as to what might cause this other than a bug in Subversion. I'd like to be able to diagnose this further, but so far, I haven't been able to get Subversion to open the ‘rep-cache.db’ file outside the test suite. So I ask you guys: Do you think this is a bug in Subversion, or somewhere else? Do you have any hints on what I can do to diagnose this further? If it is a bug in Subversion, could it be fixed in the 1.6.x branch?

(I've Cc'ed this mail to Augie Fackler and Jelmer Vernooij, the maintainers of hgsubversion and Subvertpy, respectively.)

[1] http://code.google.com/p/hgsubversion/
[2] http://samba.org/~jelmer/subvertpy/

--
Dan Villiom Podlaski Christiansen
danchr_at_gmail.com


Received on 2010-06-03 16:12:35 CEST

This is an archived mail posted to the Subversion Dev mailing list.