
Re: File descriptor leak of rep-cache.db in 1.6.x

From: Greg Stein <gstein_at_gmail.com>
Date: Thu, 3 Jun 2010 11:06:42 -0400

On my phone, so sorry for a short reply.

A few weeks ago, a change was made to an FS test to clear a pool, to close
file descriptors. I was suspicious that it only fixed a symptom rather than
a real bug, and that seems correct.

I'd start investigation there.

On Jun 3, 2010 10:12 AM, "Dan Villiom Podlaski Christiansen" <
danchr_at_gmail.com> wrote:

Hi,

I'm experiencing a file descriptor leak in the Subversion 1.6.x branch. I'm
hit by it in the 1.6.1 build included with Mac OS X 10.6, a 1.6.11 build
from MacPorts and a build of the 1.6.x branch. The bug is not present in
1.5.7, nor in trunk.

The bug is somewhat subtle, and the circumstances causing it are fairly
complex. In the course of running a test suite, we open repositories
repeatedly using the ‘file’ protocol, log their history and fetch the
contents of all revisions. This fails after about a hundred tests,
having exhausted file descriptors. The output of ‘lsof’ on the process
shows 216 open references to ‘rep-cache.db’ files.
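
For anyone wanting to repeat that count, here is a minimal sketch of how
the ‘lsof’ output could be tallied from Python. It assumes lsof's field
output mode (‘-Fn’), where each open file name is printed on a line
starting with ‘n’; the function names are hypothetical and not part of
hgsubversion or Subvertpy.

```python
import subprocess
from collections import Counter


def lsof_names(pid):
    """Run lsof for one process and return its raw field output.

    Assumes an `lsof` binary is on PATH; `-Fn` asks for name fields only.
    """
    result = subprocess.run(
        ["lsof", "-p", str(pid), "-Fn"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def count_open_names(lsof_field_output, suffix="rep-cache.db"):
    """Count open files per path whose name ends with `suffix`.

    In lsof field output, name lines are prefixed with the letter 'n'.
    """
    names = (
        line[1:]
        for line in lsof_field_output.splitlines()
        if line.startswith("n")
    )
    return Counter(name for name in names if name.endswith(suffix))
```

Calling count_open_names(lsof_names(os.getpid())) from inside the test
process (after importing os) gives one entry per repository path, so a
count above one for the same path would point at repeated opens of a
single repository.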

As mentioned, the circumstances causing the bug are fairly complex. I see
this in hgsubversion,[1] a Mercurial plugin for Subversion interoperability.
Both Mercurial and hgsubversion are written in Python, and historically,
hgsubversion has used the SWIG bindings for Subversion. Unfortunately, we
have found the SWIG bindings to leak like a sieve; it is not uncommon for
the conversion of a large repository to use several gigabytes of memory or
even exhaust the address space in a 32-bit environment. As an effort to fix
this, I've been writing a Subvertpy backend for hgsubversion. Subvertpy[2]
is a set of alternate Python bindings for Subversion that exposes a much
less complicated API and, most importantly, deals with memory allocation
internally, rather than exposing it to the Python environment.

So far, the results are good; converting a test repository (cvs2svn) using
the HTTP or svn protocols is slower, but with significantly lower memory
use. (Some of that overhead just might be the cost of deallocating more; you
never know…) The file protocol, however, appears to leak somewhat: it uses
27% less memory than the SWIG bindings, but twice as much CPU time.
(Please note that I haven't tested this with anything other than 1.6.)
The file protocol is the main protocol used by the hgsubversion test suite.
Whereas leaking one file descriptor per repository is insignificant during
common use, our test suite opens hundreds of repositories in a single
process.

Considering the many packages involved, it's not easy to determine which one
might be buggy. A few observations:

* While hgsubversion leaks using the SWIG bindings, it doesn't leak file
descriptors. This suggests that it isn't the cause of *this* leak.
* It is quite possible that the source of the leak is in Subvertpy. In order
to get hgsubversion working using it, I had to add a few missing wrapper
APIs. However…
* Neither hgsubversion nor Subvertpy contains any logic related to the
various repository access methods. It seems likely that if either were the
cause of the bug, it would affect all access methods and not just one.
* From a brief inspection of the Subversion source code, it appears that the
‘rep-cache.db’ is an implementation detail deep in Subversion. It's odd that
this file remains open throughout the lifetime of the process. If the source
of the leak were higher up in the chain, wouldn't other files in the
repository remain open as well?
* Finally, there's the point that the leak isn't present when using
Subversion 1.5.7 or 1.7.x. Subvertpy uses slightly different code paths for
1.5.7, but for 1.7.x, the code used is exactly the same. hgsubversion
requires Subversion 1.5, and uses the same paths regardless of the
underlying version of Subversion.

I haven't been able to reproduce this outside our test suite. Opening a
repository directly doesn't cause ‘rep-cache.db’ to be opened, nor does
obtaining a log of all revisions. hgsubversion has two modes for fetching
revisions: replay and diff-based. Hacking the tests to use one instead of
the other only affects how many repositories are processed before
descriptors are exhausted.
For reference, I've attached the output of ‘lsof’ on a process running our
test suite; both unfiltered and filtered for readability.
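
To narrow down which operation leaks, one option is to compare the
process's open descriptor count before and after repeating a single step.
The following is only a sketch: it assumes a system that exposes
‘/proc/self/fd’ or ‘/dev/fd’ (Linux and Mac OS X do), and the helper names
are hypothetical.

```python
import os


def open_fd_count():
    """Return the number of file descriptors open in this process.

    Relies on the per-process descriptor directory; the listing itself
    briefly uses one descriptor, which cancels out when two counts are
    subtracted.
    """
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise RuntimeError("no per-process descriptor directory available")


def leaked_fds(action, repeat=10):
    """Call `action` `repeat` times and return the growth in open descriptors."""
    before = open_fd_count()
    for _ in range(repeat):
        action()
    return open_fd_count() - before
```

Passing in a callable that opens a repository, fetches one revision and
drops all references should return zero if nothing leaks; a result equal
to the repeat count would match the one-descriptor-per-repository pattern
described above.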

So, what to do now? I've discussed this with the Subvertpy author, Jelmer
Vernooij, and he's at a loss as to what might cause this other than a bug in
Subversion. I'd like to be able to diagnose this further, but so far, I
haven't been able to get Subversion to open the ‘rep-cache.db’ file. So I ask
you guys: Do you think this is a bug in Subversion, or somewhere else? Do
you have any hints on what I can do to diagnose this further? If it is a bug
in Subversion, could it be fixed in the 1.6.x branch?

(I've Cc'ed this mail to Augie Fackler and Jelmer Vernooij, the maintainers
of hgsubversion and Subvertpy, respectively.)

[1] http://code.google.com/p/hgsubversion/
[2] http://samba.org/~jelmer/subvertpy/

--
Dan Villiom Podlaski Christiansen
danchr_at_gmail.com
Received on 2010-06-03 17:07:18 CEST

This is an archived mail posted to the Subversion Dev mailing list.