Re: Avoiding file handle leak using the Python bindings & core.Stream

From: Trent Nelson <trent_at_snakebite.org>
Date: Tue, 17 Apr 2012 13:41:25 -0700

On Apr 16, 2012, at 5:29 AM, Willmer, Alex (PTS) wrote:

> Hello,
>
> I've been working on a full-text search plugin for Trac. At initial setup this indexes the entire Subversion repository by reading every node of every version. During testing we discovered that the indexer was running out of file handles, due to a file handle leak. As far as I can tell, each core.Stream(fs.file_contents(.)) instance that was created and not subsequently .read() left an unclosed file handle. To work around this I have monkey-patched in a Stream.close() method that calls svn_stream_close, and I call it in a try/finally block.

Any chance you could post more of your code? I'm interested in your main loop and all Subversion binding calls -- feel free to omit the full-text indexing logic details. If it's company code that you can't share, uh, feel free to accidentally send it to me privately or whip up some pseudocode that shows the binding calls ;-)

Sidebar: I've found ``psutil`` helpful in the past for tracking down file handle leaks:

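        import os
        import psutil
        import svn.fs

        def dump_open_files(p):
                # psutil 2.0+ renamed get_open_files() to open_files()
                print("open files: %r" % (p.get_open_files(),))

        def main(repo):
                # repo: an already-opened svn.fs filesystem object
                p = psutil.Process(os.getpid())
                # +1 so that the youngest revision itself is visited
                for rev in range(svn.fs.youngest_rev(repo) + 1):
                        dump_open_files(p)
                        # ... open stream
                        # ... index stream
                        dump_open_files(p)
                        # ... close stream
                        dump_open_files(p)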
                
        
You get the idea. If there's a leak you should see a direct correlation between whatever stream operations you're doing and what's getting reported by p.get_open_files().
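
If psutil isn't installed, the same before/after comparison can be roughed out with the standard library alone. This is only a sketch, not psutil and not anything Subversion-specific: `count_open_fds` and `fd_delta` are made-up helper names, the probe simply asks os.fstat() about every descriptor number below an arbitrary limit, and `os.devnull` stands in for whatever the real stream would open.

```python
import os


def count_open_fds(limit=1024):
    """Count descriptor numbers in [0, limit) that os.fstat() accepts as open."""
    count = 0
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # not an open descriptor
        count += 1
    return count


def fd_delta(operation):
    """Run operation() and report how many descriptors it left open."""
    before = count_open_fds()
    result = operation()
    return result, count_open_fds() - before


def leaky():
    # Stand-in for "open a stream and never close it".
    return os.open(os.devnull, os.O_RDONLY)


def tidy():
    # Stand-in for "open a stream and close it when done".
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)


fd, leaked = fd_delta(leaky)    # the descriptor is still open here
os.close(fd)                    # clean up after the demonstration
_, balanced = fd_delta(tidy)    # nothing should be left behind
```

Wrapping each loop iteration in something like `fd_delta` gives a per-revision number, which is easier to eyeball than a full list of open files.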

Also, can I just clarify that you're using the SWIG bindings and not the ctypes-based ones?

(I ran into a peculiar memory leak a few months back; I had similar code that essentially analyzed every revision in a repository. So, it's not unfathomable that there could be a leak.)

>
> The work-around has fixed our file-handle leak, but I believe it points to a bug in the Subversion bindings for which I'll try and provide a patch. Before I file a bug I'd like to check I haven't misunderstood anything:
> 1. In the Python bindings core.Stream doesn't have a .close() method [a]. Is there any reason this might be intentional?

It's bedtime so I'm not going to look at the source right now -- but, if streams are wrapped in the apr_pool weakref black magic, then yeah, .close() could be happening behind the scenes when an object gets garbage collected.
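
To make "closed behind the scenes on garbage collection" concrete, here is a generic Python sketch of that pattern using weakref.finalize. I haven't checked that the bindings do it this way; `FakeHandle` and `Wrapper` are hypothetical stand-ins, not Subversion types. It also shows how an explicit close() (the kind you'd call from try/finally) can coexist with the implicit one.

```python
import gc
import weakref


class FakeHandle:
    """Hypothetical stand-in for a native resource such as an open stream."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Wrapper:
    """Closes the handle when the wrapper is collected or close() is called."""

    def __init__(self, handle):
        self._handle = handle
        # The callback must not reference self, or the wrapper would
        # never become collectable.
        self._finalizer = weakref.finalize(self, handle.close)

    def close(self):
        # Calling a finalizer runs it at most once, so a later garbage
        # collection will not close the handle a second time.
        self._finalizer()


# Implicit cleanup: dropping the last reference triggers the finalizer.
implicit = FakeHandle()
wrapper = Wrapper(implicit)
del wrapper
gc.collect()

# Explicit cleanup: deterministic, independent of when collection happens.
explicit = FakeHandle()
wrapper = Wrapper(explicit)
try:
    pass  # ... read from the stream here
finally:
    wrapper.close()
```

The catch with the implicit route is timing: if something keeps the wrapper alive longer than expected, the handle stays open just as long, which would look exactly like a leak.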

Are you noticing constant memory usage whilst analyzing the repo or does that grow as well?

> 2. Disregarding Python, in the Subversion library is it required that every svn_stream_t created (e.g. by a call to svn_fs_file_contents) is explicitly closed, or is there some automatic clean-up/closure provided by the pool system?
>

Again, not looking at the code, I'd be inclined to blame the Python bindings, not the Subversion libraries. I'd wager that Subversion takes care of cleaning itself up when the stream's pool gets destroyed. It's pretty good like that.
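
As a rough analogy for what I mean by pool-driven cleanup (pure Python, no Subversion calls): contextlib.ExitStack can play the role of a per-iteration pool, running every registered cleanup when the "pool" goes away. `FakeStream` and `process_revision` are hypothetical names for illustration, and whether the real stream cleanup behaves like this is exactly the part I'm wagering on, not stating.

```python
from contextlib import ExitStack


class FakeStream:
    """Hypothetical stand-in for a stream allocated in a pool."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def close(self):
        self.log.append(self.name)


def process_revision(rev, log):
    # One ExitStack per iteration acts like a per-iteration pool:
    # everything registered on it is cleaned up when the block exits.
    with ExitStack() as pool:
        first = FakeStream("r%d/first" % rev, log)
        pool.callback(first.close)
        second = FakeStream("r%d/second" % rev, log)
        pool.callback(second.close)
        # ... index the streams here
    # Both streams are closed by this point, most recently registered first.


closed = []
for rev in range(2):
    process_revision(rev, closed)
```

The point of the per-iteration scope is that nothing opened for one revision can outlive that revision, no matter how many revisions the loop walks.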

        Trent.
Received on 2012-04-17 22:42:02 CEST

This is an archived mail posted to the Subversion Users mailing list.