[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] fix for ssh zombies introduced with r35533

From: Kyle McKay <mackyle_at_gmail.com>
Date: Sun, 19 Apr 2009 19:45:17 -0700

Gavin,

I cannot duplicate the problem described by James using the released
version of Subversion 1.6.1 with the patch I attached to issue 2580
applied:

http://subversion.tigris.org/issues/show_bug.cgi?id=2580

Using the technique in his thread you reference below (assuming that
ssh connection pooling has been enabled as previously described), I
tried two scenarios:

1. Run the test script and ssh into the machine hosting the repository
while the test script is running. It might take a few tries to get
the secondary ssh to pick up the shared connection (because it will be
torn down and re-setup in between each svn log connection), but it can
be done.

In this case you can see the ssh svnserve -t process hanging around to
support the ssh login even after the test script has stopped. When
you exit the ssh login you see a message "Shared connection to ...
closed." This is confirmation that the ssh login was using a shared
connection created by the svn log test.

At this point the ssh svnserve -t process that was hanging around to
support the ssh login has exited for me.

2. ssh into the machine hosting the repository BEFORE running the test
script. Run the test script (notice how much faster it runs) all the
svn log commands appear to be using the shared connection. Now exit
the ssh login.

The login exits cleanly for me and does not leave any ssh process
hanging around.

Admittedly both of these tests were run on Mac OS X 10.5.6 (which
claims to be an Open Brand UNIX 03 Registered Product, conforming to
the SUSv3 and POSIX 1003.1 specifications for the C API, Shell
Utilities, and Threads) -- I don't know what system James was using
and it may behave differently.

Probably the only way to make everyone happy is to do the following:

A. Add a configuration item to the [tunnels] section

tunnel-agent-handling = kill | term | detach

Where the default is "kill" if not specified. This would be the
behavior (and code path) identical to svn version 1.5.6 and earlier
(APR_KILL_ALWAYS). "term" would have the identical code path to
"kill" except use APR_KILL_AFTER_TIMEOUT instead. "detach" would
modify the behavior as follows: For WIN32 systems APR_KILL_NEVER is
used while on other systems APR_KILL_ALWAYS is used AND the additional
code added by the patch I posted to issue 2580 becomes active (in the
other cases, "kill" and "term", that code would be inactive).

B. Allow the ./configure script to change the default for tunnel-agent-
handling if not specified from "kill" to either of the other options.

This would achieve the following:

1. 1.5.6 and earlier behavior restored as the default
2. No zombies created, ever
3. Debian can stop patching the svn distribution and just use a ./
configure option instead

I might be willing to do this work provided one of the committers
stands up and publicly commits to seeing the patches actually
submitted to the repository (as patch manager perhaps you could do
this?). While the svn project is indeed open source, it is not open
to external commits and without a public commitment from one of the
internal folks to see the patches are actually applied to trunk, I
can't see spending any time on creating them if they're just going to
sit around unapplied.

Kyle

P.S. Connection pooling is still possible in svn 1.5.6 and earlier on
unix-like systems without premature connection termination using the
simple workaround described here:

http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1452200

On Apr 19, 2009, at 16:51, Gavin 'Beau' Baumanis wrote:
> Hi Kyle,
>
> I have noticed that your email with patch, is dated March 30, 2009
>
> and the following thread by James Knight:
>
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1523947
>
> Is dated April 02, 2009.
>
> I can't quite follow if James' reply is in response to your patch
> submission in THIS thread or the workaround suggestions in the above
> hyperlink.
>
> So, Can you tell me if your patch is still valid, despite James'
> comments?
> Does it require more work / comments from the list?
> Is it no longer valid as a result of James' comments and, so at
> least for the time being, there is nothing more you would like /
> need me to do with your patch (as Patch Manager)?
>
>
> Gavin.
>
>
> On 29/03/2009, at 10:04 AM, Kyle McKay wrote:
>
>> [[[
>> Fix ssh zombie problem introduced with revision 35533
>>
>> * subversion/libsvn_ra_svn/client.c
>> (make_tunnel): Fully detach tunnel process to avoid having it
>> receive signals while restoring the original
>> apr_pool_note_subprocess to avoid creating zombies.
>> ]]]
>>
>> As detailed in this thread:
>>
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1445592
>>
>> regarding the change introduced into client.c by revision 35533:
>>
>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>
>> On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
>>> The case which drove r35533 was a user who uses ssh connection
>>> pooling for svn connections.
>>
>> Change r35533 removed the call to to apr_pool_note_subprocess meaning
>> that Subversion never reaps any of its children. This may be okay
>> for
>> a short-lived process, but is not okay for a long-lived process such
>> as a GUI tool that has linked with the Subversion library.
>>
>> If the GUI tool runs long enough, it can create so many un-reaped
>> zombie processes that system resources are exhausted and it becomes
>> impossible to spawn any new processes.
>>
>> The attached patch restores the original call to
>> apr_pool_note_subprocess thus guaranteeing no zombies are ever
>> created. It also arranges for the tunnel process to become detached
>> so that it can "die in piece, in its own time, on its own terms" as
>> the log comment for revision 35533 mentions, but without causing
>> zombie processes to be created.
>>
>> This change retains full compatibility with ssh connection pooling
>> while eliminating the zombie problem. See issue #2580 for the
>> original impetus for the change made in revision 35533.
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2580
>>
>> Kyle
>>
>> ------------------------------------------------------
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1462473
>> <client.c-patch>
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1816867

To unsubscribe from this discussion, e-mail: [users-unsubscribe_at_subversion.tigris.org].
Received on 2009-04-20 04:46:52 CEST

This is an archived mail posted to the Subversion Users mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.