On Mar 27, 2009, at 09:51, Hyrum K. Wright wrote:
> On Mar 27, 2009, at 11:21 AM, Kyle McKay wrote:
>
>> From the xcode-users mailing list:
>>
>>> From: Chris Espinosa <cde_at_apple.com>
>>> Date: March 26, 2009 15:39:14 PDT
>>> To: Xcode Users <xcode-users_at_lists.apple.com>
>>> Subject: Re: Xcode 3.1.2 and Subversion 1.6
>>>
>>> On Mar 25, 2009, at 3:19 PM, Chris Espinosa wrote:
>>>> On Mar 25, 2009, at 3:07 PM, Rob Lockstone wrote:
>>>>
>>>>> Has anyone tried using Xcode 3.1.2 with the subversion 1.6.0
>>>>> client? I think I recall (but may be wrong) that newer versions of
>>>>> Xcode don't make assumptions about the version of subversion
>>>>> that's installed and simply use whatever version it finds.
>>>>
>>>> We have not yet qualified any version of Xcode with Subversion
>>>> 1.6.0 and don't recommend replacing existing Subversion library or
>>>> client code with 1.6 until we've given it the green light.
>>>
>>> We've discovered in internal testing that this patch in Subversion
>>> 1.6:
>>>
>>> http://svn.collab.net/viewvc/svn?view=revision&revision=35533
>>>
>>> can cause Subversion 1.6 to leave behind zombie ssh processes every
>>> time you save a file in Xcode, and eventually exhaust your ability
>>> to spawn new processes.  We don't recommend using Subversion 1.6
>>> with Xcode 3.1.x at this time.
>>>
>>> Chris
>>
>> From:
>>
>> http://svn.apache.org/repos/asf/apr/apr/tags/1.0.0/include/apr_thread_proc.h
>>
>> APR_KILL_NEVER         // process is never sent any signals
>> APR_KILL_ALWAYS        // process is sent SIGKILL on apr_pool_t cleanup
>> APR_KILL_AFTER_TIMEOUT // SIGTERM, wait 3 seconds, SIGKILL
>> APR_JUST_WAIT          // wait forever for the process to complete
>> APR_KILL_ONLY_ONCE     // send SIGTERM and then wait
>>
>> Restoring the apr_pool_note_subprocess and using APR_KILL_NEVER would
>> allow the children to be reaped provided they exit before pool  
>> cleanup.
>>
>> However, that would likely not eliminate the zombie problem in Xcode
>> as pool cleanup probably happens faster than ssh cleanup and exit in
>> some cases.  How about using APR_KILL_AFTER_TIMEOUT or
>> APR_KILL_ONLY_ONCE (or even APR_JUST_WAIT) ?
>
> How would this interact with ssh connection pooling?  The case which  
> drove r35533 was a user who uses ssh connection pooling for svn  
> connections.  Having svn kill the ssh connection is obviously  
> hazardous to such a scheme, how would using the other APR_KILL_*  
> conditions behave there (and would they fix the problem with XCode)?
>
> -Hyrum

I used the following configuration in ~/.ssh/config for the tests
below:

    ControlMaster auto
    ControlPath /tmp/sshpool-%l-%r@%h:%p

I also added my ~/.ssh/id_rsa.pub key to ~/.ssh/authorized_keys so
that "ssh localhost" works without prompting for a password.

It seems that the master and slave ssh connections are related only
through the unix socket.  So when the master exits for whatever reason
(SIGHUP, SIGINT, SIGTERM, SIGKILL etc.), the unix socket goes away and
all the slave ssh connections die.

------
TEST 1
------
Suppose you do the following (with ssh configured as mentioned above):

    ssh localhost sleep 15

and wait 15 seconds.  The ssh process exits normally.

------
TEST 2
------
Do this:

    ssh localhost sleep 15

And within 15 seconds, go to another window/terminal/screen and do this:

    ssh -t localhost top # The "-t" option is important here

If you go back to the first window, you'll notice that even after 15
seconds the first ssh process doesn't exit.  Go back to the window
running top and press "q".  You should see the message "Shared
connection to localhost closed." and if you now go back to the first
ssh process, you'll see that it has exited normally.

------
TEST 3
------
Do this:

    ssh localhost sleep 15

And within 15 seconds, go to another window/terminal/screen and do this:

    ssh localhost top # DO NOT USE "-t" THIS TIME

If you go back to the first window, you'll notice that even after 15
seconds the first ssh process doesn't exit.  Go back to the window
running top and press Ctrl-C.  The second ssh process should exit, but
you will not see the message about closing the shared connection.  If
you go back to the first ssh process, you'll see that it's still
waiting to exit.  The abnormal exit of the second ssh process keeps
the first (master) ssh process from noticing that the slave is gone,
so it will never exit unless you send it one of SIGHUP, SIGINT,
SIGTERM, SIGKILL etc.

-----------
CONCLUSIONS
-----------
1. For ssh connection pooling with ControlMaster/ControlPath to work,  
the ssh master process must not receive any kind of hup/interrupt/quit/ 
terminate signal.
2. If any of the slave ssh connections fail to exit normally, the  
master ssh connection will never exit without some kind of signal sent  
to it.
3a. ASSUMPTION: The probability of some slave ssh connection exiting
abnormally is low, but > 0.
3b. ASSUMPTION: The probability of Subversion's ssh connection being
the master connection is > 0.
4. Given #1, #2, #3a, and #3b, Subversion will sooner or later leave
behind a running ssh master connection.
5. Omitting the apr_pool_note_subprocess call prevents Subversion from  
reaping its children.  This will ALWAYS result in zombies unless there  
has been a signal(SIGCHLD, SIG_IGN) call previously (such a call would  
be highly unfriendly to any program linked with the Subversion library).
6. When Subversion is accessed directly via the API from the  
Subversion library, it may be part of a long-running process that  
persists across many Subversion operations.
7. Given #5 and #6 the number of zombies created may eventually  
overwhelm the system resources and exhaust your ability to spawn new  
processes (this is what's happening with Xcode, but could happen to  
any long-running program that links with the Subversion library).
8. The only apr_pool_note_subprocess option that does not send any  
signals is APR_KILL_NEVER.  However it will still reap zombie children  
provided they have exited by pool cleanup time.
9. ASSUMPTION: The probability of svn's ssh connection exiting after  
the pool cleanup is low, but > 0.
10. Given #8 and #9 svn will still probably create a few zombies even  
with APR_KILL_NEVER but likely this will be a far smaller number than  
without any apr_pool_note_subprocess call at all.
11. You can't have it both ways (never create zombies or stray ssh
processes AND support ssh connection pooling) without some kind of
configuration option, because the two goals require mutually exclusive
signaling behavior.
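
To make conclusions 5 and 8 concrete, here is a minimal POSIX sketch (plain fork/waitpid, not APR or Subversion code).  The function names are made up for illustration.  try_reap() makes a single non-blocking wait, which is only my approximation of the "send no signals, but reap whatever has already exited" behaviour described above for APR_KILL_NEVER.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with the given status.
 * Returns the child's pid in the parent, or -1 if fork() fails. */
pid_t spawn_short_lived_child(int exit_status)
{
    assert(exit_status >= 0 && exit_status <= 255);
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_status);          /* child: terminate immediately */
    return pid;                      /* parent: pid, or -1 on error */
}

/* Return 1 if pid still has a process-table entry (running OR zombie).
 * Signal 0 delivers nothing; kill() only performs the existence check. */
int process_entry_exists(pid_t pid)
{
    return kill(pid, 0) == 0;
}

/* One non-blocking reap attempt, sending no signals.
 * Returns 1 if the child was reaped, 0 if it is still running,
 * and -1 on error (for example, it was already reaped). */
int try_reap(pid_t pid)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}
```

If the parent never calls try_reap() (or some other wait), every exited child keeps its process-table slot until the parent itself exits, which is the zombie build-up described in conclusion 7.  If try_reap() runs before the child has exited, it returns 0 and that child is left behind, which is the residual case in conclusions 9 and 10.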
---------------
RECOMMENDATIONS
---------------
1. Short term.  Revert change 35533 and then change the  
APR_KILL_ALWAYS to an APR_KILL_NEVER.  This will likely eliminate most  
of the zombie problems for long-lived processes using the Subversion  
library while remaining compatible with ssh connection pooling  
(ControlMaster/ControlPath).
2. Longer term.  Add an option to the ~/.subversion/config file (in
the [tunnels] section?) that selects the apr_kill_conditions_e value
passed to apr_pool_note_subprocess, defaulting to APR_KILL_NEVER if
not given (apr-kill-condition = ... ?).
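
A rough sketch of how such an option's value might be mapped to a kill condition, just to show the proposed default.  Nothing here exists in Subversion: the value strings ("never", "always", "after-timeout", "just-wait", "only-once") are invented, and the enum is a local stand-in for apr_kill_conditions_e so that the sketch compiles without the APR headers.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in mirroring the five apr_kill_conditions_e names quoted
 * earlier in this message; real code would use the APR enum instead. */
typedef enum {
    KILL_NEVER,
    KILL_ALWAYS,
    KILL_AFTER_TIMEOUT,
    JUST_WAIT,
    KILL_ONLY_ONCE,
    KILL_CONDITION_COUNT
} kill_condition;

/* Hypothetical option values, indexed by kill_condition. */
static const char *const condition_names[KILL_CONDITION_COUNT] = {
    "never", "always", "after-timeout", "just-wait", "only-once"
};

/* Name of a condition, e.g. for an error or debug message. */
const char *kill_condition_name(kill_condition cond)
{
    assert((int)cond >= 0 && (int)cond < KILL_CONDITION_COUNT);
    return condition_names[cond];
}

/* Map the text of the (hypothetical) apr-kill-condition option to a
 * condition.  A missing option (NULL) or an unrecognized value falls
 * back to KILL_NEVER, the default proposed above. */
kill_condition parse_kill_condition(const char *value)
{
    if (value == NULL)
        return KILL_NEVER;
    for (int i = 0; i < KILL_CONDITION_COUNT; i++) {
        if (strcmp(value, condition_names[i]) == 0)
            return (kill_condition)i;
    }
    return KILL_NEVER;
}
```

Falling back to KILL_NEVER for unknown values keeps a typo in the config file from turning into a policy that signals the ssh master.  Whether an unknown value should instead be reported as an error is a design question this sketch does not settle.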
Kyle
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1447378
Received on 2009-03-27 20:04:53 CET