Issue #2580 revisited: Windows unclean TCP close [SEVERE]
From: Bob Denny <rdenny_at_dc3.com>
Date: Thu, 15 Oct 2009 09:38:23 -0700
SVN 1.6.5 on Windows, when using svn+ssh, fails to properly complete and close the TCP connection, leaving the remote sshd/svnserve processes running (CentOS Linux server). These last "forever", accumulating until the sysop disables your SSH access :-) This is a severe problem. It doesn't always appear on slower CPUs with fast connections to the remote svn+ssh repo. However, my system is a 2.6GHz quadcore, and the remote repo is across the public Internet. This is the worst-case environment, as you'll see. I am certain this problem is not unique to me.
I'm looking for a "buddy" on this. I am a TortoiseSVN user. TortoiseSVN shares much code with Subversion. I have checked out the latest TortoiseSVN trunk (which includes Subversion 1.6.5 as an external) and built it. I hope I won't have to go through this again for the Subversion sources in order to achieve the credibility I need here...
It is definitely an issue with Subversion, specifically libsvn_ra_svn and its usage of apr. As of SVN 1.6.5, the svn client forcibly kills the tunnel subprocess as soon as it receives the last of the data that it expects. This prevents the tunnel process from completing its conversation with the remote sshd, leaving it and its child svnserve hanging there for up to half an hour. In my case the tunnel is TortoisePLink, but it also happens with the OpenSSH 'ssh.exe' tunnel, for the same reason.
Specifically, in libsvn_ra_svn\client.c, a call is made
apr_pool_note_subprocess(pool, proc, APR_KILL_ONLY_ONCE);
Inspection of the apr sources reveals that, on Windows, the only significant kill modes are APR_KILL_ALWAYS and APR_KILL_NEVER. Any other mode (e.g. APR_KILL_ONLY_ONCE) is translated on Windows to APR_KILL_ALWAYS, resulting in the svn client doing an immediate TerminateProcess() on the tunnel, preventing it from cleanly closing the TCP connection.
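To make the consequence concrete, here is a small standalone model of the behaviour described above. It is only a sketch of what this mail reports, not apr's actual source: the MODEL_ names and both functions are invented for illustration and merely mirror apr's kill modes so that the file compiles without apr.

```c
/* Illustrative stand-ins for apr's kill modes (hypothetical names). */
typedef enum {
    MODEL_KILL_NEVER,
    MODEL_KILL_ALWAYS,
    MODEL_KILL_AFTER_TIMEOUT,
    MODEL_JUST_WAIT,
    MODEL_KILL_ONLY_ONCE
} model_kill_e;

/* Model of the Windows behaviour as described in this mail:
 * NEVER is honoured, every other mode ends up acting like ALWAYS. */
static model_kill_e model_effective_mode_on_windows(model_kill_e requested)
{
    return (requested == MODEL_KILL_NEVER) ? MODEL_KILL_NEVER
                                           : MODEL_KILL_ALWAYS;
}

/* Returns 1 if the tunnel would be terminated at pool cleanup, else 0. */
static int model_tunnel_is_killed(model_kill_e requested)
{
    return model_effective_mode_on_windows(requested) == MODEL_KILL_ALWAYS;
}
```

Under this model, the mode requested in client.c lands in the "killed" branch, and APR_KILL_NEVER is the only value that lets the tunnel finish on its own.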
Here is the tail of a packet trace showing the unclean close:
43 14.439453 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
The 'R' flag means a reset packet. The remote sshd tried to send an ACK to me, and my tunnel was killed, so my TCP stack sent back this reset packet saying "forget it, this TCP connection is gone". sshd goes into bozo mode and leaves its svnserve subprocess running.
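For anyone checking their own traces, the test for a reset can be sketched as below. This assumes the trace tool prints one letter per set TCP flag and a dot otherwise, as in the `...AP...` excerpt; using 'F' for FIN is an assumption based on that same convention, and the function names are made up for this example.

```c
#include <string.h>

/* Returns 1 if the given flag letter appears in a flags field such as
 * "...AP...", else 0. A NUL letter never matches. */
static int trace_has_flag(const char *flags_field, char letter)
{
    return letter != '\0' && strchr(flags_field, letter) != NULL;
}

/* 'R' marks a reset: the connection was torn down uncleanly. */
static int trace_is_reset(const char *flags_field)
{
    return trace_has_flag(flags_field, 'R');
}

/* 'F' (assumed letter for FIN) marks one side of an orderly close. */
static int trace_is_fin(const char *flags_field)
{
    return trace_has_flag(flags_field, 'F');
}
```

A trace that ends in a packet for which trace_is_reset() is true corresponds to the unclean close; a pair of packets for which trace_is_fin() is true, one from each side, is what a proper close looks like.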
A tail of the tunnel packet log (from an 'svn ls' command) looks like this:
00000210 29 20 28 20 37 3a 69 6e 73 74 72 65 67 20 64 69 ) ( 7:instreg di
I changed the line of code in client.c to
apr_pool_note_subprocess(pool, proc, APR_KILL_NEVER);
and the problem is cured. Here is the tail of a packet trace showing the proper TCP close:
42 13.656250 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
You can see the tunnel and the remote sshd exchanging FIN packets. And here is the tail of the tunnel packet log for the svn ls command:
00000210 29 20 28 20 37 3a 69 6e 73 74 72 65 67 20 64 69 ) ( 7:instreg di
You can see that the tunnel was allowed to complete its activity and exit through its normal paths. The remote sshd/svnserve processes exit cleanly as well. Problem solved!
RECOMMENDATIONS:
The cleanest and safest way to handle this would seem to be:
1. Modify apr to support the kill mode APR_KILL_AFTER_TIMEOUT *on Windows*. This would cause the tunnel to be killed after three seconds, presumably plenty of time.
2. Modify libsvn_ra_svn\client.c to use APR_KILL_AFTER_TIMEOUT *on Windows* instead of APR_KILL_ONLY_ONCE.
However, patching apr is probably not acceptable, so instead:
1. Modify libsvn_ra_svn\client.c to use APR_KILL_NEVER *on Windows* instead of APR_KILL_ONLY_ONCE.
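The client.c-only change could look roughly like the sketch below. It is not a tested patch: the SKETCH_ constants stand in for apr's real kill modes so the fragment compiles without apr, sketch_tunnel_kill_mode() is a hypothetical helper, and `_WIN32` is used because compilers predefine it (the Subversion sources may test a different macro).

```c
/* Stand-ins so the sketch compiles without apr; in client.c these would be
 * apr's APR_KILL_NEVER and the mode passed there today. */
enum {
    SKETCH_KILL_NEVER = 0,
    SKETCH_KILL_ONLY_ONCE = 1
};

/* Pick the kill mode per platform: never kill the tunnel on Windows,
 * keep the existing behaviour everywhere else. */
#ifdef _WIN32
#define SKETCH_TUNNEL_KILL_MODE SKETCH_KILL_NEVER
#else
#define SKETCH_TUNNEL_KILL_MODE SKETCH_KILL_ONLY_ONCE
#endif

/* Hypothetical helper returning the mode chosen at compile time. */
static int sketch_tunnel_kill_mode(void)
{
    return SKETCH_TUNNEL_KILL_MODE;
}
```

In client.c the selected mode would simply be passed as the third argument of the existing apr_pool_note_subprocess(pool, proc, ...) call. The trade-off is that nothing terminates a tunnel that never exits by itself, which is what the timeout variant would cover.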
What do you think?
-- Bob Denny
------------------------------------------------------
This is an archived mail posted to the Subversion Dev mailing list.