I did read this entire thread ... the problem is the linux sshd/child
process, per http://tools.ietf.org/html/rfc1122#page-87, see section
4.2.2.13.
Just for kicks, also see
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termin
ation
Under this situation, I would not advocate changing the windows
implementation to "correct" the mis-behavior of the linux sshd and its child
process.
Also, to quote your original post -- "sshd goes into bozo mode and leaves
its svnserve subprocess running." I haven't looked into the sources, but if
I recall correctly, sshd under this circumstance is sending "SIGHUP" to the
child process (svnserve), giving it the opportunity to flush buffers ...
which would mean that svnserve is not responding correctly to external
signals.
-Paul
-----Original Message-----
From: Bob Denny [mailto:rdenny_at_dc3.com]
Sent: Thursday, October 15, 2009 9:38 AM
To: dev_at_subversion.tigris.org
Subject: Issue #2580 revisited: Windows unclean TCP close [SEVERE]
SVN 1.6.5 on Windows and using svn+ssh fails to properly complete and close
the TCP connection. leaving the remote sshd/svnserve processes running
(CentOS Linux server). These last "forever", accumulating until the sysop
disables your SSH access :-) This is a severe problem. It doesn't always
appear on slower CPUs with fast connections to the remote svn+ssh repo.
However, my system is a 2.6GHz quadcore, and the remote repo is across the
public Internet. This is the worst case environment as you'll see. I am
certain this problem is not unique to me.
I'm looking for a "buddy" on this. I am a TortoiseSVN user. TortoiseSVN
shares much code with Subversion. I have checked out the latest TortoiseSVN
trunk (which includes Subversion 1.6.5 as an external) and built it. I hope
I won't have to go through this again for the Subversion sources in order to
achieve the credibility I need here...
It is definitely an issue with Subversion, specifically libsvn_ra_svn and
its usage of apr. As of SVN 1.6.5, the svn client forcibly kills the tunnel
subprocess as soon as it receives the last of the data that it expects. This
prevents the tunnel proc from completing its conversation with the remote
sshd, leaving it and its child svnserve haning there for up to a half hour.
In my case the tunnel is TortoisePLink, but it also happens with the OpenSSH
'ssh.exe' tunnel, for the same reason.
Specifically, in libsvn_ra_svn\client.c, a call is made
apr_pool_note_subprocess(pool, proc, APR_KILL_ONLY_ONCE);
Inspection of the apr sources reveals that, on Windows, the only significant
kill modes are APR_KILL_ALWAYS and APR_KILL_NEVER. Any other modes (e.g.
APR_KILL_ONCE) are translated on Windows to APR_KILL_ALWAYS, resulting in
the svn client doing an immediate TerminateProcess() on the tunnel,
preventing it from cleanly closing the TCP connection.
Here is the tail of a packet trace showing the unclean close:
43 14.439453 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
44 14.439453 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4206 ...
45 14.529297 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
46 14.529297 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4206 ...
47 14.611328 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
48 14.613281 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4206 ...
49 14.693359 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
50 14.714844 bob.dc3.com svn.dc3.com TCP:Flags=...A.R.., SrcPort=4206 ...
The 'R' flag means a reset packet. The remote sshd tried to send an ACK to
me, and my tunnel was killed, so my TCP stack sent back this reset packet
saying "forget it, this TCP connection is gone". sshd goes into bozo mode
and leaves its svnserve subprocess running.
A tail of the tunnel packet log (from an 'svn ls' command) looks like this:
00000210 29 20 28 20 37 3a 69 6e 73 74 72 65 67 20 64 69 ) ( 7:instreg
di
00000220 72 20 30 20 66 61 6c 73 65 20 30 20 28 20 32 37 r 0 false 0 (
27
00000230 3a 31 39 37 30 2d 30 31 2d 30 31 54 30 30 3a 30
:1970-01-01T00:0
00000240 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 29 20 28 0:00.000000Z )
(
00000250 20 29 20 29 20 28 20 37 3a 63 6c 61 70 61 63 6b ) ) (
7:clapack
00000260 20 64 69 72 20 30 20 66 61 6c 73 65 20 30 20 28 dir 0 false 0
(
00000270 20 32 37 3a 31 39 37 30 2d 30 31 2d 30 31 54 30
27:1970-01-01T0
00000280 30 3a 30 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 0:00:00.000000Z
00000290 29 20 28 20 29 20 29 20 29 20 29 20 29 20 ) ( ) ) ) ) )
Outgoing packet type 96 / 0x60 (SSH2_MSG_CHANNEL_EOF)
00000000 00 00 00 00 ....
I changed the line of code in client.c to
apr_pool_note_subprocess(pool, proc, APR_KILL_NEVER);
and the problem is cured. Here is the tail of a packet trace showing the
proper TCP close:
42 13.656250 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
43 13.656250 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
44 13.734375 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
45 13.734375 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
46 13.812500 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
47 13.828125 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
48 13.906250 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
49 13.906250 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
50 13.906250 bob.dc3.com svn.dc3.com TCP:Flags=...A...F, SrcPort=4197 ...
51 13.984375 svn.dc3.com bob.dc3.com TCP:Flags=...A...., SrcPort=7822 ...
52 13.984375 svn.dc3.com bob.dc3.com TCP:Flags=...A...., SrcPort=7822 ...
53 13.984375 svn.dc3.com bob.dc3.com TCP:Flags=...A...F, SrcPort=7822 ...
54 13.984375 bob.dc3.com svn.dc3.com TCP:Flags=...A...., SrcPort=4197 ...
You can see the tunnel and the remote sshd exchanging FIN packets. And here
is the tail of the tunnel packet log for the svn ls command:
00000210 29 20 28 20 37 3a 69 6e 73 74 72 65 67 20 64 69 ) ( 7:instreg
di
00000220 72 20 30 20 66 61 6c 73 65 20 30 20 28 20 32 37 r 0 false 0 (
27
00000230 3a 31 39 37 30 2d 30 31 2d 30 31 54 30 30 3a 30
:1970-01-01T00:0
00000240 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 29 20 28 0:00.000000Z )
(
00000250 20 29 20 29 20 28 20 37 3a 63 6c 61 70 61 63 6b ) ) (
7:clapack
00000260 20 64 69 72 20 30 20 66 61 6c 73 65 20 30 20 28 dir 0 false 0
(
00000270 20 32 37 3a 31 39 37 30 2d 30 31 2d 30 31 54 30
27:1970-01-01T0
00000280 30 3a 30 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 0:00:00.000000Z
00000290 29 20 28 20 29 20 29 20 29 20 29 20 29 20 ) ( ) ) ) ) )
Outgoing packet type 96 / 0x60 (SSH2_MSG_CHANNEL_EOF)
00000000 00 00 00 00 ....
Event Log: Sent EOF message
Incoming packet type 96 / 0x60 (SSH2_MSG_CHANNEL_EOF)
00000000 00 00 01 00 ....
Incoming packet type 98 / 0x62 (SSH2_MSG_CHANNEL_REQUEST)
00000000 00 00 01 00 00 00 00 0b 65 78 69 74 2d 73 74 61
........exit-sta
00000010 74 75 73 00 00 00 00 00 tus.....
Event Log: Server sent command exit status 0
Incoming packet type 97 / 0x61 (SSH2_MSG_CHANNEL_CLOSE)
00000000 00 00 01 00 ....
Outgoing packet type 97 / 0x61 (SSH2_MSG_CHANNEL_CLOSE)
00000000 00 00 00 00 ....
Event Log: Disconnected: All channels closed
You can see that the tunnel was allowed to complete its activity and exit
through its normal paths. The remote sshd/svnserve processes exit cleanly as
well. Problem solved!
RECOMMENDATIONS:
The cleanest and safest way to handle this would seem to be:
1. Modify apr to support the kill mode APR_KILL_AFTER_TIMEOUT *on Windows*.
This would cause the tunnel to be killed after three seconds, presumably
plenty of time.
2. Modify libsvn_ra_svn\client.c to use APR_KILL_AFTER_TIMEOUT *on Windows*
instead of APR_KILL_ONCE.
However, patching apr is probably not acceptable, so instead:
1. Modify libsvn_ra_svn\client.c to use APR_KILL_NEVER *on Windows* instead
of APR_KILL_ONCE.
What do you think?
-- Bob Denny
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=240
7949
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2410074
Received on 2009-10-22 07:34:15 CEST