[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Issue #2580 revisited: Windows unclean TCP close [SEVERE]

From: Bob Denny <rdenny_at_dc3.com>
Date: Thu, 15 Oct 2009 09:38:23 -0700

SVN 1.6.5 on Windows and using svn+ssh fails to properly complete and close the TCP connection. leaving the remote sshd/svnserve processes running (CentOS Linux server). These last "forever", accumulating until the sysop disables your SSH access :-) This is a severe problem. It doesn't always appear on slower CPUs with fast connections to the remote svn+ssh repo. However, my system is a 2.6GHz quadcore, and the remote repo is across the public Internet. This is the worst case environment as you'll see. I am certain this problem is not unique to me.

I'm looking for a "buddy" on this. I am a TortoiseSVN user. TortoiseSVN shares much code with Subversion. I have checked out the latest TortoiseSVN trunk (which includes Subversion 1.6.5 as an external) and built it. I hope I won't have to go through this again for the Subversion sources in order to achieve the credibility I need here...

It is definitely an issue with Subversion, specifically libsvn_ra_svn and its usage of apr. As of SVN 1.6.5, the svn client forcibly kills the tunnel subprocess as soon as it receives the last of the data that it expects. This prevents the tunnel proc from completing its conversation with the remote sshd, leaving it and its child svnserve haning there for up to a half hour. In my case the tunnel is TortoisePLink, but it also happens with the OpenSSH 'ssh.exe' tunnel, for the same reason.

Specifically, in libsvn_ra_svn\client.c, a call is made

apr_pool_note_subprocess(pool, proc, APR_KILL_ONLY_ONCE);

Inspection of the apr sources reveals that, on Windows, the only significant kill modes are APR_KILL_ALWAYS and APR_KILL_NEVER. Any other modes (e.g. APR_KILL_ONCE) are translated on Windows to APR_KILL_ALWAYS, resulting in the svn client doing an immediate TerminateProcess() on the tunnel, preventing it from cleanly closing the TCP connection.

Here is the tail of a packet trace showing the unclean close:

43 14.439453 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
44 14.439453 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4206 ...
45 14.529297 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
46 14.529297 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4206 ...
47 14.611328 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
48 14.613281 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4206 ...
49 14.693359 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
50 14.714844 bob.dc3.com svn.dc3.com TCP:Flags=...A.R.., SrcPort=4206 ...

The 'R' flag means a reset packet. The remote sshd tried to send an ACK to me, and my tunnel was killed, so my TCP stack sent back this reset packet saying "forget it, this TCP connection is gone". sshd goes into bozo mode and leaves its svnserve subprocess running.

A tail of the tunnel packet log (from an 'svn ls' command) looks like this:

  00000210 29 20 28 20 37 3a 69 6e 73 74 72 65 67 20 64 69 ) ( 7:instreg di
  00000220 72 20 30 20 66 61 6c 73 65 20 30 20 28 20 32 37 r 0 false 0 ( 27
  00000230 3a 31 39 37 30 2d 30 31 2d 30 31 54 30 30 3a 30 :1970-01-01T00:0
  00000240 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 29 20 28 0:00.000000Z ) (
  00000250 20 29 20 29 20 28 20 37 3a 63 6c 61 70 61 63 6b ) ) ( 7:clapack
  00000260 20 64 69 72 20 30 20 66 61 6c 73 65 20 30 20 28 dir 0 false 0 (
  00000270 20 32 37 3a 31 39 37 30 2d 30 31 2d 30 31 54 30 27:1970-01-01T0
  00000280 30 3a 30 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 0:00:00.000000Z
  00000290 29 20 28 20 29 20 29 20 29 20 29 20 29 20 ) ( ) ) ) ) )
Outgoing packet type 96 / 0x60 (SSH2_MSG_CHANNEL_EOF)
  00000000 00 00 00 00 ....

I changed the line of code in client.c to

apr_pool_note_subprocess(pool, proc, APR_KILL_NEVER);

and the problem is cured. Here is the tail of a packet trace showing the proper TCP close:

42 13.656250 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
43 13.656250 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
44 13.734375 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
45 13.734375 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
46 13.812500 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
47 13.828125 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
48 13.906250 svn.dc3.com bob.dc3.com TCP:Flags=...AP..., SrcPort=7822 ...
49 13.906250 bob.dc3.com svn.dc3.com TCP:Flags=...AP..., SrcPort=4197 ...
50 13.906250 bob.dc3.com svn.dc3.com TCP:Flags=...A...F, SrcPort=4197 ...
51 13.984375 svn.dc3.com bob.dc3.com TCP:Flags=...A...., SrcPort=7822 ...
52 13.984375 svn.dc3.com bob.dc3.com TCP:Flags=...A...., SrcPort=7822 ...
53 13.984375 svn.dc3.com bob.dc3.com TCP:Flags=...A...F, SrcPort=7822 ...
54 13.984375 bob.dc3.com svn.dc3.com TCP:Flags=...A...., SrcPort=4197 ...

You can see the tunnel and the remote sshd exchanging FIN packets. And here is the tail of the tunnel packet log for the svn ls command:

  00000210 29 20 28 20 37 3a 69 6e 73 74 72 65 67 20 64 69 ) ( 7:instreg di
  00000220 72 20 30 20 66 61 6c 73 65 20 30 20 28 20 32 37 r 0 false 0 ( 27
  00000230 3a 31 39 37 30 2d 30 31 2d 30 31 54 30 30 3a 30 :1970-01-01T00:0
  00000240 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 29 20 28 0:00.000000Z ) (
  00000250 20 29 20 29 20 28 20 37 3a 63 6c 61 70 61 63 6b ) ) ( 7:clapack
  00000260 20 64 69 72 20 30 20 66 61 6c 73 65 20 30 20 28 dir 0 false 0 (
  00000270 20 32 37 3a 31 39 37 30 2d 30 31 2d 30 31 54 30 27:1970-01-01T0
  00000280 30 3a 30 30 3a 30 30 2e 30 30 30 30 30 30 5a 20 0:00:00.000000Z
  00000290 29 20 28 20 29 20 29 20 29 20 29 20 29 20 ) ( ) ) ) ) )
Outgoing packet type 96 / 0x60 (SSH2_MSG_CHANNEL_EOF)
  00000000 00 00 00 00 ....
Event Log: Sent EOF message
Incoming packet type 96 / 0x60 (SSH2_MSG_CHANNEL_EOF)
  00000000 00 00 01 00 ....
Incoming packet type 98 / 0x62 (SSH2_MSG_CHANNEL_REQUEST)
  00000000 00 00 01 00 00 00 00 0b 65 78 69 74 2d 73 74 61 ........exit-sta
  00000010 74 75 73 00 00 00 00 00 tus.....
Event Log: Server sent command exit status 0
Incoming packet type 97 / 0x61 (SSH2_MSG_CHANNEL_CLOSE)
  00000000 00 00 01 00 ....
Outgoing packet type 97 / 0x61 (SSH2_MSG_CHANNEL_CLOSE)
  00000000 00 00 00 00 ....
Event Log: Disconnected: All channels closed

You can see that the tunnel was allowed to complete its activity and exit through its normal paths. The remote sshd/svnserve processes exit cleanly as well. Problem solved!

RECOMMENDATIONS:

The cleanest and safest way to handle this would seem to be:

1. Modify apr to support the kill mode APR_KILL_AFTER_TIMEOUT *on Windows*. This would cause the tunnel to be killed after three seconds, presumably plenty of time.

2. Modify libsvn_ra_svn\client.c to use APR_KILL_AFTER_TIMEOUT *on Windows* instead of APR_KILL_ONCE.

However, patching apr is probably not acceptable, so instead:

1. Modify libsvn_ra_svn\client.c to use APR_KILL_NEVER *on Windows* instead of APR_KILL_ONCE.

What do you think?

  -- Bob Denny

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2407949
Received on 2009-10-16 02:27:05 CEST

This is an archived mail posted to the Subversion Dev mailing list.