
RE: Issue #2580 revisited: Windows unclean TCP close [SEVERE]

From: Bob Denny <rdenny_at_dc3.com>
Date: Thu, 22 Oct 2009 05:47:00 -0700 (PDT)

Paul --

> I did read this entire thread ... the problem is the linux sshd/child
> process, per http://tools.ietf.org/html/rfc1122#page-87, see section
> 4.2.2.13.
> Just for kicks, also see
> http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination

I'm well aware of these issues, particularly the TCP connection termination process. I may not have been clear enough before: instantly killing the tunnel process prevents the proper termination of the TCP connection (not to mention the proper termination of the SVN protocol!). The Ethernet traces I posted show the node where the tunnel USED to be running sending a TCP RESET (RST) packet. The TCP connection was IMPROPERLY terminated because the tunnel (the client end of the TCP connection) vaporized before it could run to clean completion.
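To make the difference between an orderly close and a reset concrete, here is a minimal loopback sketch. It is not taken from the traces mentioned above and does not involve ssh or svnserve. The helper name peer_view_of_close is hypothetical, and the abortive case is simulated with SO_LINGER set to a zero timeout, which assumes the POSIX/Linux linger layout of two C ints. It only shows what the surviving end of a connection observes in each case.

```python
import socket
import struct


def peer_view_of_close(abortive):
    """Return what the accepting side observes when the client goes away."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        if abortive:
            # Assumption: Linux/POSIX struct linger is two ints (on/off, seconds).
            # A zero linger timeout makes close() abort the connection with RST.
            client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                              struct.pack("ii", 1, 0))
        else:
            # Orderly shutdown: send FIN so the peer sees a clean end of stream.
            client.shutdown(socket.SHUT_WR)
        client.close()

        server_side.settimeout(5.0)
        try:
            data = server_side.recv(1024)
        except ConnectionResetError:
            return "reset"
        return "eof" if data == b"" else "data"
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    print("orderly close:", peer_view_of_close(abortive=False))
    print("abortive close:", peer_view_of_close(abortive=True))
```

In the orderly case the peer reads a clean end of stream and can finish its own protocol; in the abortive case its next read fails with a connection reset error.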

> Under this situation, I would not advocate changing the windows
> implementation to "correct" the mis-behavior of the linux sshd and its child
> process.

Do I understand you to be saying that it's sshd's error? How can it tell the difference between "dead partner" and "temporarily slow or dead net connection"? All that TCP stuff is running a couple of layers down under the sockets API that sshd uses. So sshd properly sits there for "a while", with its child svnserve also sitting there, both waiting for some word from the other end. Eventually (10+ MINUTES later) sshd DOES exit and terminate its child. But that exit is a "forget it, something's really wrong, I'm outta here" bailout, not a normal shutdown.

Instantly killing the tunnel (clearly the WRONG way to terminate a TCP connection!) sets off a chain of events, ALL OF WHICH ARE ERRORS. The svn protocol does not complete normally, nor does the ssh protocol, nor does the TCP protocol! Do we want EVERY SINGLE svn+ssh connection to complete through error paths? That's what's happening now, to EVERYONE who is using svn+ssh from a Windows client! I've already explained why we haven't gotten more complaints.

But I repeat: instantly killing the tunnel upon receipt of the last SVN data from a remote svnserve is NOT the way to shut down the connection between the two. It leaves the remote sshd wondering what happened and sitting there waiting for the next step in ITS protocol (SSH). It has no idea what is coming next (more data or what???). So it sits there waiting...

The recent change to use APR_KILL_ONCE is (imo) the right way to do it, but it works only on Linux, etc., where it sends SIGHUP to the tunnel. I suspect the authors didn't realize that APR on Windows converts KILL_ONCE to KILL_ALWAYS and instantly kills the child. The real KILL_ONCE on Linux sends SIGHUP to the tunnel, waits 3 seconds to allow things to wind down properly, and kills it only if it hasn't already exited normally.
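The "signal, wait a few seconds, kill only if still running" sequence described above can be sketched as follows. This is a Python stand-in for illustration, not APR's or Subversion's code. The function name stop_tunnel and its return strings are hypothetical, the signal and the 3 second grace period follow the description in this message, and send_signal with SIGHUP is POSIX-only.

```python
import signal
import subprocess
import sys


def stop_tunnel(proc, grace_seconds=3.0):
    """Ask the child to exit, wait briefly, and force-kill only as a last resort."""
    if proc.poll() is not None:
        return "already-exited"

    # Polite request first (POSIX only; assumption: the child treats SIGHUP
    # as "finish up and exit").
    proc.send_signal(signal.SIGHUP)
    try:
        proc.wait(timeout=grace_seconds)
        return "exited-after-signal"
    except subprocess.TimeoutExpired:
        # The child did not wind down in time, so fall back to a hard kill.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Hypothetical stand-in for the tunnel process: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_tunnel(child))
```

The point of the grace period is that the child gets a chance to close its side of the connection in an orderly way; an unconditional immediate kill skips that step every time.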

> Also, to quote your original post -- "sshd goes into bozo mode and leaves
> its svnserve subprocess running." I haven't looked into the sources, but if
> I recall correctly, sshd under this circumstance is sending "SIGHUP" to the
> child process (svnserve), giving it the opportunity to flush buffers ...
> which would mean that svnserve is not responding correctly to external
> signals.

sshd eventually does that. But only after a long time, for the reasons given above.

  -- Bob

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2410204
Received on 2009-10-22 14:47:24 CEST

This is an archived mail posted to the Subversion Dev mailing list.