RE: Issue #2580 revisited: Windows unclean TCP close [SEVERE]

From: Paul Charlton <techguru_at_byiq.com>
Date: Thu, 22 Oct 2009 07:25:06 -0700

Bob,
I see your point entirely. Reverting to legacy behavior is the path of
least resistance and is usually the most compatible with what is already in
the field.

That being said, it should only take a few minutes for someone to try
running SSHD + SVNSERVE with SSHD in debug mode and see what the log
messages say about the handshake with SVNSERVE under the circumstance you
describe with the RST packet, especially since there are other
router/network scenarios which would leave a zombie SVNSERVE running. At a
minimum, get a bug filed on the server-side zombie behavior -- it will most
likely be something out-of-band, like an ignored signal or an unchecked return
code from read() on the stdin pipe from the parent SSHD process.
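
A minimal Python sketch (POSIX only, since it relies on SIGHUP) of the two checks suspected above: a read loop that stops when read() on stdin returns 0 bytes (EOF) or fails, and a SIGHUP handler that actually breaks out of a blocked read. This is not svnserve's code, which is written in C; serve_until_eof and HangupReceived are hypothetical names used only for illustration. The handler raises rather than setting a flag because, since Python 3.5, a read() interrupted by a signal is retried automatically if the handler returns normally.

```python
import os
import signal


class HangupReceived(Exception):
    """Raised from the SIGHUP handler to break out of a blocked read()."""


def _raise_on_sighup(signum, frame):
    # A handler that returns normally would let Python retry the read(),
    # so raise instead to make the hang-up actually stop the loop.
    raise HangupReceived()


def serve_until_eof(in_fd, handle_chunk, bufsize=4096):
    """Read from in_fd until EOF, a read error, or SIGHUP (hypothetical helper).

    Returns "eof", "error" or "hangup" to say why the loop ended.
    Must be called from the main thread (a signal.signal requirement).
    """
    previous = signal.signal(signal.SIGHUP, _raise_on_sighup)
    try:
        while True:
            try:
                chunk = os.read(in_fd, bufsize)
            except OSError:
                # The result of read() is checked, not ignored: a failed
                # read means the channel is unusable, so stop serving.
                return "error"
            if not chunk:
                # read() returned 0 bytes: the parent closed its end.
                return "eof"
            handle_chunk(chunk)
    except HangupReceived:
        # The parent (e.g. sshd) asked us to go away: exit cleanly.
        return "hangup"
    finally:
        if previous is not None:
            signal.signal(signal.SIGHUP, previous)
```

For example, serve_until_eof(0, process) would return "eof" as soon as the parent closes the pipe, and "hangup" as soon as it sends SIGHUP, instead of lingering.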

Best regards,
Paul

> -----Original Message-----
> From: Bob Denny [mailto:rdenny_at_dc3.com]
> Sent: Thursday, October 22, 2009 5:47 AM
> To: dev_at_subversion.tigris.org
> Subject: RE: Issue #2580 revisited: Windows unclean TCP close [SEVERE]
>
> Paul --
>
> > I did read this entire thread ... the problem is the linux sshd/child
> > process, per http://tools.ietf.org/html/rfc1122#page-87, see section
> > 4.2.2.13.
> > Just for kicks, also see
> > http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination
>
> I'm well aware of these issues, particularly the TCP connection
> termination process. I may not have been clear enough before: by
> instantly killing the tunnel process, the proper termination of the TCP
> connection is prevented (not to mention the proper termination of the
> SVN protocol!). The Ethernet traces I posted show the node where the
> tunnel USED to be running sending a TCP RESET packet. The TCP
> connection was IMPROPERLY terminated due to the tunnel (the client end
> of the TCP connection) vaporizing before the process could run to clean
> completion.
>
> > In this situation, I would not advocate changing the Windows
> > implementation to "correct" the misbehavior of the Linux sshd and its
> > child process.
>
> Do I understand you to be saying that it's sshd's error? How can it
> tell the difference between "dead partner" and "temporarily slow or
> dead net connection"? All that TCP stuff is running a couple of layers
> down under the sockets API that sshd uses. So sshd properly sits there
> for "a while", with its child svnserve also sitting there, both waiting
> for some word from the other end. Eventually (10+ MINUTES later) sshd
> DOES exit and terminate its child. But this process is one of "forget
> it, something's really wrong", I'm outa here...
>
> Instantly killing the tunnel (clearly the WRONG way to terminate a TCP
> connection!) sets off a chain of events, ALL OF WHICH ARE ERRORS. The
> svn protocol does not complete normally, nor does the ssh protocol, nor
> does the TCP protocol! Do we want EVERY SINGLE svn+ssh connection to
> complete through error paths? That's what's happening now, to EVERYONE
> who is using svn+ssh from a Windows client! I've already explained why
> we haven't gotten more complaints.
>
> But I repeat: instantly killing the tunnel upon receipt of the last SVN
> data from a remote svnserve is NOT the way to shut down the connection
> between the two. It leaves the remote sshd wondering what happened and
> sitting there waiting for the next step in ITS protocol (SSH). It has
> no idea what is coming next (more data or what???). So it sits there
> waiting...
>
> The recent change to use APR_KILL_ONCE is (imo) the right way to do it,
> but it works (sending SIGHUP to the tunnel) only on Linux, etc. I
> suspect the authors didn't realize that APR on Windows converts
> KILL_ONCE to KILL_ALWAYS and instantly kills the child (unlike the real
> KILL_ONCE on Linux, which sends SIGHUP to the tunnel, then waits for 3
> seconds to allow things to wind down properly, then kills it if it
> hasn't already exited normally).
>
> > Also, to quote your original post -- "sshd goes into bozo mode and
> > leaves its svnserve subprocess running." I haven't looked into the
> > sources, but if I recall correctly, sshd under this circumstance is
> > sending "SIGHUP" to the child process (svnserve), giving it the
> > opportunity to flush buffers ... which would mean that svnserve is not
> > responding correctly to external signals.
>
> sshd eventually does that. But only after a long time, for the reasons
> given above.
>
> -- Bob
>
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2410204
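
The grace-period behaviour described above for KILL_ONCE on Unix (signal the tunnel, give it about 3 seconds to finish, and kill it only if it is still running) can be sketched with Python's subprocess module. This is not APR's implementation: stop_tunnel_gracefully is a hypothetical helper, and first_signal defaults to SIGHUP only because that is the signal named in this thread; which signal APR actually sends should be checked against the APR documentation. The sketch is POSIX-only, since SIGHUP does not exist on Windows.

```python
import signal
import subprocess


def stop_tunnel_gracefully(proc, first_signal=signal.SIGHUP, grace_seconds=3.0):
    """Ask a child process to exit; force-kill it only if it lingers.

    Hypothetical helper. Returns the child's exit status as reported by
    Popen (a negative number means "terminated by that signal").
    """
    if proc.poll() is not None:
        # The tunnel already exited on its own: nothing to signal.
        return proc.returncode
    proc.send_signal(first_signal)
    try:
        # Give the tunnel time to finish its own shutdown (a clean SSH
        # channel close and TCP FIN) before resorting to force.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # Last resort: a hard kill, which according to this thread is
        # what happens immediately on Windows.
        proc.kill()
        return proc.wait()
```

On Windows, Popen.send_signal(signal.SIGTERM) is documented as an alias for Popen.terminate(), which ends the process at once, so there is no gentle first step to emulate; this mirrors the Windows behaviour discussed in this thread.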

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2410241
Received on 2009-10-22 16:25:49 CEST
