Re: Issue #2580 revisited: Windows unclean TCP close [SEVERE]

From: Bob Denny <rdenny_at_dc3.com>
Date: Fri, 16 Oct 2009 15:35:25 -0700

> Bob Denny:
> As of SVN 1.6.5, the svn client forcibly kills
> the tunnel subprocess as soon as it receives the last of the data that
> it expects.
>
> Stefan Sperling:
> This is not entirely correct. We have been killing the ssh client
> for a long time.

I'm sorry, what I meant is that I observed it in 1.6.5. I have never used older
versions.

> What's new in 1.6.5 is that we now send SIGTERM instead of SIGKILL,
> which allows an SSH process to clean up a master socket it might be
> managing for for SSH connection pooling.

This is true only for Linux. On Windows, inside APR, the APR_KILL_ONLY_ONCE is
turned into APR_KILL_ALWAYS, and that becomes an instant TerminateProcess()
call, which is like a SIGKILL but even harsher as it is not a signal, but the OS
instantly stops process execution.

> But it's *not* a new problem in 1.6.5.

Yes, I know.

> But [your solution] creates another problem which killing the subprocess is
> meant to cure. If long-lived Subversion clients (e.g. IDE integrations)
> using the SVN libraries open a connection via svn+ssh:// it's possible
> that the ssh subprocesses created never die. Over time, ssh processes
> accumulate on the client workstation, doing nothing but waiting to be
> killed.

Well, that's the flip side of this problem, where usage of the command line svn
does the same thing - "the ssh subprocesses created never die...".

> See the comment above the line you're quoting:
>
> /* Arrange for the tunnel agent to get a SIGTERM on pool
> * cleanup. This is a little extreme, but the alternatives
> * weren't working out.

Heh, and then there is the comment in apr function apr_proc_kill()
(threadproc/win32/signals.c):

  /* Windows only really support killing process, but that will do for now.
   *
   * ### Actually, closing the input handle to the proc should also do fine
   * for most console apps. This definately needs improvement...
   */

Yes, it does need improvement :-) He is correct in his thoughts about
subprocesses on windows. Unless the subprocess is poorly written, closing the
input handle will make the subprocess ultimately go away. But we can't protect
the world against every conceivable programming error! At least the tunnels I
have used (ssh, PuTTY PLink, and and TortoisePLink) all do it correctly, an EOF
on the stdin handle results (eventually) in cleanup_exit() being called, and the
protocol between it and the remote sshd runs to proper completion. And anyway,
if a tunnel does NOT go away when its stdin is closed, it's a bug in the tunnel!

> Patching APR is certainly acceptable.
> Would you be willing to work on a patch of APR and submit it there?

The problem is that I would be patching part of the Apache Portable Runtime,
something I have zero real-world experience with. There's no SIGTERM (as
observed above) so my patch would be to just wait three seconds then kill. This
would probably be met with ridicule by the webserver people who are used to
their server spawning hundreds of subprocesses per millisecond :-) I believe
that I would be frozen out of the Apache group for a patch especially since my
patch would be Windows oriented and that would tag me as "one of those Windoze
guys" :-) That's why I observed that it probably isn't practical.

>> 1. Modify libsvn_ra_svn\client.c to use APR_KILL_NEVER *on Windows*
>> instead of APR_KILL_ONCE.
>>
>> What do you think?

> KILL_NEVER is not a good solution. If it was, it would already be done
> this way, and we'd never have had to mess around with killing SSH
> processes in the first place.

I don't think this would present a problem on windows, and that is why I
suggested ONLY doing this on Windows. I'm aware of the issues on Linux,
though, and why you decided to do it to tunnel subprocesses.

So it seems we're stuck: On one hand, we kill tunnels instantly, leaving
sshd/svnserve processes running and accumulating on the remote end. This
certainly happens. On the other hand, we don't kill tunnels and they don't exit
cleanly and accumulate on the local system. It's less clear to me whether this
is a rare or a frequent problem. I wonder how often this happens any more? Is
this an ancient problem for which the solution has been in the code for a long
time? Or is this a problem on Linux and not Windows? Maybe it doesn't exist on
modern versions of Linux? Apart from patching apr, my feeling is that KILL_NEVER
*ON WINDOWS ONLY* is the lesser of two evils (by a lot). KILL_ONCE (SIGTERM, 3
sec., SIGKILL) is the best for Linux, etc.,and that's what you did in 1.6.5.
Good one.

-- Bob

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2408379
Received on 2009-10-17 02:57:32 CEST

This message: [ Message body ]
Next message: Greg Stein: "Re: svn commit: r40025 - trunk/subversion/libsvn_wc"
Previous message: Bert Huijben: "RE: svn commit: r40025 - trunk/subversion/libsvn_wc"
Maybe in reply to: Bob Denny: "Issue #2580 revisited: Windows unclean TCP close [SEVERE]"
Next in thread: Branko Cibej: "Re: Issue #2580 revisited: Windows unclean TCP close [SEVERE]"
Reply: Branko Cibej: "Re: Issue #2580 revisited: Windows unclean TCP close [SEVERE]"
Reply: Peter Samuelson: "Re: Issue #2580 revisited: Windows unclean TCP close [SEVERE]"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]