RE: Re: Re: Issue #2580 revisited: Windows unclean TCP close [SEVERE]

From: Bob Denny <rdenny_at_dc3.com>
Date: Sat, 24 Oct 2009 10:52:55 -0700 (PDT)

Stefan --

I apologize for my OS-related comments. I understand, and will not mention such things in the future.

> Are you sure? OpenSSH supports it. As far as I know people can use
> [connection pooling] on Windows e.g. with Cygwin.

I think virtually all Windows users are running a client such as one of those I mentioned, or svn.exe. I can imagine a setup with Windows as a tunneled server using Cygwin and OpenSSH, but I think it is rare, and my patch doesn't affect the server side anyway.

> But from reading the APR code it seems there is no difference at all
> between APR_KILL_ONLY_ONCE and APR_KILL_ALWAYS on windows.

Correct. I also noted this in earlier messages.
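To make the point concrete, here is a toy Python model of what pool cleanup does to the tunnel child under the two APR kill conditions discussed here. It is an illustration only, not APR code: the Unix behaviour follows the APR header comments for `APR_KILL_ALWAYS` (SIGKILL) and `APR_KILL_ONLY_ONCE` (SIGTERM, then wait), and the Windows branch simply encodes the reading of the APR code in this thread, namely that both conditions end in the same forced termination. The names `KillCondition` and `cleanup_action` are hypothetical.

```python
from enum import Enum


class KillCondition(Enum):
    # Two of APR's apr_kill_conditions_e values, as discussed in this thread.
    KILL_ALWAYS = "APR_KILL_ALWAYS"
    KILL_ONLY_ONCE = "APR_KILL_ONLY_ONCE"


def cleanup_action(cond: KillCondition, platform: str) -> str:
    """Toy model (hypothetical helper) of pool cleanup acting on the tunnel child."""
    if platform == "windows":
        # Assumption taken from the thread: on Windows there is no
        # distinction, so either condition forcibly terminates the child.
        return "forced termination"
    if cond is KillCondition.KILL_ALWAYS:
        return "SIGKILL"
    return "SIGTERM, then wait"
```

With this model, `cleanup_action(KillCondition.KILL_ALWAYS, "windows")` and `cleanup_action(KillCondition.KILL_ONLY_ONCE, "windows")` return the same string, while the two Unix results differ. That is why switching between the two conditions cannot change anything for a Windows client.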

> So the problem you are trying to fix should have existed pre-1.6.5.
> Can you confirm this, if only to help me make sure I've understood
> the problem?

It would be difficult, as I would have to check out and build an earlier version. I will do it if you really think it will help. Building Subversion on Windows is really tricky. I spent two days trying with the instructions provided in the Subversion tree and failed. I am able to build it, however, with the build tools and scripts provided in the TortoiseSVN tree in just half an hour with no problems (apart from tons of compiler warnings about signed/unsigned comparisons and conversions from size_t to int and __int64 to int). The results pass all of the tests. But it is not built from the Subversion repository, so I don't know if that's good enough.

I have no doubt that the problem exists in earlier versions though, if APR_KILL_ALWAYS is being done.

> OK, I believe that, and that's quite an amount of client coverage
> in your testing which is very good. Any problems at the server's end?

None.

> I'd rather be careful when changing the default to something that
> is known to have caused problems in the past.

Understood! But was it known to cause problems on Windows?

> What I'd like to be 100% certain about is that not killing the
> [tunnel] will not cause any problems on the server side.

It has not in my testing.

In fact, killing the tunnel ALWAYS causes problems on the server side with my setup, which has a fast client CPU and a modest (public) Internet connection to the server. My first message, with the Ethernet traces and the Tortoise logs, shows that the SVN protocol never gets a chance to close (EOF exchange), so the remote svnserve is left wondering what's next. The SSH protocol never gets a chance to close either, leaving the remote sshd wondering what's next. Eventually one of them gives up and exits through its error path, but it takes a long time.

As a reminder, with my slower laptop system, it takes longer to actually kill the tunnel after closing the fd. So the tunnel gets a chance to sneak in the SSH closing exchange. Then the remote sshd exits normally, terminating its child svnserve, which is still waiting for something else because it didn't get the svn-protocol EOF exchange, but at least the remote processes die. Even this scenario is unclean. But I suspect that is what's happening to others who are not having the problem I have. Another factor that may prevent the problem is a fast SSH connection (LAN) where it takes less time to complete the SSH closing exchange.
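The race described in the last two paragraphs can be summarised as a toy Python model. The timings are hypothetical parameters, not measurements from my traces: `kill_delay_ms` stands for the time between closing the fd and the kill taking effect, and `ssh_close_ms` for the time the tunnel needs to finish the SSH closing exchange. The names `Shutdown` and `shutdown_outcome` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shutdown:
    svn_eof_exchanged: bool   # svn and svnserve completed their EOF exchange
    ssh_closed_cleanly: bool  # tunnel agent and sshd completed the SSH close


def shutdown_outcome(kill_tunnel: bool, kill_delay_ms: float,
                     ssh_close_ms: float) -> Shutdown:
    """Toy model of the close sequence (hypothetical timings)."""
    if not kill_tunnel:
        # Designed path: SVN-level EOF first, then fd close, then SSH close.
        return Shutdown(svn_eof_exchanged=True, ssh_closed_cleanly=True)
    # Kill path: the SVN-level EOF exchange never happens, and the SSH close
    # only completes if the tunnel wins the race against the kill.
    return Shutdown(svn_eof_exchanged=False,
                    ssh_closed_cleanly=kill_delay_ms >= ssh_close_ms)
```

A fast client on a modest link (say `kill_delay_ms=1`, `ssh_close_ms=80`) gets neither close; a slower laptop (`kill_delay_ms=200`) gets the SSH close but still no SVN EOF exchange; only not killing the tunnel gets both.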

What we want is for the client svn and the server svnserve to be able to complete their work with an EOF exchange at the SVN protocol level. This will allow (cause) the remote svnserve to exit normally. Then the client should close the fd (an EOF to the SSH tunnel), which will allow (cause) the client and server SSH tunnel pair to gracefully complete the SSH closing exchange, at which time both the local tunnel agent and the remote sshd will exit normally. Finally, the client exits normally. This is how it is designed to work. I spent a day reading code and protocol docs to learn about it.
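The difference between "close the fd and let the tunnel finish" and "kill the tunnel" can be demonstrated with a small Python sketch using the standard `subprocess` module. The child here is a hypothetical stand-in for a tunnel agent, not ssh or plink: it reads its stdin until EOF, then prints a line standing in for its closing exchange and exits normally. This is an illustration of the ordering, not Subversion's code.

```python
import subprocess
import sys

# Hypothetical stand-in for the tunnel agent: wait for EOF on stdin, then
# "complete the closing exchange" (print a line) and exit normally.
AGENT = (
    "import sys\n"
    "sys.stdin.read()\n"
    "print('closing exchange done')\n"
    "sys.exit(0)\n"
)


def start_agent() -> subprocess.Popen:
    return subprocess.Popen(
        [sys.executable, "-c", AGENT],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def graceful_close() -> tuple[int, str]:
    proc = start_agent()
    # communicate() sends the data, closes stdin (the EOF to the agent),
    # reads the agent's output and waits for it to exit on its own.
    out, _ = proc.communicate("svn protocol traffic\n")
    return proc.returncode, out


def forced_kill() -> tuple[int, str]:
    proc = start_agent()
    # Kill while the agent is still blocked waiting for EOF, so it never
    # gets to run its closing step.
    proc.kill()
    out, _ = proc.communicate()
    return proc.returncode, out
```

`graceful_close()` should return `(0, "closing exchange done\n")`, while `forced_kill()` returns a nonzero code and no output. On POSIX, `Popen.kill()` sends SIGKILL; on Windows it calls `TerminateProcess()`, which, like killing the tunnel, gives the child no chance to finish its side of the conversation.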

-- Bob

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2411073
Received on 2009-10-24 19:53:00 CEST

In reply to: Stefan Sperling: "Re: Re: Issue #2580 revisited: Windows unclean TCP close [SEVERE]"