
Re: [Subclipse-dev] Excessive use of TCP sockets

From: Karen Tracey <kmtracey_at_gmail.com>
Date: 2007-06-15 05:55:21 CEST

At 04:29 PM 6/14/2007, Thomas Hallgren wrote:
>Hi Karen,
>
>Karen Tracey wrote:
>>Excuse me butting in, but what really is the problem here?
>This is the problem: https://bugs.eclipse.org/bugs/show_bug.cgi?id=192385

Hmmm...there seem to be potentially a few different problems being
mentioned there. Your initial report references messages such as
"Network connection closed unexpectedly". Oddly enough, this is the
kind of thing you might see if Alexander Kitaev's technique of
setting SO_LINGER to 0 before close were being used in a situation
where it was not safe to do so -- such as if the server side were
doing it after sending a bunch of data, some of which might still be
in queues waiting to be sent out (just because a send call has
returned doesn't mean the data has left the box). A close then with
SO_LINGER set to 0 would cause the queued data to be discarded and a
reset (RST) to be sent to the other side. Receipt of a RST is
usually what triggers messages like "connection closed
unexpectedly". But without more context and details it's hard for me
to tell what is going on. At any rate, though, this problem seems to
have something to do with established connections unexpectedly going kaput.
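
A minimal sketch of the abortive close described above, run over the
loopback interface. It is not taken from SVN or Subclipse code. The
function name abortive_close_demo is hypothetical, and the "ii" packing
of the linger structure assumes a POSIX-style platform such as Linux.
The server side here closes without sending anything, so the only
thing the client can observe is how the connection ended; on Linux the
client would typically see a reset rather than an orderly end-of-file.

```python
import socket
import struct
import threading


def abortive_close_demo():
    """Return what a client observes when its peer closes with SO_LINGER 0."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        # l_onoff=1, l_linger=0: close() drops any queued data and sends RST
        # instead of the normal FIN handshake (assumes POSIX struct layout).
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack("ii", 1, 0))
        conn.close()

    worker = threading.Thread(target=serve_once)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(5)
    try:
        data = client.recv(1024)
        outcome = "orderly close" if data == b"" else "data"
    except ConnectionResetError:
        outcome = "reset"  # the RST surfaced as ECONNRESET
    finally:
        client.close()
        worker.join()
        listener.close()
    return outcome


if __name__ == "__main__":
    print(abortive_close_demo())
```

An application seeing this error would usually report it with wording
along the lines of "connection closed unexpectedly".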

Comment #1 then mentions a server that occasionally starts refusing
new connections. This doesn't sound like the same problem to
me. Clients in this case will usually report "connection refused" or
"connection timed out". The server logs mention the server has
reached its connection limit, and a check via netstat shows SVN with
many more "established" (not TIME_WAIT) connections than CVS. But it
isn't clear whether that check was done on a box when it was refusing
connections or if it was just a general check on a running server to
see that SVN uses more connections than CVS. (There are a total of
139 connections mentioned -- that strikes me as a pretty low limit
for any kind of server.) To diagnose the problem of the server
reaching its connection limit you'd need to get the information from
when it was in its "refusing" state and see what state all the
connections were in. ESTABLISHED would indicate SVN has not issued
closes (on either side), whereas TIME_WAIT would indicate both sides
had closed but TCP was hanging onto the connection until it was sure
all old data associated with the connection was out of the
network. If TIME_WAIT connections count against a server connection
limit, then yuck.
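
One way to get that breakdown is to tally connection states from
netstat output captured while the server is refusing connections. The
sketch below assumes Linux-style "netstat -ant" columns (protocol,
queues, local address, foreign address, state). The function name
count_states, the sample addresses, and the use of port 3690 (the
svnserve default) are illustrative assumptions, not details from the
bug report.

```python
from collections import Counter


def count_states(netstat_text, port):
    """Tally TCP states for connections whose local port matches `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local-addr foreign-addr state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        # Linux prints addresses as host:port; take the text after the last colon.
        if local_addr.rpartition(":")[2] == str(port):
            counts[state] += 1
    return counts


# Hypothetical capture, for illustration only.
SAMPLE = """\
tcp        0      0 10.0.0.5:3690      10.0.0.9:51234     ESTABLISHED
tcp        0      0 10.0.0.5:3690      10.0.0.9:51235     TIME_WAIT
tcp        0      0 10.0.0.5:3690      10.0.0.9:51236     TIME_WAIT
tcp        0      0 10.0.0.5:22        10.0.0.7:40000     ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE, 3690).items()):
        print(state, n)
```

A tally dominated by ESTABLISHED points at connections that were never
closed, while one dominated by TIME_WAIT points at connections that
were closed and are merely waiting out the TCP timer.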

Comment #2 points to a different bug that details lingering TIME_WAIT
connections. This might be the same issue as comment #1, if in fact
the server in comment #1 is hitting its connection limit because of
TIME_WAIT connections. I'd argue that's a broken implementation of
a connection limit, but that's just me.

>If all users opens hundreds (or even thousands) of sockets each time
>they use our tool, I think there is a high risk that many servers
>reach a limit where they start refusing connections before the tool
>has completed its task.

I'm not sure why you think this, but then I'm not too familiar with
servers that have connection limits. I'm also a bit confused about
whether these TIME_WAIT connections are being seen on the client or
the server side. From Alexander's note I got the impression it was
the client side, but here it sounds like they are on the server
side. Thing is, you'll usually only see them on one side (the one
that closes first). So if you're seeing them on the client side,
then the server should be transitioning through a much swifter set of
states to final close, where it would very quickly lose all memory of
the previous connections, and thus have no reason to refuse
subsequent connection requests.
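
The asymmetry can be written down as a small model of the standard TCP
close sequences for the ordinary case where one side closes first (a
simultaneous close is ignored here). The names close_path and
lingers_in_time_wait are hypothetical; the state names follow the TCP
specification.

```python
# States visited by the side that calls close() first (active close).
ACTIVE_CLOSE = ["ESTABLISHED", "FIN_WAIT_1", "FIN_WAIT_2", "TIME_WAIT", "CLOSED"]

# States visited by the side that receives the first FIN (passive close).
PASSIVE_CLOSE = ["ESTABLISHED", "CLOSE_WAIT", "LAST_ACK", "CLOSED"]


def close_path(closes_first):
    """Return the state sequence for one end of the connection."""
    return ACTIVE_CLOSE if closes_first else PASSIVE_CLOSE


def lingers_in_time_wait(closes_first):
    """Only the end that closes first passes through TIME_WAIT."""
    return "TIME_WAIT" in close_path(closes_first)


if __name__ == "__main__":
    print("first closer:", " -> ".join(close_path(True)))
    print("other side:  ", " -> ".join(close_path(False)))
```

So if the client is the one that closes first, the server goes from
LAST_ACK straight to CLOSED and keeps no TIME_WAIT entry for that
connection.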

>>So >100 connections in TIME_WAIT is not a problem unless you
>>believe that whatever the application was doing shouldn't have
>>required >100 individual connections in the first place. I know
>>nothing of the code involved here but it seems possible that each
>>call to getList/getContent/getDirEntry would use its own TCP
>>connection, thus if you make 100+ calls you might wind up with
>>about that many connections lingering in TIME_WAIT for a
>>bit. Unless you think that many connections should NOT have been
>>required for the calls made, then I don't see the problem here.
>I compare with CVS where only one connection is established between
>the client and the server and all commands reuse that single
>connection. I have no idea how the client/server protocol is set up
>in SVN but if each command opens a socket of its own, then I might
>have a big problem. We potentially execute thousands of such
>commands in a fairly short time frame.

I actually know nothing of the internals of CVS or SVN (I just lurk
on this list to learn -- I'm interested in SVN). In a previous life
I did know a lot about TCP/IP though, which is what prompted me to
jump in here. It does sound like SVN uses more connections than
CVS. Whether that's a problem, I don't know. My gut feeling is it
shouldn't be (though re-using a single connection would probably be
more efficient). I'd think any server worth its salt could handle
thousands of connections in a short period of time -- that's what
servers do, after all. But if there are "servers" out there counting
TIME_WAIT connections against some "connection limit", then, well,
there might be a problem with the way SVN apparently manages connections.
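
To make the contrast concrete, here is a rough sketch that counts how
many connections a server accepts when each command opens its own
socket versus when all commands share one. It uses a throwaway local
echo server as a stand-in and says nothing about the real SVN or CVS
wire protocols. The function name run_commands is hypothetical, and
the command strings are just labels borrowed from the calls mentioned
earlier in the thread.

```python
import socket
import threading


def run_commands(commands, reuse):
    """Echo each command off a local server; return (replies, connections accepted)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    listener.settimeout(0.2)  # lets the server loop notice the stop flag
    address = listener.getsockname()
    stop = threading.Event()
    accepted = [0]

    def serve():
        while not stop.is_set():
            try:
                conn, _ = listener.accept()
            except socket.timeout:
                continue
            accepted[0] += 1
            # Echo line by line until the client closes this connection.
            with conn, conn.makefile("rb") as reader:
                for line in reader:
                    conn.sendall(line)

    def ask(sock, reader, command):
        sock.sendall(command.encode() + b"\n")
        return reader.readline().decode().strip()

    worker = threading.Thread(target=serve)
    worker.start()
    replies = []
    try:
        if reuse:
            # One connection carries every command.
            with socket.create_connection(address) as sock, \
                    sock.makefile("rb") as reader:
                for command in commands:
                    replies.append(ask(sock, reader, command))
        else:
            # A fresh connection per command.
            for command in commands:
                with socket.create_connection(address) as sock, \
                        sock.makefile("rb") as reader:
                    replies.append(ask(sock, reader, command))
    finally:
        stop.set()
        worker.join()
        listener.close()
    return replies, accepted[0]


if __name__ == "__main__":
    cmds = ["getList", "getContent", "getDirEntry"]
    print(run_commands(cmds, reuse=False)[1], "connections without reuse")
    print(run_commands(cmds, reuse=True)[1], "connection(s) with reuse")
```

With a connection per command, the number of sockets that later sit in
TIME_WAIT grows with the number of commands; with reuse it stays at
one regardless of how many commands are issued.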

Karen

Received on Fri Jun 15 05:55:41 2007

This is an archived mail posted to the Subclipse Dev mailing list.