At 04:29 PM 6/14/2007, Thomas Hallgren wrote:
>Hi Karen,
>
>Karen Tracey wrote:
>>Excuse me butting in, but what really is the problem here?
>This is the problem: https://bugs.eclipse.org/bugs/show_bug.cgi?id=192385
Hmmm...there seem to be potentially a few different problems being 
mentioned there.  Your initial report references messages such as 
"Network connection closed unexpectedly".  Oddly enough, this is the 
kind of thing you might see if Alexander Kitaev's technique of 
setting SO_LINGER to 0 before close were being used in a situation 
where it was not safe to do so -- such as if the server side were 
doing it after sending a bunch of data, some of which might still be 
in queues waiting to be sent out (just because a send call has 
returned doesn't mean the data has left the box).  A close then with 
SO_LINGER set to 0 would cause the queued data to be discarded and a 
reset (RST) to be sent to the other side.  Receipt of a RST is 
usually what triggers messages like "connection closed 
unexpectedly".  But without more context and details it's hard for me 
to tell what is going on.  At any rate, though, this problem seems to 
have something to do with established connections unexpectedly going kaput.
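
To make the SO_LINGER point concrete, here is a minimal Python sketch of a close with SO_LINGER set to 0 and what the other side sees. It is only an illustration on a loopback connection: the function names are made up for this note, the struct layout assumes a POSIX-style `struct linger` (two ints), and it says nothing about what SVN or any SVN client library actually does.

```python
import socket
import struct


def abortive_close(sock):
    """Close sock with SO_LINGER enabled and a zero timeout."""
    # l_onoff=1, l_linger=0: unsent data is discarded and the connection
    # is reset (RST) instead of going through the normal FIN exchange.
    # Assumption: POSIX layout of struct linger (two C ints).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack("ii", 1, 0))
    sock.close()


def peer_sees_reset():
    """Return True if the peer of an abortively closed socket gets a reset."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # any free loopback port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        abortive_close(client)
        server_side.settimeout(5)
        try:
            # After a normal close this would return b"" (end of stream).
            server_side.recv(1024)
        except ConnectionResetError:
            return True
        return False
    finally:
        server_side.close()
        listener.close()
```

Calling `peer_sees_reset()` exercises both ends in one process. The reset surfacing as an error on the receiving side is the sort of event an application would typically report as a connection closed unexpectedly.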
Comment #1 then mentions a server that occasionally starts refusing 
new connections.  This doesn't sound like the same problem to 
me.  Clients in this case will usually report "connection refused" or 
"connection timed out".  The server logs mention the server has 
reached its connection limit, and a check via netstat shows SVN with 
many more "established" (not TIME_WAIT) connections than CVS.  But it 
isn't clear whether that check was done on a box when it was refusing 
connections or if it was just a general check on a running server to 
see that SVN uses more connections than CVS.  (There are a total of 
139 connections mentioned -- that strikes me as a pretty low limit 
for any kind of server.) To diagnose the problem of the server 
reaching its connection limit you'd need to get the information from 
when it was in its "refusing" state and see what state all the 
connections were in.  ESTABLISHED would indicate SVN has not issued 
closes (on either side), whereas TIME_WAIT would indicate both sides 
had closed but TCP was hanging onto the connection until it was sure 
all old data associated with the connection was out of the 
network.  If TIME_WAIT connections count against a server connection 
limit, then yuck.
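
As a rough sketch of the kind of check I mean, the following Python tallies connection states for one server port from saved `netstat -an` output. It assumes the Linux-style column layout (protocol, two queue columns, local address, foreign address, state). The sample text, the addresses, and port 3690 are invented for illustration and are not taken from the bug report.

```python
from collections import Counter

# Hypothetical sample in Linux "netstat -ant" layout; not real data.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address      Foreign Address     State
tcp        0      0 10.0.0.1:3690      10.0.0.7:51234      ESTABLISHED
tcp        0      0 10.0.0.1:3690      10.0.0.8:40112      ESTABLISHED
tcp        0      0 10.0.0.1:3690      10.0.0.9:40377      TIME_WAIT
tcp        0      0 10.0.0.1:22        10.0.0.9:50001      ESTABLISHED
"""


def count_states(netstat_text, port):
    """Count TCP connection states for connections on the given local port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a TCP connection row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        if local_addr.rsplit(":", 1)[-1] == str(port):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE, 3690).items()):
        print(state, n)
```

Run against output captured while the server is actually refusing connections, a tally like this would show whether the limit is being hit by ESTABLISHED or by TIME_WAIT connections.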
Comment #2 points to a different bug that details lingering TIME_WAIT 
connections.  This might be the same issue as comment #1, if in fact 
the server in comment #1 is hitting its connection limit because of 
TIME_WAIT connections.  I'd argue that's a broken implementation of 
a connection limit, but that's just me.
>If all users opens hundreds (or even thousands) of sockets each time 
>they use our tool, I think there is a high risk that many servers 
>reach a limit where they start refusing connections before the tool 
>has completed its task.
I'm not sure why you think this, but then I'm not too familiar with 
servers that have connection limits.  I'm also a bit confused about 
whether these TIME_WAIT connections are being seen on the client or 
the server side.  From Alexander's note I got the impression it was 
the client side, but here it sounds like they are on the server 
side.  Thing is, you'll usually only see them on one side (the one 
that closes first).  So if you're seeing them on the client side, 
then the server should be transitioning through a much swifter set of 
states to final close, where it would very quickly lose all memory of 
the previous connections, and thus have no reason to refuse 
subsequent connection requests.
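
The one-sided nature of TIME_WAIT can be spelled out as the two standard TCP close sequences. This is a small Python sketch with names of my own choosing. It covers only the ordinary case where one side closes first, not a simultaneous close.

```python
# States visited by the side that closes first (active close).
ACTIVE_CLOSE = ["ESTABLISHED", "FIN_WAIT_1", "FIN_WAIT_2", "TIME_WAIT", "CLOSED"]

# States visited by the side that closes second (passive close).
PASSIVE_CLOSE = ["ESTABLISHED", "CLOSE_WAIT", "LAST_ACK", "CLOSED"]


def close_states(closes_first):
    """Return the state sequence for one end of an orderly TCP close."""
    return ACTIVE_CLOSE if closes_first else PASSIVE_CLOSE


def lingers_in_time_wait(closes_first):
    """True if this end passes through TIME_WAIT on its way to CLOSED."""
    return "TIME_WAIT" in close_states(closes_first)
```

So if the client closes first, `lingers_in_time_wait(True)` applies to the client, and the server follows the shorter passive sequence with no TIME_WAIT in it.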
>>So >100 connections in TIME_WAIT is not a problem unless you 
>>believe that whatever the application was doing shouldn't have 
>>required >100 individual connections in the first place.  I know 
>>nothing of the code involved here but it seems possible that each 
>>call to getList/getContent/getDirEntry would use its own TCP 
>>connection, thus if you make 100+ calls you might wind up with 
>>about that many connections lingering in TIME_WAIT for a 
>>bit.  Unless you think that many connections should NOT have been 
>>required for the calls made, then I don't see the problem here.
>I compare with CVS where only one connection is established between 
>the client and the server and all commands reuse that single 
>connection. I have no idea how the client/server protocol is set up 
>in SVN but if each command opens a socket of its own, then I might 
>have a big problem. We potentially execute thousands of such 
>commands in a fairly short time frame.
I actually know nothing of the internals of CVS or SVN (I just lurk 
on this list to learn -- I'm interested in SVN).  In a previous life 
I did know a lot about TCP/IP though, which is what prompted me to 
jump in here.  It does sound like SVN uses more connections than 
CVS.  Whether that's a problem, I don't know.  My gut feeling is it 
shouldn't be (though re-using a single connection would probably be 
more efficient).  I'd think any server worth its salt could handle 
thousands of connections in a short period of time -- that's what 
servers do, after all.  But if there are "servers" out there counting 
TIME_WAIT connections against some "connection limit", then, well, 
there might be a problem with the way SVN apparently manages connections.
Karen 
Received on Fri Jun 15 05:55:41 2007