On Wed, May 27, 2009 at 10:34:00PM +0100, Stefan Sperling wrote:
> Right. So we would be using keep-alives as an out-of-band mechanism,
> in the sense that we ask TCP to please tell us if we get a reset from
> the other side when sending non-svn-protocol messages. The svn protocol
> itself is then isolated from having to deal with this network-layer
> problem.
>
> I see your point and will try to find a way to do this.
Here's a patch.
I've tested it by:
1) Configuring my kernel to start sending keep-alives after 20 seconds
instead of 2 hours (the default).
2) Running a large checkout over svn:// from a remote server.
3) Causing the connection to stall by making my router drop any packets
sent from the server to the client. This simulates the situation were
the client is no longer able to receive traffic from the server because
its connection to the public internet has gone down.
4) Waiting for some time.
5) Inserting another firewall rule which sends a TCP RST packet for any
TCP packet going to the server. This simulates the situation were the
client's public internet connection comes back, but with a different
IP address assigned to it by the ISP, so that the server will reply
with a RST because it does not recognize the client anymore.
6) All the way through, monitoring the traffic using tcpdump, making
sure that there is no traffic generated by the client itself,
because this would cause keep-alives not to be sent. Also, making
sure that the kernel did send keep-alives after 20 seconds. On Linux,
these seem to show up as a succession of a couple of ACKs for the
same segment sent from the client to the server. At least that is
what I saw and it is sufficiently different from the other traffic
to signify that keep-alives were in fact being used.
Without the patch, svn just sat there waiting seemingly forever,
even after the RST firewall rule was inserted.
With the patch, svn threw an error after the quick succession of
ACKS appearing in the tcpdump output:
subversion/svn/checkout-cmd.c:168: (apr_err=104)
subversion/libsvn_client/checkout.c:224: (apr_err=104)
subversion/libsvn_client/update.c:275: (apr_err=104)
subversion/libsvn_ra_svn/client.c:281: (apr_err=104)
subversion/libsvn_ra_svn/editorp.c:877: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:793: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:644: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:644: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:644: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:610: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:297: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:287: (apr_err=104)
subversion/libsvn_ra_svn/marshal.c:262: (apr_err=104)
subversion/libsvn_ra_svn/streams.c:152: (apr_err=104)
svn: Can't read from connection: Connection reset by peer
The error originates at the call to apr_socket_recv().
Do you agree that this patch is OK and addresses issue #3347 correctly?
Thanks,
Stefan
[[[
Fix issue #3347, "svn never times out when (public) IP changes"
* subversion/libsvn_ra_svn/client.c
(make_connection): After creating the socket, try to enable TCP
keep-alives on it so that we have a chance of detecting connection
problems to the server while we are blocking indefinitely reading
from the socket.
Suggested by: ghudson
]]]
Index: subversion/libsvn_ra_svn/client.c
===================================================================
--- subversion/libsvn_ra_svn/client.c (revision 37867)
+++ subversion/libsvn_ra_svn/client.c (working copy)
@@ -161,6 +161,21 @@ static svn_error_t *make_connection(const char *ho
return svn_error_wrap_apr(status, _("Can't connect to host '%s'"),
hostname);
+ /* Enable TCP keep-alives on the socket so we time out when
+ * the connection breaks due to network-layer problems.
+ * If the peer has dropped the connection due to a network partition
+ * or a crash, or if the peer no longer considers the connection
+ * valid because we are behind a NAT and our public IP has changed,
+ * it will respond to the keep-alive probe with a RST instead of an
+ * acknowledgment segment, which will cause svn to abort the session
+ * even while it is currently blocked waiting for data from the peer.
+ * See issue #3347. */
+ status = apr_socket_opt_set(*sock, APR_SO_KEEPALIVE, 1);
+ if (status)
+ {
+ /* It's not a fatal error if we cannot enable keep-alives. */
+ }
+
return SVN_NO_ERROR;
}
Received on 2009-05-28 23:15:24 CEST