Re: More robust handling of shaky network connections?

From: C. Michael Pilato <cmpilato_at_collab.net>
Date: 2004-10-27 15:25:45 CEST

Auke Jilderda <auke.jilderda@philips.com> writes:

> In any case, it raises an interesting issue: Apparently, Subversion
> does not try to re-connect a busted network connection and I'm
> wondering why that is? Being a networked application that often
> needs an open connection for a prolonged amount of time (e.g. we've
> seen commits that take over two hours), it might be wise to provide
> some sort of graceful degradation (e.g. trying once or twice to
> restore a broken connection before bailing out). Any arguments in
> favour or against this?

This is (obviously) a good idea. It's the implementation that I fear.
Subversion's modularity keeps the network stuff (that moves the data
which transforms a working copy from state to state) well away from
the working copy management code (which actually understands those
states). If a long-lived request, like a REPORT used during a
checkout or update operation, were to die in the middle somewhere, the
repository access layer would be completely oblivious to the details
of the half-finished operation.

Because our most widely used data transfer API is the "editor", which
demands depth-first tree ordering with no revisitation, the RA layer
would need to somehow signal the WC layer about the network problem so
that either the WC could roll back to the same state it had before the
initial request, or at least be placed into a mode where it expected
to see much of the same data changes that it already saw (and know
that this is okay).
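A toy model of the second option (a WC mode that expects to see some of the same changes again), assuming that a re-driven edit replays identical file contents for the part that was already applied. `RecordingEditor`, `ReplayTolerantEditor` and `apply_file` are hypothetical simplifications, not Subversion's editor API, which has separate open/add/apply-delta/close callbacks.

```python
from typing import Dict, List, Tuple


class RecordingEditor:
    """Hypothetical stand-in for the working-copy side of an editor drive."""

    def __init__(self) -> None:
        self.applied: List[Tuple[str, bytes]] = []

    def apply_file(self, path: str, contents: bytes) -> None:
        self.applied.append((path, contents))


class ReplayTolerantEditor:
    """Hypothetical wrapper: files completed during an earlier, interrupted
    drive are skipped on a re-drive, provided the replayed contents match."""

    def __init__(self, inner: RecordingEditor) -> None:
        self.inner = inner
        self.completed: Dict[str, bytes] = {}

    def apply_file(self, path: str, contents: bytes) -> None:
        if path in self.completed:
            if self.completed[path] != contents:
                raise ValueError(f"replayed change for {path} differs from first drive")
            return  # already applied; expected when the edit is re-driven
        self.inner.apply_file(path, contents)
        self.completed[path] = contents
```

Comparing contents on replay, rather than silently skipping, turns a mismatched re-drive into an error instead of a quietly inconsistent working copy.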

Allow me to wonder aloud so that my ignorance is easier to see.

Could this be accomplished strictly at the RA layer level? What if
the RA modules kept track of exactly where they were in processing a
request when the connection dropped, and then, on retrying the request,
ignored everything up to that point? I'm thinking about the likes of
'wget -c' (continue where I left off). So, for example, if libsvn_ra_dav
knew it had read 12,015 bytes off the stream successfully before
something died, it would repeat the request, ignore the first 12,015
bytes, and then continue processing at the 12,016th byte. The working
copy code (and perhaps even the user) would be oblivious to a problem
having occurred. Something tells me it just ain't that simple.
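A minimal sketch of that RA-level "continue where I left off" idea, in Python for brevity. Everything here is hypothetical: `ResumableReader`, the caller-supplied `open_stream` function (assumed to re-issue the identical request) and the retry limit are illustrations, not libsvn_ra_dav code. It only works if the repeated request produces a byte-for-byte identical response, which is exactly the assumption that may not hold.

```python
from typing import Callable, Optional, Protocol


class Readable(Protocol):
    def read(self, size: int) -> bytes: ...


class ResumableReader:
    """Hypothetical 'wget -c'-style reader for a long-lived response.

    Assumes that re-issuing the request yields a byte-for-byte identical
    response; nothing here can verify that.
    """

    def __init__(self, open_stream: Callable[[], Readable], max_retries: int = 2):
        self.open_stream = open_stream  # hypothetical: re-issues the same request
        self.max_retries = max_retries
        self.consumed = 0               # bytes already handed to the consumer
        self.stream: Optional[Readable] = None

    def _reconnect(self) -> Readable:
        stream = self.open_stream()
        to_skip = self.consumed
        # Discard everything the consumer has already processed.
        while to_skip > 0:
            chunk = stream.read(to_skip)
            if not chunk:
                # Not a ConnectionError: a shorter replay means the data differs.
                raise EOFError("repeated response is shorter than the original")
            to_skip -= len(chunk)
        return stream

    def read(self, size: int) -> bytes:
        for attempt in range(self.max_retries + 1):
            try:
                if self.stream is None:
                    self.stream = self._reconnect()
                data = self.stream.read(size)
                self.consumed += len(data)
                return data
            except ConnectionError:
                self.stream = None  # drop the broken connection
                if attempt == self.max_retries:
                    raise
        raise AssertionError("unreachable")
```

Note that the skip re-reads and discards the already-consumed prefix, so each recovery costs another transfer of that data.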

Could this be accomplished strictly at the client layer level? We've
done a lot of work to make operations like checkouts and updates
restartable. There are still bugs in these areas (switches, notably),
and some stuff that basically works but looks scary (merges showing
'G' for everything previously merged), but if we could get our
subcommands to a place where the larger operation could be safely
re-attempted, and where the RA layers return clear indications (in the
forms of predictable, dedicated error codes) of when a failure has
occurred for a network integrity reason, then perhaps this kind of
re-attempt processing could happen even well up into the client
libraries.
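A sketch of the client-level alternative, under the two conditions stated above: the whole operation is safely restartable, and the RA layer reports network-integrity failures with a dedicated error code. `RAError`, `ERR_RA_NETWORK_INTEGRITY` and `run_restartable` are hypothetical names, not Subversion's API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical dedicated error code; not an actual Subversion constant.
ERR_RA_NETWORK_INTEGRITY = "ERR_RA_NETWORK_INTEGRITY"


class RAError(Exception):
    """Hypothetical RA-layer error carrying a predictable error code."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(message or code)
        self.code = code


def run_restartable(operation: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-run a whole client operation (assumed restartable) when the RA layer
    reports a network-integrity failure; any other error propagates at once."""
    attempt = 1
    while True:
        try:
            return operation()
        except RAError as err:
            if err.code != ERR_RA_NETWORK_INTEGRITY or attempt >= max_attempts:
                raise
            attempt += 1
```

Keying the retry off one dedicated code keeps, say, an authorization failure from being retried pointlessly.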

Received on Wed Oct 27 15:52:36 2004

In reply to: Auke Jilderda: "More robust handling of shaky network connections?"