WANdisco's MultiSite product is essentially a multiplexing HTTP proxy
that sits between the client and Apache. Testing has revealed an issue
that probably affects other proxies, such as Squid.

The problem occurs during commit. When the client has a direct
connection to Apache it holds a connection open for the duration of the
commit and sends multiple requests over the same connection. If Apache
dies, or is forcibly restarted, the client sees the connection break and
aborts the commit.

When a proxy is present the client's connection is to the proxy and the
proxy maintains its own connection to Apache. If the proxy loses a
connection to Apache it may just switch to another connection. If this
happens during a request then the client will still see an error, but if
it happens between requests the client may not be aware that the proxy
has switched connections. This is indeed how the Squid proxy behaves.
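
The silent switch described above can be modelled with a toy example.
This is only a sketch, not how Squid or MultiSite is implemented: the
Worker and Proxy classes, the pids and the request strings are all made
up for illustration.

```python
import itertools


class Worker:
    """Stands in for one Apache process; `pid` is a made-up identifier."""

    def __init__(self, pid):
        self.pid = pid
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError("worker %d is gone" % self.pid)
        return "%s handled by pid %d" % (request, self.pid)


class Proxy:
    """Keeps one upstream connection and replaces it when found dead."""

    def __init__(self, workers):
        self._workers = itertools.cycle(workers)
        self._upstream = next(self._workers)

    def forward(self, request):
        # Between requests a dead upstream is replaced silently, so the
        # client is never told that anything changed.
        if not self._upstream.alive:
            self._upstream = next(self._workers)
        return self._upstream.handle(request)


workers = [Worker(100), Worker(200)]
proxy = Proxy(workers)

first = proxy.forward("PUT file1")   # served by the first worker
workers[0].alive = False             # that process dies between requests
second = proxy.forward("PUT file2")  # served by the second worker, no error
```

From the client's side both calls succeed, even though two different
processes handled the two halves of what it believes is one session.
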
So if the timing is just right it's possible for one Apache process to
start writing the transaction, for that process to stop, and for another
process to take over the commit. WANdisco observed problems on FSFS,
where the transaction is synced at the end of the commit, not for each
HTTP request. What ends up in the transaction probably depends on the
details of the kernel memory and disk caching, the system load, the
underlying OS filesystem, etc.

In my testing with squid I have not managed to produce a corrupt commit,
but I suspect that under the right conditions it would happen. I think
that getting mod_dav_svn to sync before acknowledging each HTTP request
is a non-starter, for performance reasons. Can mod_dav_svn detect that
the connection has changed? It's too late to get the old process to
sync, but perhaps we could abort the commit? Some valid commits would
fail, but it would avoid the small risk of a corrupt commit.
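
A rough sketch of the kind of check being asked about, written in Python
for brevity. Nothing here is an existing mod_dav_svn API: TxnRegistry,
ConnectionChanged and the (kind, pid) connection ids are hypothetical,
and a real check would have to keep the owner somewhere that outlives
the Apache process, for example alongside the transaction itself.

```python
class ConnectionChanged(Exception):
    """Raised when a transaction is touched from a different connection."""


class TxnRegistry:
    """Maps a transaction name to the connection that created it."""

    def __init__(self):
        self._owners = {}

    def begin(self, txn_name, conn_id):
        # Remember which connection started the transaction.
        self._owners[txn_name] = conn_id

    def check(self, txn_name, conn_id):
        # Abort rather than continue a transaction started elsewhere.
        owner = self._owners.get(txn_name)
        if owner != conn_id:
            self._owners.pop(txn_name, None)
            raise ConnectionChanged(
                "txn %s started on %r, now seen on %r"
                % (txn_name, owner, conn_id)
            )


registry = TxnRegistry()
registry.begin("123-abc", conn_id=("pid", 100))
registry.check("123-abc", conn_id=("pid", 100))  # same connection: passes

try:
    registry.check("123-abc", conn_id=("pid", 200))  # different connection
    aborted = False
except ConnectionChanged:
    aborted = True
```

The second check aborts the transaction. As noted above, a check like
this would also reject some commits that would have been fine.
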
--
Philip
Received on 2011-02-16 17:45:41 CET