On Tue, Nov 30, 2010 at 12:33:38PM +0000, Philip Martin wrote:
> "Hyrum K. Wright" <hyrum_wright_at_mail.utexas.edu> writes:
> > Stefan's patch to make a recursive proplist much more performant
> > highlights the great benefit that our sqlite-backed storage can have.
> > However, he reverted it due to concerns about the potential for
> > database contention. The theory was that the callback might try and
> > call additional wc functions to get more information, and such nested
> > statements weren't healthy for sqlite. We talked about it for a bit
> > in IRC this morning, and the picture raised by this issue was quite
> > dire.
> It's not the nested statements that are the problem, it's extending the
> duration of the select. The callback is to the application so it could
> be updating a GUI, interacting with the user, etc. and so could take a
> much greater length of time than other selects. An SQLite read blocks
> SQLite writes by other threads/processes, so the long-lived select may
> cause writes to fail that would otherwise succeed. (We have a busy
> timeout of 10 seconds, so that's the sort of time that becomes a problem.)
> It's related to the problem of querying the working copy while it is
> locked. In 1.6 it was possible to run status, info, propget, etc. while
> an update was in progress. In 1.7 that's no longer possible; all the
> read operations fail. If we decide that this change in behaviour is
> acceptable then the long-lived select isn't really a problem, it simply
> stops the operations like update from starting. If we want the 1.6
> behaviour then the long-lived select becomes more of a problem, as it
> makes update much more likely to fail part way through.
> The callback is not necessarily wrong, it depends what sort of behaviour
> we are trying to provide.
I think we should review our requirements for callbacks again.
In 1.7 we have two layers of locking.
One is the sqlite locking, which guarantees metadata consistency in wc.db.
The other is working copy write locks (svn_wc__db_wclock_obtain())
which are obtained at the start of long-running operations which
modify the working copy (e.g. commit, update, merge, patch, ...).
The problem we're facing is that we want to allow read-only operations
(e.g. status, proplist, ...) to run concurrently with N other read-only
operations and at most one write operation.
For instance, the user might have TortoiseSVN and some IDE SVN plugin
running while working with the CLI client on the same working copy.
The CLI client should be able to perform write operations while the
other clients periodically poll the WC status to update their icons.
The WC write lock will not pose a problem here since it is only taken by
writers, of which there can be at most one at a time (for a given subtree
of the WC). Readers won't attempt to acquire this lock.
The sqlite locks, on the other hand, create a problem during wc.db queries:
a writer blocks all readers, and any active reader blocks the writer.
Our solution to this problem is to keep the queries short, so that
sqlite locking will not cause long-running read or write operations
to stall or fail.
The problem with callbacks is that they can, in theory, take a long
time to complete. So if a callback runs while an sqlite query is
active, the query could take a long time to complete.
We can solve this problem by never calling back to the client
while an sqlite transaction is in progress. However this has an impact
on performance. Even if retrieving the desired information is possible
with a single query, we have to run multiple queries to retrieve the
information in chunks (e.g. per directory) so we can pass it to the
callback in between queries.
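The chunked approach could look roughly like this (a hypothetical sketch
with an invented table and receiver, not the real wc.db schema): each
directory's rows are fetched and buffered first, and the callback is only
invoked once no sqlite statement is active.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (dir TEXT, name TEXT, prop TEXT);
    INSERT INTO nodes VALUES
        ('trunk',  'a.c', 'svn:eol-style'),
        ('trunk',  'b.c', 'svn:keywords'),
        ('branch', 'c.c', 'svn:eol-style');
""")

def proplist_chunked(conn, receiver):
    dirs = [r[0] for r in conn.execute("SELECT DISTINCT dir FROM nodes")]
    for d in dirs:
        # Short query: buffer one directory's worth of rows...
        chunk = conn.execute(
            "SELECT name, prop FROM nodes WHERE dir = ?", (d,)).fetchall()
        # ...then call back with no statement active, so a slow receiver
        # cannot hold a database lock.
        for name, prop in chunk:
            receiver(d, name, prop)

seen = []
proplist_chunked(conn, lambda d, n, p: seen.append((d, n, p)))
```

The performance cost is exactly the one described above: one query per
directory plus the intermediate buffering, instead of a single query
streaming straight into the callback.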
The other option is to place restrictions on what a client callback
is allowed to do, depending on which "locking context" the callback
runs in.
E.g. we could say that the proplist callback runs within a
"wc.db locking context". This would be a highly restrictive context,
because an sqlite transaction is in progress.
The only action allowed is to consume the data passed in (e.g. print it or
store it to a temporary file or a buffer for later processing).
The callback should complete its task as quickly as possible to avoid
disturbing other SVN clients.
No calls to any function in libsvn_client or libsvn_wc would be allowed
from this context. If the callback doesn't comply with this restriction,
there is a risk of undefined behaviour of the entire system. This means
that a badly behaved client can degrade the user experience.
But it also allows us to be maximally efficient.
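As a rough illustration of the rule (all names here are invented, not the
real libsvn_wc API), a "wc.db locking context" could even be made to fail
loudly: while the query is notionally in progress, any re-entrant wc.db
call from the callback raises an error, while a callback that merely
buffers the data works fine.

```python
class WcDb:
    """Toy model of wc.db access with a 'locking context' flag."""

    def __init__(self):
        self._in_db_context = False

    def proplist(self, props, receiver):
        """Stream property names to `receiver` inside the db context."""
        self._in_db_context = True        # stands in for an open txn
        try:
            for prop in props:
                receiver(prop)            # may only consume the data
        finally:
            self._in_db_context = False

    def read_info(self, path):
        """Any other wc.db call is forbidden from within the context."""
        if self._in_db_context:
            raise RuntimeError("wc.db re-entered from locking context")
        return {"path": path}

db = WcDb()

# A compliant callback only buffers what it is given.
buffered = []
db.proplist(["svn:eol-style", "svn:keywords"], buffered.append)

# A misbehaving callback calls back into wc.db and is rejected.
try:
    db.proplist(["svn:eol-style"], lambda p: db.read_info("foo.c"))
    violated = False
except RuntimeError:
    violated = True
```

In C there is no such cheap runtime guard, so the restriction would live
in the API documentation, which is why a misbehaving callback risks
undefined behaviour rather than a clean error.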
If backwards compatibility is a concern, we could allow callbacks to
be called in this context only for API calls that exist in 1.7 and
newer. Backwards-compat code would use a different implementation with
a less restrictive locking context for these callbacks. This would allow
existing callbacks written against 1.6.x and earlier APIs to continue
to work.
Then we also have callbacks that run in "wc write locking context".
E.g. the callback that gets a commit log message from the caller might
run within this context if the commit operation involves a working copy.
These callbacks can wait for user input, and must assume that they run
with a WC write lock held, but not with an sqlite lock.
Calling any function that will start a new write operation on the working copy
is prohibited (because it will fail when it attempts to acquire another
WC lock), but read-only operations on the working copy are allowed.
There is also the "lock-free context".
For instance, the blame line receiver (svn_client_blame_receiver3_t)
would run in this context. This context doesn't impose any restrictions.
I've mentioned the latter two contexts for completeness.
The first thing to do would be to document which callbacks are run within
the "wc.db locking context". Once that is done we could also introduce the
other contexts (for completeness) and assign existing callbacks to them.
Does this approach sound sane?
If so, I would like to revive the "fast" proplist implementation which
was reverted in r1040135, and apply a similar approach to other subcommands
where possible. I would also try to provide separate implementations for
the backwards-compat code if required.
Received on 2011-01-12 13:03:06 CET