On Mon, Nov 29, 2010 at 07:43:40PM -0600, Hyrum K. Wright wrote:
> We use callbacks extensively throughout our code as a means of
> providing streamy feedback to callers. It's a pretty good paradigm,
> and one that has served us well. We don't put many restrictions on
> what the callbacks can do in terms of fetching more information or
> calling other functions.
> Enter wc-ng.
> Stefan's patch to make a recursive proplist much more performant
> highlights the great benefit that our sqlite-backed storage can have.
> However, he reverted it due to concerns about the potential for
> database contention. The theory was that the callback might try to
> call additional wc functions to get more information, and such nested
> statements weren't healthy for sqlite. We talked about it for a bit
> in IRC this morning, and the picture painted by this issue was quite bleak.
> In an attempt to find out what the consequences of these nested
> queries are, I wrote a test program to attempt to demonstrate the
> failure, only now I can't seem to do so. Attached is the test
> program, but when I run it, I'm able to successfully execute multiple
> prepared statements on the same set of rows simultaneously, which was
> the concern we had about our callback mechanism in sqlite.
> So is this a valid problem? If so, could somebody use the attached
> test program to illustrate it for those of us who may not fully
> understand the situation?
Some more thoughts on this:
We could forbid callbacks from calling any other libsvn_client function,
and provide all information they will ever need as arguments. Careful
consideration of possible use cases for an API would allow us to do this.
E.g. for proplist, it's clear that the callback will want a path and a
list of properties. It wouldn't be allowed to retrieve additional information.
This approach keeps memory usage during a crawl as low as possible and
would let us get away with a single query per operation.
However, this approach might break existing third-party code whose callbacks
call into libsvn_client. It won't be trivial to support 1.6.x semantics
("you're free to call whatever you want") on top of a 1.7.x API that places
restrictions on the callbacks.
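A rough sketch of what the first option could look like, in Python for brevity (schema, table, and function names are all invented): the crawler runs a single query, groups the rows by path, and hands the callback everything it may need as arguments, so the callback never re-enters the working-copy layer.

```python
import sqlite3
from itertools import groupby

def proplist_crawl(conn, receiver):
    # Option 1 sketch: one query per operation.  Rows are grouped by
    # path and the receiver gets the full property list as arguments;
    # it is not allowed to call back into the DB layer.
    rows = conn.execute(
        "SELECT path, propname, propval FROM props ORDER BY path")
    for path, group in groupby(rows, key=lambda row: row[0]):
        receiver(path, {name: value for _, name, value in group})

# Demo with invented data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE props (path TEXT, propname TEXT, propval TEXT)")
conn.executemany("INSERT INTO props VALUES (?, ?, ?)", [
    ("trunk", "svn:ignore", "*.o"),
    ("trunk/a.c", "svn:eol-style", "native"),
    ("trunk/a.c", "svn:keywords", "Id"),
])
collected = []
proplist_crawl(conn, lambda path, props: collected.append((path, props)))
```

Note that only one statement is ever active, but the callback runs while that statement is still stepping, which is exactly why it must not issue its own queries.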
Another possibility is partitioning the problem by running queries per
directory. The walk_children crawler would crawl per directory instead
of per path. All queries work in units of directories and no callbacks
are invoked while an sqlite transaction is active. Callbacks are free to
do whatever they want, but the crawler will need to deal with resulting
edge cases such as the current directory disappearing from the DB or
from disk (our current node walker seems to account for missing paths
only before the first invocation of the callback -- it doesn't reckon
with the callback deleting paths or removing relevant data from the DB.)
This approach makes memory usage proportional to the number of entries
in each directory. It will also be slower than the first approach
because one or more queries are issued per directory.
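The second option might be sketched like this (again in Python, with an invented schema): each directory's result set is fetched completely, so no statement or transaction is active when callbacks run, and the crawler re-checks that a queued directory still exists before descending, since an earlier callback may have removed it.

```python
import sqlite3

def walk_children(conn, root, callback):
    # Option 2 sketch: queries are issued per directory and fully
    # materialized before any callback runs, so the callback is free
    # to query or modify the DB.
    pending = [root]
    while pending:
        dirpath = pending.pop()
        # A callback for an earlier directory may have deleted this
        # one from the DB; re-check before descending.
        if conn.execute("SELECT 1 FROM nodes WHERE path = ?",
                        (dirpath,)).fetchone() is None:
            continue
        children = conn.execute(
            "SELECT path, kind FROM nodes WHERE parent = ? ORDER BY path",
            (dirpath,)).fetchall()
        for path, kind in children:
            callback(path, kind)   # no sqlite statement active here
            if kind == "dir":
                pending.append(path)

# Demo with an invented working-copy tree.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (path TEXT PRIMARY KEY, parent TEXT, kind TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
    ("wc", None, "dir"),
    ("wc/A", "wc", "dir"),
    ("wc/A/f", "wc/A", "file"),
    ("wc/b", "wc", "file"),
])
visited = []
walk_children(conn, "wc", lambda path, kind: visited.append(path))
```

The existence re-check is the kind of edge-case handling the current node walker lacks; a real implementation would also need to cope with the directory vanishing from disk.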
Given our backwards-compatibility policy I'd say we'd have to go with
the second option. Can anyone think of any other options?
Received on 2010-11-30 15:40:46 CET