Re: wc_db performance (was: wc_db API discussion)

From: Branko Čibej <brane_at_e-reka.si>
Date: Sat, 12 Mar 2011 01:59:45 +0100

On 12.03.2011 01:29, Greg Stein wrote:
> 2011/3/11 Branko Čibej <brane_at_e-reka.si>:
>> This comment is somewhat orthogonal to the API discussions, but as I've
>> noted before ... after my relatively brief sojourn in wc-db, I came to
>> the conclusion that having separate NODES and ACTUAL_NODE tables is
>> going to be a perpetual impediment to really speeding up the working
>> copy. I believe this split is a very premature space-vs-speed
>> optimization,
> Not at all. The original design has BASE/WORKING/ACTUAL, as defined by
> Erik's work. Only later did we come to realize that BASE and WORKING
> could be looked at through a different lens and be combined (via the
> op_depth technique).
>
> So. Not a premature optimization, but a design choice.

Six of one, half a dozen of the other. ACTUAL is just another op-depth
with a few extra attributes, so let's compromise and call it an
implementation choice, rather than a design choice. :)

> Do we know more
> now? Absolutely. Should they be combined? I would suggest looking into
> that in 1.8 unless we just can't get performance where we'd like it
> (and we don't know what that is!), and if it can be shown to be the
> cause of ACTUAL_NODE. I just don't know that we want to try another
> combination of tables at this point in time... and that we want to get
> this baby shipped, if it makes sense.
>
>> ...
>> * Use the power of SQL. Complex queries and filtering should be done
>> in SQL, not C code.
> I'm a little leery of creating a wc_db API that has N specialized APIs
> each doing one thing for one caller. The more APIs we have, the more
> restricted our implementation becomes. We already have a pretty tight
> binding between the API and the underlying storage model. Thankfully,
> it is internal to WC. But the more specialization and view into the
> underlying storage that the API provides, the less freedom we have to
> fix things.

Right now I'm seeing C code do:

    * recursive tree walks of the wc-db with per-directory queries
    * programmatic "joins" of data from different tables
    * op-depth filtering on NODES

Much as I try, I can find no good reason, even taking into account the
considerations you listed above, to /not/ do these in SQL. There aren't
all that many radically different queries you can do in wc-db anyway.
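
To make the point concrete, here is a minimal sketch of the op-depth
filtering and the NODES/ACTUAL_NODE "join" done in a single statement.
It assumes a heavily simplified stand-in schema: the column set, the
sample rows, and the names top_layer_with_actual and demo are
illustrative only, not the real wc.db schema or wc_db API.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for NODES and ACTUAL_NODE;
# the real wc.db tables carry many more columns.
SCHEMA = """
CREATE TABLE nodes (
    local_relpath TEXT NOT NULL,
    op_depth      INTEGER NOT NULL,
    presence      TEXT NOT NULL,
    PRIMARY KEY (local_relpath, op_depth)
);
CREATE TABLE actual_node (
    local_relpath TEXT PRIMARY KEY,
    changelist    TEXT
);
"""

# One statement keeps only the highest op_depth row per path and joins
# in the matching ACTUAL_NODE row (if any), for every path at once.
TOP_LAYER_WITH_ACTUAL = """
SELECT n.local_relpath, n.op_depth, n.presence, a.changelist
FROM nodes AS n
LEFT JOIN actual_node AS a ON a.local_relpath = n.local_relpath
WHERE n.op_depth = (SELECT MAX(m.op_depth) FROM nodes AS m
                    WHERE m.local_relpath = n.local_relpath)
ORDER BY n.local_relpath
"""


def top_layer_with_actual(conn):
    """Return (path, op_depth, presence, changelist) for each path."""
    return conn.execute(TOP_LAYER_WITH_ACTUAL).fetchall()


def demo():
    """Build a tiny in-memory database and run the combined query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
        ("A", 0, "normal"),
        ("A/f", 0, "normal"),
        ("A/f", 2, "base-deleted"),   # shadows the op_depth 0 row
        ("A/g", 2, "normal"),
    ])
    conn.execute("INSERT INTO actual_node VALUES ('A/g', 'cl1')")
    return top_layer_with_actual(conn)
```

The C side then only iterates over one result set, instead of issuing a
query per directory and stitching the tables together by hand.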

You can do further dataset filtering by parametrizing the SQL queries,
but then you're basically just making the intermediate result temp
tables smaller (and making the combined query+filter a bit faster,
because you take advantage of SQLite's awareness of data locality).
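
A rough sketch of such a parametrized filter, again assuming a
simplified stand-in for NODES (the table layout and the names
nodes_under and demo_subtree are hypothetical): the bound parameters
restrict the rows inside SQLite, so only the requested subtree ever
reaches the caller.

```python
import sqlite3

# The prefix test uses substr() rather than LIKE so that '%' or '_' in
# a path cannot act as a wildcard.
SUBTREE_QUERY = """
SELECT local_relpath, op_depth
FROM nodes
WHERE (local_relpath = :parent
       OR substr(local_relpath, 1, length(:prefix)) = :prefix)
  AND op_depth >= :min_depth
ORDER BY local_relpath, op_depth
"""


def nodes_under(conn, parent_relpath, min_op_depth):
    """Return rows at or below parent_relpath with op_depth >= min_op_depth."""
    params = {
        "parent": parent_relpath,
        "prefix": parent_relpath + "/",
        "min_depth": min_op_depth,
    }
    return conn.execute(SUBTREE_QUERY, params).fetchall()


def demo_subtree():
    """Build a tiny in-memory table and return it for querying."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        " local_relpath TEXT NOT NULL,"
        " op_depth INTEGER NOT NULL,"
        " PRIMARY KEY (local_relpath, op_depth))"
    )
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
        ("A", 0),
        ("A/f", 0),
        ("A/f", 2),
        ("AB", 0),      # sibling that merely shares a name prefix
        ("B/x", 0),
    ])
    return conn
```

Calling nodes_under(demo_subtree(), "A", 0) returns only the rows for
"A" and "A/f"; the sibling "AB" and the "B" subtree are filtered out
before any row is handed back.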

-- Brane
Received on 2011-03-12 02:00:21 CET

This is an archived mail posted to the Subversion Dev mailing list.