On 11.03.2011 20:13, Greg Stein wrote:
> I also don't like to see structures like svn_wc__db_info_t. We had a
> big problem with the entry_t, and things like info_t will continue to
> propagate that broken model. By definition, to use that structure a
> query must be done against both NODES and ACTUAL_NODE.
This comment is somewhat orthogonal to the API discussions, but as I've
noted before ... after my relatively brief sojourn in wc-db, I came to
the conclusion that having separate NODES and ACTUAL_NODE tables is
going to be a perpetual impediment to really speeding up the working
copy. I believe this split is a very premature space-vs-speed
optimization, and it doesn't even save all that much space, relatively
speaking. It wouldn't be so bad if outer joins were reasonably fast in
SQLite, but my measurements at the time showed that they can be several
orders of magnitude slower than inner joins.
(Merging NODES and ACTUAL_NODE would effectively create a materialized
view of a left-joined query over both tables, without the overhead that
this implies, and of course ignoring the fact that SQLite doesn't
support materialized views anyway.)
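
To illustrate the parenthetical above, here is a minimal sketch in Python using the standard sqlite3 module. The two tables carry only a couple of hypothetical columns (the real wc-db schema has many more), and merged_nodes is an invented name used purely for illustration. It shows that a single merged table holds the same rows that the left outer join has to compute on every query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified stand-ins for NODES and ACTUAL_NODE.
conn.executescript("""
    CREATE TABLE nodes (
        local_relpath TEXT PRIMARY KEY,
        presence      TEXT NOT NULL
    );
    CREATE TABLE actual_node (
        local_relpath TEXT PRIMARY KEY,
        changelist    TEXT
    );
    INSERT INTO nodes VALUES ('a.c', 'normal'), ('b.c', 'normal');
    INSERT INTO actual_node VALUES ('b.c', 'fix-123');
""")

# Split model: every "info"-style read needs a left outer join.
split_rows = conn.execute("""
    SELECT n.local_relpath, n.presence, a.changelist
    FROM nodes AS n
    LEFT OUTER JOIN actual_node AS a
        ON a.local_relpath = n.local_relpath
    ORDER BY n.local_relpath
""").fetchall()

# Merged model (assumed name merged_nodes): the join result is stored
# once, so reads become a plain single-table SELECT.
conn.executescript("""
    CREATE TABLE merged_nodes (
        local_relpath TEXT PRIMARY KEY,
        presence      TEXT NOT NULL,
        changelist    TEXT
    );
    INSERT INTO merged_nodes
        SELECT n.local_relpath, n.presence, a.changelist
        FROM nodes AS n
        LEFT OUTER JOIN actual_node AS a
            ON a.local_relpath = n.local_relpath;
""")
merged_rows = conn.execute("""
    SELECT local_relpath, presence, changelist
    FROM merged_nodes
    ORDER BY local_relpath
""").fetchall()
```

The sketch says nothing about speed; it only shows the equivalence of the two result sets. Any timing comparison would have to be measured against a real working copy.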
When thinking about the API, I suggest the main things to keep in mind
should be:
* Use the power of SQL. Complex queries and filtering should be done
in SQL, not C code.
* Whenever possible, perform a single large query and store results
in temporary tables for processing, instead of issuing many small
queries and combining the results in code. A single query with
file-backed cooked results will almost always be faster than a
bunch of smaller queries (speedup can range from several times to
several orders of magnitude, depending on working copy size),
/and/ preparing the dataset in a single SQLite transaction will
guarantee that the results returned by the API are a consistent
snapshot of WC state.
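
The two points above can be sketched roughly as follows, again in Python with the standard sqlite3 module. Everything named here is an assumption made for illustration: the simplified two-column tables, the wc_snapshot temporary table and the snapshot_changelist_members function are not part of the actual wc-db API. The filtering happens in SQL, and the result set is prepared inside one explicit transaction.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN/COMMIT statements below control the transaction explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Hypothetical, heavily simplified stand-ins for NODES and ACTUAL_NODE.
conn.executescript("""
    CREATE TABLE nodes (
        local_relpath TEXT PRIMARY KEY,
        presence      TEXT NOT NULL
    );
    CREATE TABLE actual_node (
        local_relpath TEXT PRIMARY KEY,
        changelist    TEXT
    );
    INSERT INTO nodes VALUES ('a.c', 'normal'), ('b.c', 'normal');
    INSERT INTO actual_node VALUES ('b.c', 'fix-123');
""")


def snapshot_changelist_members(conn):
    """Build the whole result set in one transaction, then read it back.

    Hypothetical helper: one large query fills a temporary table, and
    the caller iterates over that table instead of issuing a query per
    node.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS temp.wc_snapshot")
        # Filtering (only nodes that belong to a changelist) is done
        # in SQL, not in the calling code.
        conn.execute("""
            CREATE TEMP TABLE wc_snapshot AS
            SELECT n.local_relpath, n.presence, a.changelist
            FROM nodes AS n
            LEFT OUTER JOIN actual_node AS a
                ON a.local_relpath = n.local_relpath
            WHERE a.changelist IS NOT NULL
        """)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return conn.execute("""
        SELECT local_relpath, presence, changelist
        FROM wc_snapshot
        ORDER BY local_relpath
    """).fetchall()


rows = snapshot_changelist_members(conn)
```

Because the temporary table is filled within a single transaction, later writes to the main tables do not change what the caller reads back from it. Whether SQLite keeps temporary tables in memory or in a file depends on its temp_store setting.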
-- Brane
Received on 2011-03-12 00:52:24 CET