[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Using Sqlite in libsvn_wc towards Subversion 1.9++

From: Bert Huijben <bert_at_qqmail.nl>
Date: Sun, 17 Nov 2013 13:34:40 +0100

        Hi,

For Subversion 1.8 I spend a lot of time optimizing our use of Sqlite in the
working copy to work as optimal as possible with the then Supported range of
Sqlite versions (3.7.12-3.7.19). Then a few months later in Sqlite 3.8.0 a
new query planner was introduced that broke quite a few of the optimizations
I/we assumed were ok, because we did things different than the Sqlite
developers intended.

For Subversion 1.8 we did:
* Try to support as many sqlite versions as possible (3.7.12-infinity)
* Try to optimize queries without describing our layout
* Add many columns to indexes to make them unique and avoid going back to
the original table when possible.

This goes against +- all the recommendations of the Sqlite designers as they
recommend
* Provide your users exactly the sqlite version you tested against -> link
statically as part of your source
* Provide statistics about your table.
* Use 'good indexes' (either with stats; or with a good first key)

As users that have upgraded to Sqlite 3.8.0 or later have found out our
design choices really affect performance on huge working copies: several
performance critical queries during operations such as commit, delete and
revert are now handled as a full table scan.

To optimize our queries for Subversion 1.7 and 1.8 we should probably write
a bit of code to fill the sqlite statistics table with the information
Sqlite needs to make the right decisions, but for Subversion 1.9 and later
we should decide what we want to do to make sure our users will never see
such a regression any more.

I think we should at least:
* Update our indexes to be closer to what the Sqlite developers call 'good'
indexes': remove unnecessary columns.
(In the sqlite implementation they are already unique via the hidden rowid)

*****
But I would also like to recommend/ask that we start bundling Sqlite with
Subversion, to allow optimizing for the specific version we use for a
release without risking future breakage.
*****

I don't expect that our *nix packagers that like to package every component
separately are going to like this, but I think we should do this to
* guarantee our users a reasonably performance.
* to allow us to diagnose user performance problems
Perhaps a few will even patch their own sqlite version in, but then we can
safely say that we didn't cause the performance problems users have in these
installations.

If we do that and/or bump our requirement to the soon to be released Sqlite
3.8.2 (or later) we can use quite a few new indexing tricks that should
improve update performance on the NODES table and the always attached
triggers more than a bit.*)

If we can get consensus around this I would like to add the Sqlite 3.8.2
amalgamation directly to our source tree about a week after its release
(Probably December 2013). We can than start optimizing our Sqlite schema and
query usage towards this version.

        Bert

*)
* The PRISTINE table is the perfect candidate for a 'WITHOUT rowid' table,
which should give a 30% performance boost on update/select. (Requires Sqlite
3.8.2)
* The moved_to index on nodes is a typical example of where a partial index
would help reducing unnecessary index updates (Requires 3.8.0).
Received on 2013-11-17 13:35:36 CET

This is an archived mail posted to the Subversion Dev mailing list.