RE: Using Sqlite in libsvn_wc (introducing sqlite_stat1)

From: Bert Huijben <bert_at_qqmail.nl>
Date: Sun, 17 Nov 2013 14:28:10 +0100

> -----Original Message-----
> From: Branko Čibej [mailto:brane_at_wandisco.com]
> Sent: Sunday, 17 November 2013 14:08
> To: dev_at_subversion.apache.org
> Subject: Re: Using Sqlite in libsvn_wc towards Subversion 1.9++
>
> On 17.11.2013 13:34, Bert Huijben wrote:
> > Hi,
> >
> > For Subversion 1.8 I spent a lot of time optimizing our use of Sqlite in the
> > working copy to work as well as possible with the then-supported range of
> > Sqlite versions (3.7.12-3.7.19). Then, a few months later, Sqlite 3.8.0
> > introduced a new query planner that broke quite a few of the optimizations
> > I/we assumed were ok, because we did things differently than the Sqlite
> > developers intended.
> >
> > For Subversion 1.8 we did:
> > * Try to support as many sqlite versions as possible (3.7.12-infinity)
> > * Try to optimize queries without describing our layout
> > * Add many columns to indexes to make them unique and avoid going back to
> > the original table when possible.
> >
> > This goes against more or less all the recommendations of the Sqlite
> > designers, as they recommend
> > * Provide your users exactly the sqlite version you tested against -> link
> > statically as part of your source
> > * Provide statistics about your table.
> > * Use 'good indexes' (either with stats; or with a good first key)
> >
> > As users who have upgraded to Sqlite 3.8.0 or later have found out, our
> > design choices really affect performance on huge working copies: several
> > performance-critical queries during operations such as commit, delete and
> > revert are now handled as full table scans.
> >
> >
> > To optimize our queries for Subversion 1.7 and 1.8 we should probably write
> > a bit of code to fill the sqlite statistics table with the information
> > Sqlite needs to make the right decisions,
>
> I've looked at pre-filling the sqlite_stat1 table. I think it's not
> going to work because apparently some of the stats columns contain index
> row counts, which of course vary not only from one working copy to the
> next, but also during the lifetime of a working copy. Maybe there's a
> way to fake "infinite" size indexes; I haven't found documentation on
> how to do that. Most of the analyze results seem to be concerned with
> the wc_id column.
>
> As a data point: running ANALYZE in a wc.db for a checkout of
> ^/subversion takes about half a minute on my mac (with SSD, fwiw), so I
> don't think re-running it incrementally is an option.

These values don't have to match the actual row counts of any particular working copy. The table should be initialized once by a developer on a 'real database' and should usually not be updated on user databases. Sqlite loads the numbers from the sqlite_stat1 table when the first query is planned and uses them for all further plans.

Sqlite itself will never update this table unless asked.
(Later today I will commit an implementation of this, with a test case that validates what we set against the current schema.)
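
To make the idea concrete, here is a minimal sketch in Python of installing fixed statistics into sqlite_stat1. It is an illustration under assumptions, not the implementation mentioned above: the table, the index name i_nodes_parent, the helper install_stats and all numbers are hypothetical and are not taken from the real wc.db schema. The sketch relies on "ANALYZE sqlite_master" to create an empty sqlite_stat1 when it is missing and to make Sqlite reload the statistics afterwards.

```python
import sqlite3

# Hypothetical, simplified schema for illustration; NOT the real wc.db layout.
SCHEMA = """
CREATE TABLE nodes (
    wc_id          INTEGER NOT NULL,
    local_relpath  TEXT NOT NULL,
    op_depth       INTEGER NOT NULL,
    parent_relpath TEXT,
    PRIMARY KEY (wc_id, local_relpath, op_depth)
);
CREATE INDEX i_nodes_parent
    ON nodes (wc_id, parent_relpath, local_relpath, op_depth);
"""

# Made-up statistics (assumption). In each stat string the first number is the
# approximate row count of the index; every following number is the average
# number of rows sharing the same value in the first N index columns.
STATS = [
    ("nodes", "sqlite_autoindex_nodes_1", "8000 8000 2 1"),
    ("nodes", "i_nodes_parent", "8000 8000 10 2 1"),
]


def install_stats(conn):
    """Write fixed planner statistics instead of running a full ANALYZE."""
    # Creates sqlite_stat1 if it does not exist yet; system tables themselves
    # are not analyzed, so this is cheap.
    conn.execute("ANALYZE sqlite_master")
    # Drop whatever statistics are currently stored for our table.
    conn.execute("DELETE FROM sqlite_stat1 WHERE tbl = 'nodes'")
    conn.executemany(
        "INSERT INTO sqlite_stat1 (tbl, idx, stat) VALUES (?, ?, ?)", STATS
    )
    # Ask Sqlite to reload the statistics for queries planned from now on.
    conn.execute("ANALYZE sqlite_master")


# Autocommit mode keeps the sketch free of explicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)
install_stats(conn)

for row in conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1 ORDER BY idx"):
    print(row)
```

Because the numbers only have to describe the typical shape of the data, the same rows can be written into every database, whatever its size.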

This is the approach the primary Sqlite architect recommended to me for existing versions. (See the sqlite users archive)

I will follow up on the other points in another mail.

        Bert
Received on 2013-11-17 14:28:55 CET

This is an archived mail posted to the Subversion Dev mailing list.