RE: Using Sqlite in libsvn_wc towards Subversion 1.9++

From: Bert Huijben <bert_at_qqmail.nl>
Date: Sun, 17 Nov 2013 14:33:13 +0100

> -----Original Message-----
> From: Branko ÄŒibej [mailto:brane_at_wandisco.com]
> Sent: zondag 17 november 2013 14:08
> To: dev_at_subversion.apache.org
> Subject: Re: Using Sqlite in libsvn_wc towards Subversion 1.9++
>
> On 17.11.2013 13:34, Bert Huijben wrote:
> > Hi,
> >
> > For Subversion 1.8 I spend a lot of time optimizing our use of Sqlite in the
> > working copy to work as optimal as possible with the then Supported range
> of
> > Sqlite versions (3.7.12-3.7.19). Then a few months later in Sqlite 3.8.0 a
> > new query planner was introduced that broke quite a few of the
> optimizations
> > I/we assumed were ok, because we did things different than the Sqlite
> > developers intended.
> >
> > For Subversion 1.8 we did:
> > * Try to support as many sqlite versions as possible (3.7.12-infinity)
> > * Try to optimize queries without describing our layout
> > * Add many columns to indexes to make them unique and avoid going back
> to
> > the original table when possible.
> >
> > This goes against +- all the recommendations of the Sqlite designers as they
> > recommend
> > * Provide your users exactly the sqlite version you tested against -> link
> > statically as part of your source
> > * Provide statistics about your table.
> > * Use 'good indexes' (either with stats; or with a good first key)
> >
> > As users that have upgraded to Sqlite 3.8.0 or later have found out our
> > design choices really affect performance on huge working copies: several
> > performance critical queries during operations such as commit, delete and
> > revert are now handled as a full table scan.
> >
> >
> > To optimize our queries for Subversion 1.7 and 1.8 we should probably
> write
> > a bit of code to fill the sqlite statistics table with the information
> > Sqlite needs to make the right decisions,
>
> I've looked at pre-filling the sqlite_stats1 table. I think it's not
> going to work because apparently some of the stats columns contain index
> row counts, which of course vary not only from one working copy to the
> next, but also during the lifetime of a working copy. Maybe there's a
> way to fake "infinite" size indexes; I haven't found documentation on
> how to do that. Most of the analyze results seem to be concerned with
> the wc_id column.
>
> As a data point: running ANALYZE in a wc.db for a checkout of
> ^/subversion takes about half a minute on my mac (with SSD, fwiw), so I
> don't think re-running it incrementally is an option.
>
> > but for Subversion 1.9 and later
> > we should decide what we want to do to make sure our users will never
> see
> > such a regression any more.
> >
> > I think we should at least:
> > * Update our indexes to be closer to what the Sqlite developers call 'good'
> > indexes': remove unnecessary columns.
> > (In the sqlite implementation they are already unique via the hidden rowid)
> >
> > *****
> > But I would also like to recommend/ask that we start bundling Sqlite with
> > Subversion, to allow optimizing for the specific version we use for a
> > release without risking future breakage.
> > *****
>
> You realize that if we do that, we also have to provide a sqlite
> command-line tool based on that same version.

Why?

Users should have no need to look into our wc.db file, and if they want they can still use any compatible sqlite version. We don't promise performance on their code that peeks in our database.

We don't deliver sqlite3.exe on Windows, do we?

> There's another option: that we use different indexes based on the
> version of SQLite that the wc.db was created with. We can always detect
> when the sqlite binaries are downgraded (or upgraded), and could emit an
> error or warning; and we could provide 'svn cleanup --reindex', or some
> equivalent, that would rebuild indexes according to the current SQLite
> version, without having to throw away the working copy and start over.
>
> > I don't expect that our *nix packagers that like to package every
> component
> > separately are going to like this, but I think we should do this to
> > * guarantee our users a reasonably performance.
> > * to allow us to diagnose user performance problems
> > Perhaps a few will even patch their own sqlite version in, but then we can
> > safely say that we didn't cause the performance problems users have in
> these
> > installations.
>
> Whatever we do, we should always allow linking to external SQLite. The
> most we can do, IMO, is to not search for the headers and libs unless
> --with-sqlite was given explicitly during configure time.

Same question: why?

Netscape, chrome, etc. all link their own sqlite inside. That it is in there is an application detail.

They could have used their own database implementation. The fact that they use an existing database doesn't make it a supported API.

We don't support users that modify our wc.db file, but that some developers can peek inside is nice.

>
> > If we do that and/or bump our requirement to the soon to be released
> Sqlite
> > 3.8.2 (or later) we can use quite a few new indexing tricks that should
> > improve update performance on the NODES table and the always attached
> > triggers more than a bit.*)
> >
> >
> >
> > If we can get consensus around this I would like to add the Sqlite 3.8.2
> > amalgamation directly to our source tree about a week after its release
> > (Probably December 2013). We can than start optimizing our Sqlite schema
> and
> > query usage towards this version.
> >
> >
> > Bert
> >
> > *)
> > * The PRISTINE table is the perfect candidate for a 'WITHOUT rowid' table,
> > which should give a 30% performance boost on update/select. (Requires
> Sqlite
> > 3.8.2)
> > * The moved_to index on nodes is a typical example of where a partial
> index
> > would help reducing unnecessary index updates (Requires 3.8.0).
>
> As a matter of fact, all indexes that begin with wc_id are candidates
> for partial indexing (with WHERE wc_id=1).

Except that this would break all use of all these indexes. We query wc_id=?1 and then the query analyzer can't match that to '1', so the query won't use an index at all.

Sqlite is 'light' in a way that it only optimizes queries once, and never on actual values.

The 'big' sqlite engines will use additional passes to optimize for cases like these, but Sqlite never does that.

Bert
>
> -- Brane
>
>
> --
> Branko ÄŒibej | Director of Subversion
> WANdisco // Non-Stop Data
> e. brane_at_wandisco.com
Received on 2013-11-17 14:33:58 CET

This message: [ Message body ]
Next message: Bert Huijben: "RE: Using Sqlite in libsvn_wc (partial indexes)"
Previous message: Bert Huijben: "RE: Using Sqlite in libsvn_wc (introducing sqlite_stat1)"
In reply to: Branko ÄŒibej: "Re: Using Sqlite in libsvn_wc towards Subversion 1.9++"
Next in thread: Branko ÄŒibej: "Re: Using Sqlite in libsvn_wc towards Subversion 1.9++"
Reply: Branko ÄŒibej: "Re: Using Sqlite in libsvn_wc towards Subversion 1.9++"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]