[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RE: Using Sqlite in libsvn_wc (partial indexes)

From: Bert Huijben <bert_at_qqmail.nl>
Date: Sun, 17 Nov 2013 15:10:16 +0100

> -----Original Message-----
> From: Bert Huijben [mailto:bert_at_qqmail.nl]
> Sent: zondag 17 november 2013 14:33
> To: 'Branko Čibej'; dev_at_subversion.apache.org
> Subject: RE: Using Sqlite in libsvn_wc towards Subversion 1.9++
>
>
>
> > -----Original Message-----
> > From: Branko Čibej [mailto:brane_at_wandisco.com]
> > Sent: zondag 17 november 2013 14:08
> > To: dev_at_subversion.apache.org
> > Subject: Re: Using Sqlite in libsvn_wc towards Subversion 1.9++
> >
> > On 17.11.2013 13:34, Bert Huijben wrote:
> > > Hi,
> > >

<snip>

> > As a matter of fact, all indexes that begin with wc_id are candidates
> > for partial indexing (with WHERE wc_id=1).
>
> Except that this would break all use of all these indexes. We query wc_id=?1
> and then the query analyzer can't match that to '1', so the query won't use an
> index at all.
>
> Sqlite is 'light' in a way that it only optimizes queries once, and never on
> actual values.
>
> The 'big' sqlite engines will use additional passes to optimize for cases like
> these, but Sqlite never does that.

Breaking up to yet another thread.

(The fine prints are on http://www.sqlite.org/partialindex.html)

"A partial index is an index over a subset of the rows of a table.

In ordinary indexes, there is exactly one entry in the index for every row in the table. In partial indexes, only some subset of the rows in the table have corresponding index entries. For example, a partial index might omit entries for which the column being indexed is NULL. When used judiciously, partial indexes can result in smaller database files and improvements in both query and write performance."

Essentially they work 100% like a normal index, but they allow skip storing information for a lot of rows that will never be needed. So some rows are omitted.

When we use our move index we will query primarily the 'moved_to' column that is NULL for every NODES record that doesn't point to the root of a move, so that would be at least 99% of our records. So the index size would shrink by at least 99% and would not have to be updated in the cases where it is not needed.
(The not having to update part would have the biggest performance impact)

Your example was 'WHERE wc_id=1' would exactly match 100% of the cases, so it would not shrink the index, as it would still need to contain all rows. And to make things worse the query planner doesn't know about the exact value in the index, so when optimizing the query it would most likely skip the index in some cases where it doesn't know if the index contains the right value.

        Bert
Received on 2013-11-17 15:10:59 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.