
RE: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db

From: Bert Huijben <bert_at_qqmail.nl>
Date: Fri, 8 Dec 2017 12:12:49 +0100

> -----Original Message-----
> From: Evgeny Kotkov [mailto:evgeny.kotkov_at_visualsvn.com]
> Sent: Thursday, 30 November 2017 17:45
> To: dev_at_subversion.apache.org
> Subject: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db
>
> Hi all,
>
> The recent SQLite versions (starting from 3.8.2, released in December 2013)
> feature a `WITHOUT ROWID` optimization [1] that can be enabled when
> creating
> a table. In short, it works well for tables that have non-integer primary
> keys, such as
>
> name TEXT PRIMARY KEY
>
> by not maintaining the hidden rowid values and another B-Tree to match
> between a primary key value and its rowid. This reduces the on-disk size
> and makes the lookups faster (a key → rowid → data lookup is replaced with
> a key → data lookup).

It doesn't add another B-tree for the primary key and its rowids: for the primary key, the main table itself is used as the index.

The case where things differ is when there are multiple indexes. In that case the indexes on a normal table always refer to a row by its rowid, while on a 'WITHOUT ROWID' table they refer to it by the primary key, which in general is larger.

It really depends on the use case where this helps... or makes things worse... Certain optimizations inside SQLite are not available for such tables.

For this specific table I think this helps as we only use the primary key index, but I'm guessing it won't help us much on other tables.
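To make the difference concrete, here is a minimal sketch using Python's standard `sqlite3` module. It assumes the bundled SQLite is 3.8.2 or newer. Only the `hash TEXT NOT NULL PRIMARY KEY` column is taken from this thread; the `revision` column and all helper names are hypothetical and exist purely for illustration. The sketch creates the same table with and without `WITHOUT ROWID` and lists what ends up in the schema table:

```python
import hashlib
import sqlite3


def make_rep_cache(without_rowid: bool) -> sqlite3.Connection:
    """Create an in-memory table keyed by a hex hash, optionally WITHOUT ROWID."""
    conn = sqlite3.connect(":memory:")
    suffix = " WITHOUT ROWID" if without_rowid else ""
    # Only `hash` comes from the thread; `revision` is a hypothetical
    # payload column added for illustration.
    conn.execute(
        "CREATE TABLE rep_cache ("
        " hash TEXT NOT NULL PRIMARY KEY,"
        " revision INTEGER NOT NULL)" + suffix
    )
    rows = ((hashlib.sha1(str(i).encode()).hexdigest(), i) for i in range(1000))
    conn.executemany("INSERT INTO rep_cache VALUES (?, ?)", rows)
    conn.commit()
    return conn


def schema_entries(conn):
    """Return (type, name) for every table and index listed in sqlite_master."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master ORDER BY type DESC, name"
    ).fetchall()


def has_rowid(conn) -> bool:
    """Report whether the hidden rowid column can be selected."""
    try:
        conn.execute("SELECT rowid FROM rep_cache LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False


def lookup(conn, key):
    """Primary-key lookup; the statement is identical for both layouts."""
    row = conn.execute(
        "SELECT revision FROM rep_cache WHERE hash = ?", (key,)
    ).fetchone()
    return None if row is None else row[0]


classic = make_rep_cache(without_rowid=False)
compact = make_rep_cache(without_rowid=True)

print(schema_entries(classic))  # the table plus an automatic primary key index
print(schema_entries(compact))  # the table only
```

The rowid variant lists a separate automatic index for the primary key, while the `WITHOUT ROWID` variant lists only the table, and the lookup statement itself does not change. The sketch does not measure size or speed; those depend on the real data.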

When we bump the required SQLite version for the client, we might want to update the schema of wc.db to use a sparse index for the move info table. This index contains mostly NULL values, which SQLite can easily stop maintaining, improving write (UPDATE) speed on the NODES table considerably.
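As a rough illustration of the idea, and assuming "sparse index" here corresponds to what SQLite calls a partial index (available from 3.8.0), the sketch below indexes only the rows that carry move information. The table layout and the index and function names are hypothetical simplifications, not the actual wc.db schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified stand-in for the wc.db NODES table.
conn.execute(
    "CREATE TABLE nodes ("
    " local_relpath TEXT NOT NULL PRIMARY KEY,"
    " moved_to TEXT)"
)

# Partial index: rows whose moved_to is NULL get no index entry at all,
# so writes to those rows do not have to maintain this index.
conn.execute(
    "CREATE INDEX i_nodes_moved ON nodes (moved_to)"
    " WHERE moved_to IS NOT NULL"
)

rows = [("trunk/file%d" % i, None) for i in range(100)]
rows.append(("trunk/old_name", "trunk/new_name"))
conn.executemany("INSERT INTO nodes VALUES (?, ?)", rows)
conn.commit()


def moved_from(target):
    """Find the source path(s) of a move via the partial index."""
    # INDEXED BY makes SQLite raise an error rather than silently fall back
    # to a table scan if the partial index cannot serve this query.
    cur = conn.execute(
        "SELECT local_relpath FROM nodes INDEXED BY i_nodes_moved"
        " WHERE moved_to = ?",
        (target,),
    )
    return [r[0] for r in cur]


print(moved_from("trunk/new_name"))
```

An equality comparison on `moved_to` implies the column is not NULL, which is why SQLite accepts the partial index for this query. Whether this gives a measurable UPDATE speedup on a real working copy would still need to be benchmarked.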

        Bert

>
> Currently, the rep-cache.db schema uses a non-integer primary key:
>
> hash TEXT NOT NULL PRIMARY KEY
>
> and can benefit from this optimization. A quick experiment showed a
> reduction of the on-disk size of the database by ~1.75x. The lookups
> should also be faster, both due to the reduced database size and due to
> the smaller number of internal bsearches. This should improve the times
> of new commits and `svnadmin load`, especially for large repositories
> that also have large rep-cache.db files.
>
> I think that it would be nice to have this optimization in rep-cache.db,
> and that we can start using it in a compatible way:
>
> - All existing rep-cache.db statements are compatible with it.
>
> - Since SQLite versions prior to 3.8.2 don't support it, we would
> only create the new tables with this optimization in fsfs format 8,
> and simultaneously bump the minimal required SQLite version from
> 3.7.12 (May 2012) to 3.8.2 (December 2013). This would ensure that
> all binaries supporting format 8 can work with the tables with this
> optimization.
>
> Would there be any objections to a change like this (see the attached
> patch)?
>
> [1] https://sqlite.org/withoutrowid.html
>
>
> Thanks,
> Evgeny Kotkov
Received on 2017-12-08 12:12:56 CET

This is an archived mail posted to the Subversion Dev mailing list.
