Re: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db

From: Branko Čibej <brane_at_apache.org>
Date: Fri, 8 Dec 2017 12:46:42 +0100

On 08.12.2017 12:40, Evgeny Kotkov wrote:
> Bert Huijben <bert_at_qqmail.nl> writes:
>
>>> The recent SQLite versions (starting from 3.8.2, released in December 2013)
>>> feature a `WITHOUT ROWID` optimization [1] that can be enabled when
>>> creating a table. In short, it works well for tables that have non-integer
>>> primary keys, such as
>>>
>>> name TEXT PRIMARY KEY
>>>
>>> by not maintaining the hidden rowid values and an another B-Tree to match
>>> between a primary key value and its rowid. This reduces the on-disk size
>>> and makes the lookups faster (a key → rowid → data lookup is replaced with
>>> a key → data lookup).
>> It doesn't add another B-tree for the primary key and its rowids. For the
>> primary key the main table is used as the index.
>>
>> The case where things differ is when there are multiple indexes. In this
>> case normal table will always refer to the primary key using the rowed,
>> while for 'WITHOUT ROWID' there will be referred to the primary key,
>> which in general is larger.
> For the sake of finding out the truth :), I think that this contradicts the
> explanation in https://sqlite.org/withoutrowid.html :
>
> CREATE TABLE IF NOT EXISTS wordcount(
> word TEXT PRIMARY KEY,
> cnt INTEGER
> );
>
> As an ordinary SQLite table, "wordcount" is implemented as two separate
> B-Trees. The main table uses the hidden rowid value as the key and stores
> the "word" and "cnt" columns as data. The "TEXT PRIMARY KEY" phrase of
> the CREATE TABLE statement causes the creation of an unique index on
> the "word" column. This index is a separate B-Tree that uses "word" and
> the "rowid" as the key and stores no data at all. Note that the
> complete text of every "word" is stored twice: once in the main
> table and again in the index.
>
> Although I didn't check if the most recent SQLite version still behaves in
> the described way internally, I have witnessed the described close-to-2x
> reduction in the size of rep-cache.db — which, unless I am missing
> something, follows the described idea of this optimization.

I suspect most of the performance improvement is a consequence of the
reduced database size, since it needs less page-in operations for the
same number of lookups. The B-tree search should really be trivial
compared to uncached I/O.

-- Brane
Received on 2017-12-08 12:46:48 CET

This message: [ Message body ]
Next message: Troy Curtis Jr: "Re: svn_client_status6() and svn_client_patch_func_t doc strings"
Previous message: Evgeny Kotkov: "Re: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db"
In reply to: Evgeny Kotkov: "Re: [PATCH] Use the `WITHOUT ROWID` SQLite optimization for rep-cache.db"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]