[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RE: SQLite and the LIKE operator

From: Bert Huijben <bert_at_qqmail.nl>
Date: Mon, 16 May 2011 16:24:19 +0200

> -----Original Message-----
> From: Hyrum K Wright [mailto:hyrum_at_hyrumwright.org]
> Sent: maandag 16 mei 2011 11:18
> To: Bert Huijben
> Cc: Branko Čibej; dev_at_subversion.apache.org
> Subject: Re: SQLite and the LIKE operator
>
> On Mon, May 16, 2011 at 8:28 AM, Bert Huijben <bert_at_qqmail.nl> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Hyrum K Wright [mailto:hyrum_at_hyrumwright.org]
> >> Sent: maandag 16 mei 2011 9:39
> >> To: Branko Čibej
> >> Cc: dev_at_subversion.apache.org
> >> Subject: Re: SQLite and the LIKE operator
> >>
> >> 2011/5/16 Branko Čibej <brane_at_e-reka.si>:
> >> > On 16.05.2011 03:13, Hyrum K Wright wrote:
> >> >> Several places in wc_db we use the following pattern to select all
> >> >> nodes with a common tree ancestor:
> >> >> WHERE wc_id = ?1 AND (local_relpath = ?2 OR local_relpath LIKE ?3
> >> ESCAPE '#')
> >> >>
> >> >> While this works, there was some concern about whether or not
> SQLite
> >> >> was using the proper indicies when executing this query. By examining
> >> >> the output for 'EXPLAIN QUERY PLAN' on some of the relevant SELECT
> >> >> statements, I believe it does use the indicies as intended.
> >> >>
> >> >> However, I stumbled across an alternate implementation which I
> believe
> >> >> has some merit. Instead of the above clause, we could use:
> >> >> WHERE wc_id = ?1 AND substr(local_relpath, 1, length(?2)) = ?2
> >
> > This also needs a table scan, as SQLite can't look through the substr to find
> that it can use the index for the result.
>
> The SQLite query analyzer states that this executes a SEARCH, not a
> SCAN, which indicates the use of the index.
>
> > My guess is that
> > WHERE wc_id = ?1 AND ((local_relpath = ?2) OR (local_relpath > ?2 || '/'
> AND local_relpath < ?2 || '0'))
> > is most likely the most efficient form we can get in SQLite as the constant
> string directly map to an index, but we should really create a few tests to
> verify these guesses.
>
> I haven't done any tests, either, but I'm interested in an expression
> which doesn't require us to construct an additional parameter to the
> SQL query in C.
>
> > I'm going to the Elego office now. See you there? :)

Okay, the numbers are in... I wrote a test on a DB with more than 1 million nodes.
1024 directories, with 1024 files. 1024 files added to the root dir and before I started there was the normal greek tree, so that is in the same db

-- STMT_SELECT_LIKE_ESCAPED
 SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath LIKE ?3 ESCAPE '#'))

-- STMT_SELECT_LIKE_NO_ESCAPE
 SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath LIKE ?3))

-- STMT_SELECT_SUBSTR
SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (SUBSTR(local_relpath, 1, LENGTH(?2)+1) = ?2 || '/'))

-- STMT_SELECT_BETWEEN
SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath > ?2 || '/' AND local_relpath < ?2 || '0'))

-- STMT_SELECT_BETWEEN2
SELECT local_relpath FROM NODES
 WHERE wc_id=?1 AND op_depth=0
   AND ((local_relpath=?2)
        OR (local_relpath > ?3 AND local_relpath < ?4))

All the queries returned the same 1025 answers when I looked for all nodes at and below "300".

I ran the tests 100 times to remove some jitter and the results are (in microseconds, on my Windows 7 notebook, SQLite 3.7.5):
STMT_SELECT_LIKE_ESCAPED: 743252
STMT_SELECT_LIKE_NO_ESCAPE: 683169
STMT_SELECT_SUBSTR: 885900
STMT_SELECT_BETWEEN: 555341
STMT_SELECT_BETWEEN2: 557631

So the select with > and < is the fastest query style. (Not surprising as SQLite in some cases optimizes like to this).

Generating the compare keys in SQLite is slightly faster than using static c strings. So creating them in cwould be even slower.

I added the rough patch I used to create these results to this mail. (I don't intend to commit it in this style)

        Bert
>
> Yes. :)
>
> -Hyrum

Received on 2011-05-16 16:25:38 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.