[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RE: svn commit: r1341684 - in /subversion/trunk: build/transform_sql.py subversion/libsvn_wc/wc-queries.sql subversion/tests/libsvn_wc/wc-queries-test.c

From: Bert Huijben <bert_at_qqmail.nl>
Date: Wed, 23 May 2012 09:53:33 +0200

> -----Original Message-----
> From: Greg Stein [mailto:gstein_at_gmail.com]
> Sent: woensdag 23 mei 2012 4:56
> To: dev_at_subversion.apache.org
> Subject: Re: svn commit: r1341684 - in /subversion/trunk:
build/transform_sql.py
> subversion/libsvn_wc/wc-queries.sql subversion/tests/libsvn_wc/wc-queries-
> test.c
>
> On Tue, May 22, 2012 at 8:03 PM, <rhuijben_at_apache.org> wrote:
> >...
> > +++ subversion/trunk/build/transform_sql.py Wed May 23 00:03:06 2012
> > @@ -110,10 +110,35 @@ class Processor(object):
> >     for line in input.split('\n'):
> >       line = line.replace('"', '\\"')
> >
> > +      # IS_STRICT_DESCENDANT_OF()
> > +
> > +      # A common operation in the working copy is determining
descendants of
> > +      # a node. To allow Sqlite to use its indexes to provide the
answer we
> > +      # must provide simple less than and greater than operations.
> > +      #
> > +      # For relative paths that consist of one or more components like
'subdir'
> > +      # we can accomplish this by comparing local_relpath with
'subdir/' and
> > +      # 'subdir0' ('/'+1 = '0')
> > +      #
> > +      # For the working copy root this case is less simple and not
strictly
> > +      # valid utf-8/16 (but luckily Sqlite doesn't validate utf-8 nor
utf-16).
> > +      # The binary blob x'FFFF' is higher than any valid utf-8 and
utf-16
> > +      # sequence.
> > +      #
> > +      # So for the root we can compare with > '' and < x'FFFF'. (This
skips the
> > +      # root itself and selects all descendants)
> > +      #
> > +      ### RH: I implemented this first with a user defined Sqlite
function. But
> > +      ### when I wrote the documentation for it, I found out I could
just
> > +      ### define it this way, without losing the option of just
dropping the
> > +      ### query in a plain sqlite3.
> > +
> >       # '/'+1 == '0'
> > -      line = re.sub(r'IS_STRICT_DESCENDANT_OF[(]([A-Za-z_.]+),
([?][0-9]+)[)]',
> > -                    r"((\2) != '' AND ((\1) > (\2) || '/') AND ((\1) <
(\2) || '0')) ",
> > -                    line)
> > +      line = re.sub(
> > +            r'IS_STRICT_DESCENDANT_OF[(]([A-Za-z_.]+), ([?][0-9]+)[)]',
> > +            r"(((\1) > (CASE (\2) WHEN '' THEN '' ELSE (\2) || '/'
END))" +
> > +            r" AND ((\1) < CASE (\2) WHEN '' THEN X'FFFF' ELSE (\2) ||
'0' END))",
> > +            line)
>
> Rather than using the CASE logic, couldn't you just do something like:
>
> r"(((\2 = '') AND (\1 != '')) OR (... old condition ...))"
>
> Seems simpler to me. Does it somehow invalidate the usage of the index?

Yes, this breaks the index usage :(

Sqlite determines which parts of the query are used for the indexing part
during statement preparation. So it doesn't know the actual values of ?1 and
?2 yet, and can only assume that they have the worst possible outcome in
every case.

The index is then used to create a result set, and the unevaluated portions
are then evaluated per row of the result.

In most cases Sqlite determines that cases like (local_relpath = ?2 OR
IS_STRICT_DESCENDANT_OF(local_relpath, ?2)) can be evaluated as two queries
after each other, both using the index. (As it statically knows these result
sets don't overlap)

In a few other cases, like STMT_RECURSIVE_UPDATE_NODE_REPO, the optimizer
isn't smart enough yet. It appears that it is smarter during simple
SELECT-s, then when updating or inserting.

        Bert
Received on 2012-05-23 09:54:19 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.