[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: SQL indices a WC format bump and 1.7

From: Hyrum K Wright <hyrum.wright_at_wandisco.com>
Date: Fri, 2 Sep 2011 10:23:56 -0500

On Fri, Sep 2, 2011 at 9:30 AM, Philip Martin
<philip.martin_at_wandisco.com> wrote:
> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
>
>   SELECT local_relpath, op_depth, presence, kind
>   FROM nodes
>   WHERE wc_id = ?1 AND parent_relpath = ?2
>   GROUP BY local_relpath
>   ORDER BY op_depth DESC
>
> performs poorly and doesn't scale well with working copy size.  This
> causes recursive functions that use svn_wc__internal_walk_children to be
> slow, things like "svn info --depth infinity".  I fixed this on trunk by
> changing the query and some C code, see r1164426.
>
> On IRC Bert pointed out that we can fix the problem by introducing a new
> index:
>
>   NODES(wc_id, parent_relpath, local_relpath, op_depth)
>
> This improves things dramatically.  If we add this new index we can
> revert r1164426, the index provides a slightly larger performance gain
> than the query/C code change.
>
> Bert also suggests changing our other indices by adding wc_id and/or
> local_relpath thus allowing them to be UNIQUE.  Can anyone confirm that
> UNIQUE indices are better?
>
> I think that the I_ROOT and I_LOCAL_ABSPATH indices are unnecessary
> given that columns they index are defined as UNIQUE. Can anyone confirm
> that we don't need indices on UNIQUE columns?

I believe that UNIQUE columns are auto indexed. For example:
  sqlite> PRAGMA INDEX_LIST('wcroot');
  0|I_LOCAL_ABSPATH|1
  1|sqlite_autoindex_WCROOT_1|1

and

  sqlite> select * from sqlite_master where type = 'index' and
tbl_name = 'WCROOT';
  index|sqlite_autoindex_WCROOT_1|WCROOT|8|
  index|I_LOCAL_ABSPATH|WCROOT|9|CREATE UNIQUE INDEX I_LOCAL_ABSPATH
ON WCROOT (local_abspath)

would both indicate there are two indices on the WCROOT table, though
we only define one. I believe one of these indices is due to the
UNIQUEness of the local_abspath column.

Also, from http://www.sqlite.org/faq.html#q7 (in describing the
sqlite_master table):
  For automatically created indices (used to implement the PRIMARY KEY
or UNIQUE constraints) the sql field is NULL.

From this documentation, it would seem that our indices are redundant.

> It's possible that we don't need I_EXTERNALS_PARENT as none of the
> queries look like they will use it.  Perhaps we should drop it?
>
> So how should we fix the 1.7 performance problem?
>
> - Use r1164426, my non-schema change fix.
>
> - Create a new WC format 30 with the new index.
>
> - Create a new WC format 30 with all the schema changes in the patch
>  below.
>
> Changing the WC format would involve auto-upgrading format 29 working
> copies.  We need to decide whether we want the minimal format 30 change
> in 1.7 before we develop this feature on trunk.
>
>
> Patch to change indices below.  This is not a complete patch, it doesn't
> bump the format or auto-upgrade.  It does pass the regression tests and
> improve the performance of recursive info and propset.
>
> Index: subversion/libsvn_wc/wc-metadata.sql
> ===================================================================
> --- subversion/libsvn_wc/wc-metadata.sql        (revision 1164459)
> +++ subversion/libsvn_wc/wc-metadata.sql        (working copy)
> @@ -58,7 +58,6 @@
>  /* Note: a repository (identified by its UUID) may appear at multiple URLs.
>    For example, http://example.com/repos/ and https://example.com/repos/.  */
>  CREATE INDEX I_UUID ON REPOSITORY (uuid);
> -CREATE INDEX I_ROOT ON REPOSITORY (root);
>
>
>  /* ------------------------------------------------------------------------- */
> @@ -71,7 +70,6 @@
>   local_abspath  TEXT UNIQUE
>   );
>
> -CREATE UNIQUE INDEX I_LOCAL_ABSPATH ON WCROOT (local_abspath);
>
>
>  /* ------------------------------------------------------------------------- */
> @@ -178,8 +176,10 @@
>   PRIMARY KEY (wc_id, local_relpath)
>   );
>
> -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
> -CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
> +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath,
> +                                                    local_relpath);
> +CREATE UNIQUE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (wc_id, changelist,
> +                                                        local_relpath);
>
>
>  /* ------------------------------------------------------------------------- */
> @@ -478,7 +478,10 @@
>
>   );
>
> -CREATE INDEX I_NODES_PARENT ON NODES (wc_id, parent_relpath, op_depth);
> +CREATE UNIQUE INDEX I_NODES_PARENT1 ON NODES (wc_id, parent_relpath,
> +                                              op_depth, local_relpath);
> +CREATE UNIQUE INDEX I_NODES_PARENT2 ON NODES (wc_id, parent_relpath,
> +                                              local_relpath, op_depth);
>
>  /* Many queries have to filter the nodes table to pick only that version
>    of each node with the highest (most "current") op_depth.  This view
> @@ -567,7 +570,8 @@
>   PRIMARY KEY (wc_id, local_relpath)
>  );
>
> -CREATE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath);
> +CREATE UNIQUE INDEX I_EXTERNALS_PARENT ON EXTERNALS (wc_id, parent_relpath,
> +                                                     local_relpath);
>  CREATE UNIQUE INDEX I_EXTERNALS_DEFINED ON EXTERNALS (wc_id,
>                                                       def_local_relpath,
>                                                       local_relpath);
> @@ -804,8 +808,10 @@
>   PRIMARY KEY (wc_id, local_relpath)
>   );
>
> -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath);
> -CREATE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (changelist);
> +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (wc_id, parent_relpath,
> +                                                    local_relpath);
> +CREATE UNIQUE INDEX I_ACTUAL_CHANGELIST ON ACTUAL_NODE (wc_id, changelist,
> +                                                        local_relpath);
>
>  INSERT INTO ACTUAL_NODE SELECT
>   wc_id, local_relpath, parent_relpath, properties, conflict_old,
>
> --
> uberSVN: Apache Subversion Made Easy
> http://www.uberSVN.com
>

-- 
uberSVN: Apache Subversion Made Easy
http://www.uberSVN.com/
Received on 2011-09-02 17:24:28 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.