[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: SQL indices a WC format bump and 1.7

From: Greg Stein <gstein_at_gmail.com>
Date: Fri, 2 Sep 2011 17:11:52 -0400

On Fri, Sep 2, 2011 at 12:35, Philip Martin <philip.martin_at_wandisco.com> wrote:
> Philip Martin <philip.martin_at_wandisco.com> writes:
>
>> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
>>
>>    SELECT local_relpath, op_depth, presence, kind
>>    FROM nodes
>>    WHERE wc_id = ?1 AND parent_relpath = ?2
>>    GROUP BY local_relpath
>>    ORDER BY op_depth DESC
>>
>> performs poorly and doesn't scale well with working copy size.
>
> After some more discussion in IRC it seems that the above SQL is
> dubious: the selected columns may not all come from the same row, and we
> may not get the highest op_depth.  Using NODES_CURRENT would fix the
> problem as does my C solution, and the combination of the two is now on
> trunk.  So I'll be proposing r1164426 and r1164614 to fix the 1.7
> performance problem without any schema changes.

There are some differences in wc_db.c after your two changes:

$ svn diff -r HEAD wc_db.c -x -p
Index: wc_db.c
===================================================================
--- wc_db.c (revision 1164712)
+++ wc_db.c (working copy)
@@ -7005,8 +7005,7 @@ struct read_children_info_baton_t
   apr_pool_t *result_pool;
 };

-/* What we really want to store about a node. This relies on the
- offset of svn_wc__db_info_t being zero. */
+/* What we really want to store about a node */
 struct read_children_info_item_t
 {
   struct svn_wc__db_info_t info;
@@ -7443,6 +7442,7 @@ svn_wc__db_read_children_walker_info(apr_hash_t **
   const char *dir_relpath;
   svn_sqlite__stmt_t *stmt;
   svn_boolean_t have_row;
+ apr_int64_t op_depth;

   SVN_ERR_ASSERT(svn_dirent_is_absolute(dir_abspath));

@@ -7462,10 +7462,13 @@ svn_wc__db_read_children_walker_info(apr_hash_t **
       struct svn_wc__db_walker_info_t *child;
       const char *child_relpath = svn_sqlite__column_text(stmt, 0, NULL);
       const char *name = svn_relpath_basename(child_relpath, NULL);
- apr_int64_t op_depth = svn_sqlite__column_int(stmt, 1);
       svn_error_t *err;

- child = apr_palloc(result_pool, sizeof(*child));
+ child = apr_hash_get(*nodes, name, APR_HASH_KEY_STRING);
+ if (child == NULL)
+ child = apr_palloc(result_pool, sizeof(*child));
+
+ op_depth = svn_sqlite__column_int(stmt, 1);
       child->status = svn_sqlite__column_token(stmt, 2, presence_map);
       if (op_depth > 0)
         {

Since we are selecting only "current nodes" for each local_relpath,
then I presume we don't need the apr_hash_get() in there, correct?

(and I already know that we can eliminate the apparent diffs around
op_depth with no harm)

I'd really like to put the code back to pre-r1164426 if at all
possible (makes it clearer what we're voting for in STATUS).

Cheers,
-g
Received on 2011-09-02 23:12:23 CEST

This is an archived mail posted to the Subversion Dev mailing list.