On Fri, Sep 2, 2011 at 12:35, Philip Martin <philip.martin_at_wandisco.com> wrote:
> Philip Martin <philip.martin_at_wandisco.com> writes:
>
>> The query STMT_SELECT_NODE_CHILDREN_WALKER_INFO as used in 1.7
>>
>> SELECT local_relpath, op_depth, presence, kind
>> FROM nodes
>> WHERE wc_id = ?1 AND parent_relpath = ?2
>> GROUP BY local_relpath
>> ORDER BY op_depth DESC
>>
>> performs poorly and doesn't scale well with working copy size.
>
> After some more discussion in IRC it seems that the above SQL is
> dubious: the selected columns may not all come from the same row, and we
> may not get the highest op_depth. Using NODES_CURRENT would fix the
> problem as does my C solution, and the combination of the two is now on
> trunk. So I'll be proposing r1164426 and r1164614 to fix the 1.7
> performance problem without any schema changes.
There are some differences in wc_db.c after your two changes:
$ svn diff -r HEAD wc_db.c -x -p
Index: wc_db.c
===================================================================
--- wc_db.c (revision 1164712)
+++ wc_db.c (working copy)
@@ -7005,8 +7005,7 @@ struct read_children_info_baton_t
apr_pool_t *result_pool;
};
-/* What we really want to store about a node. This relies on the
- offset of svn_wc__db_info_t being zero. */
+/* What we really want to store about a node */
struct read_children_info_item_t
{
struct svn_wc__db_info_t info;
@@ -7443,6 +7442,7 @@ svn_wc__db_read_children_walker_info(apr_hash_t **
const char *dir_relpath;
svn_sqlite__stmt_t *stmt;
svn_boolean_t have_row;
+ apr_int64_t op_depth;
SVN_ERR_ASSERT(svn_dirent_is_absolute(dir_abspath));
@@ -7462,10 +7462,13 @@ svn_wc__db_read_children_walker_info(apr_hash_t **
struct svn_wc__db_walker_info_t *child;
const char *child_relpath = svn_sqlite__column_text(stmt, 0, NULL);
const char *name = svn_relpath_basename(child_relpath, NULL);
- apr_int64_t op_depth = svn_sqlite__column_int(stmt, 1);
svn_error_t *err;
- child = apr_palloc(result_pool, sizeof(*child));
+ child = apr_hash_get(*nodes, name, APR_HASH_KEY_STRING);
+ if (child == NULL)
+ child = apr_palloc(result_pool, sizeof(*child));
+
+ op_depth = svn_sqlite__column_int(stmt, 1);
child->status = svn_sqlite__column_token(stmt, 2, presence_map);
if (op_depth > 0)
{
Since we are selecting only "current nodes" for each local_relpath,
then I presume we don't need the apr_hash_get() in there, correct?
(and I already know that we can eliminate the apparent diffs around
op_depth with no harm)
I'd really like to put the code back to pre-r1164426 if at all
possible (makes it clearer what we're voting for in STATUS).
Cheers,
-g
Received on 2011-09-02 23:12:23 CEST