On Mon, Jun 7, 2010 at 8:47 PM, <gstein_at_apache.org> wrote:
> Author: gstein
> Date: Tue Jun 8 00:47:22 2010
> New Revision: 952493
> URL: http://svn.apache.org/viewvc?rev=952493&view=rev
> The query that we used to fetch all children in BASE_NODE and WORKING_NODE
> used a UNION between two SELECT statements. The idea was to have SQLite
> remove all duplicates for us in a single query. Unfortunately, this caused
> SQLite to create an ephemeral (temporary) table and place the results of
> each query into that table. It created an index to remove duplicates. Then
> it returned the values in that ephemeral table. For large numbers of
> nodes, the construction of the table and its index becomes very costly.
> This change rebuilds gather_children() in wc_db.c to do the duplicate
> removal manually using a hash table. It does some simple scanning straight
> into an array when it knows duplicates cannot exist (one of BASE or
> WORKING is empty).
> The performance problem of svn_wc__db_read_children() was first observed
> in issue #3499. The actual performance improvement is untested so far, but
> I'm assuming pburba can pick up this change and try it in his scenario.
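The sketch below is a minimal C illustration of the approach the commit message describes: merge the BASE and WORKING child lists, remove duplicates by hand with a hash table, and skip the hashing entirely when one list is empty. It is not the code from r952493. The names merge_children() and hash_name() are hypothetical. The small open-addressing hash set stands in for whatever hash table wc_db.c actually uses. The sketch also assumes each list is already duplicate-free on its own, since a directory's children are unique within a single table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: FNV-1a hash of a child name. */
static uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical sketch: merge BASE and WORKING child names into OUT,
 * dropping duplicates. OUT must hold n_base + n_working pointers.
 * Names are borrowed, not copied. Returns the count written, or -1
 * if the hash set cannot be allocated. Assumes each input list is
 * duplicate-free by itself. */
static int merge_children(const char *const *base, size_t n_base,
                          const char *const *working, size_t n_working,
                          const char **out)
{
    size_t n_out = 0;

    /* Fast path: with one list empty no duplicates can exist,
     * so copy the other list straight into the output array. */
    if (n_base == 0 || n_working == 0) {
        const char *const *src = n_base ? base : working;
        size_t n = n_base ? n_base : n_working;
        for (size_t i = 0; i < n; i++)
            out[n_out++] = src[i];
        return (int)n_out;
    }

    /* Size the set to a power of two at least twice the total, so the
     * load factor stays <= 0.5 and linear probing always terminates. */
    size_t total = n_base + n_working;
    size_t cap = 1;
    while (cap < 2 * total)
        cap <<= 1;
    assert((cap & (cap - 1)) == 0);

    const char **slots = calloc(cap, sizeof *slots);
    if (!slots)
        return -1;

    /* Pass 0 scans BASE, pass 1 scans WORKING; keep first-seen order. */
    for (size_t pass = 0; pass < 2; pass++) {
        const char *const *src = pass == 0 ? base : working;
        size_t n = pass == 0 ? n_base : n_working;
        for (size_t i = 0; i < n; i++) {
            size_t j = hash_name(src[i]) & (cap - 1);
            /* Probe until an empty slot (new name) or a match (duplicate). */
            while (slots[j] && strcmp(slots[j], src[i]) != 0)
                j = (j + 1) & (cap - 1);
            if (!slots[j]) {
                slots[j] = src[i];
                out[n_out++] = src[i];
            }
        }
    }

    free(slots);
    return (int)n_out;
}
```

Called with BASE = {"alpha", "beta", "gamma"} and WORKING = {"beta", "delta"}, it should return four names in first-seen order: alpha, beta, gamma, delta. The work stays linear in the number of children and needs no temporary table or index. The early return mirrors the "one of BASE or WORKING is empty" shortcut mentioned above.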
On Mon, Jun 7, 2010 at 8:53 PM, Greg Stein <gstein_at_gmail.com> wrote:
> Hey Paul,
> Can you try this change on your large-file-count working copies? I
> believe this should fix the performance problems you were seeing.
Short Story: Hours to Seconds
Long Story: This does indeed solve the problems I was seeing:
My test repository was our test suite's Greek tree but with 17,000 1KB
files in a single directory:
Prior to r952493, update and status were taking *quite* some time:
  svn st   01:23:33
  svn up   Gave up after an hour (i.e. lasted longer than my lunch).
With your fix in place, performance improves dramatically:
  svn st   00:00:17
  svn up   00:00:11
P.S. Thanks! I was nowhere near figuring this out :-\
Received on 2010-06-08 15:29:08 CEST