
Re: svn status <file> is slow under a large check-out

From: Bob Cardillo <bob.cardillo_at_gmail.com>
Date: Fri, 4 May 2012 17:22:59 -0400

>What does your working copy look like? Do you have svn:externals? Do
>you have the sqlite3 tool? What do these commands show:
>
> sqlite3 .svn/wc.db "select count (*) from actual_node"
> sqlite3 .svn/wc.db "select count (*) from nodes"

I think my working copy is fairly typical for a very large project, a
well-balanced tree. I can get you numbers of files and folders, max
depth, etc., if that's what you're looking for.

I do not have any svn:externals.

Here are my results from sqlite3 for those two queries:
 - actual_node count: 49
 - nodes: 514957
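
For anyone who would rather script the two counts than use the sqlite3
tool, here is a minimal sketch using Python's standard sqlite3 module.
The helper name count_rows and the ALLOWED_TABLES whitelist are made up
for illustration; only the table names come from the queries above. It
also reports how long each count took, which helps when comparing a
first (cold) run against a repeat run.

```python
import sqlite3
import time

# Only the two tables named in the queries above; anything else is rejected
# because the table name is interpolated into the SQL text.
ALLOWED_TABLES = ("actual_node", "nodes")


def count_rows(db_path, table):
    """Return (row_count, seconds_taken) for one table of a wc.db file."""
    if table not in ALLOWED_TABLES:
        raise ValueError("unexpected table name: %r" % (table,))
    conn = sqlite3.connect(db_path)
    try:
        start = time.perf_counter()
        (count,) = conn.execute("SELECT COUNT(*) FROM " + table).fetchone()
        elapsed = time.perf_counter() - start
    finally:
        conn.close()
    return count, elapsed


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python count_rows.py .svn/wc.db
    for name in ALLOWED_TABLES:
        rows, secs = count_rows(sys.argv[1], name)
        print("%s: %d rows in %.2f s" % (name, rows, secs))
```

Running it twice in a row against the same wc.db should show whether
the first-run delay is mostly a cold file cache.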

Not surprisingly, given the number of rows, the count on the nodes table
took a long time to come back on the first attempt, maybe a minute.
Running it again took about 2 seconds. With such a large table, it
certainly makes sense that an index could have a significant impact on
performance.
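
As a rough way to explore the index question, the sketch below builds a
tiny in-memory table and runs a cut-down version of the quoted query
through EXPLAIN QUERY PLAN. Everything about the table is an assumption:
the schema is a simplified stand-in and not the real wc.db definition,
the sample paths are invented, and the actual_node and MAX(op_depth)
subqueries are left out. The range test mirrors the expanded
IS_STRICT_DESCENDANT_OF form in the SQLITE_DEBUG output quoted below.

```python
import sqlite3

# Cut-down form of the quoted statement: same path predicate, no subqueries.
DESCENDANT_DIRS_SQL = (
    "SELECT local_relpath FROM nodes"
    " WHERE wc_id = :wc_id"
    " AND (:path = ''"
    "      OR local_relpath = :path"
    "      OR (local_relpath > :path || '/' AND local_relpath < :path || '0'))"
    " AND kind = 'dir' AND presence = 'normal'"
    " ORDER BY local_relpath"
)


def make_demo_db():
    """Create an in-memory table with a simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes ("
        " wc_id INTEGER, local_relpath TEXT, op_depth INTEGER,"
        " kind TEXT, presence TEXT,"
        " PRIMARY KEY (wc_id, local_relpath, op_depth))"
    )
    rows = [
        (1, "", 0, "dir", "normal"),
        (1, "A", 0, "dir", "normal"),
        (1, "A/f", 0, "dir", "normal"),
        (1, "A/f/g", 0, "dir", "normal"),
        (1, "A/f/g/h.c", 0, "file", "normal"),
        (1, "A/f-old", 0, "dir", "normal"),  # sibling, sorts before 'A/f/'
        (1, "A/f0", 0, "dir", "normal"),     # sibling, sorts after 'A/f/...'
    ]
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def descendant_dirs(conn, path, wc_id=1):
    """Directories at or below path, using the string range test."""
    params = {"wc_id": wc_id, "path": path}
    return [r[0] for r in conn.execute(DESCENDANT_DIRS_SQL, params)]


def query_plan(conn, path, wc_id=1):
    """Return SQLite's plan description lines for the same query."""
    params = {"wc_id": wc_id, "path": path}
    cur = conn.execute("EXPLAIN QUERY PLAN " + DESCENDANT_DIRS_SQL, params)
    return [row[-1] for row in cur]  # last column holds the readable detail


if __name__ == "__main__":
    db = make_demo_db()
    print(descendant_dirs(db, "A/f"))
    for line in query_plan(db, "A/f"):
        print(line)
```

The range test works because '/' sorts immediately before '0' in byte
order, so every path starting with 'A/f/' falls between the two bounds
while siblings such as 'A/f-old' and 'A/f0' do not. The plan text varies
between SQLite versions, so the thing to look for is whether it reports
a SEARCH using an index or a full SCAN of nodes.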

On Fri, May 4, 2012 at 2:55 PM, Philip Martin <philip.martin_at_wandisco.com> wrote:

> Bob Cardillo <bob.cardillo_at_gmail.com> writes:
>
> > I'm running Subversion 1.7.4.50525 (r1295709) on Windows 7 Pro SP1.
> >
> > I have a large repository, and for clean development flow I've checked
> > out the root locally. But because of this, when I do:
> > svn status C:\mycheckout\trunk\folder1\file1.ext
> >
> > it takes a very long time, around 5-6 seconds, to finish. It doesn't
> > matter if the given file is modified or not.
> >
> > I've run Sysinternals Process Monitor and found that there are hundreds
> > of thousands of ReadFile operations done on \_svn\wc.db. There are no
> > network accesses of course, and from an analysis of the procmon results
> > it's clear the slowness is from these many wc.db accesses.
> >
> > But why? I could not find any issue related to this in the issue
> > tracker nor any mention of it in the forums, mailing lists, or elsewhere
> > on the web.
> >
> > One more point of interest. If I throw --non-recursive in there, as in:
> > svn status --non-recursive C:\mycheckout\trunk\folder1\file1.ext
> >
> > it comes back immediately, no delay whatsoever. Since this is a file, I
> > don't get why --non-recursive should make a difference, but there it is.
> >
> > Has anyone seen this? Any reason I should not add this to the issue
> > tracker for Subversion?
>
> Hmm, I'm not seeing a huge time difference but --non-recursive does have
> a similar effect on Linux:
>
> $ strace -cetrace=read svn st --non-recursive subversion/tests/libsvn_wc/wc_db.c
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   -nan    0.000000           0        99           read
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.000000                    99           total
>
> $ strace -cetrace=read svn st ../src/subversion/tests/libsvn_wc/wc_db.c
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.000025           0      1193           read
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.000025                  1193           total
>
> So that has increased the number of read() calls from 99 to 1193!
>
> Using -DSQLITE_DEBUG the difference is this:
>
> DBG: sqlite.c: 66: sql="SELECT IFNULL((SELECT properties FROM actual_node
> a WHERE a.wc_id = 1 AND A.local_relpath = n.local_relpath),
> properties), local_relpath, depth FROM nodes n WHERE
> wc_id = 1 AND ('A/f' = '' OR local_relpath = 'A/f' OR
> ((local_relpath) > ('A/f') || '/' AND (local_relpath) < ('A/f') || '0') )
> AND kind = 'dir' AND presence='normal' AND op_depth=(SELECT MAX(op_depth)
> FROM nodes o WHERE o.wc_id = 1 AND o.local_relpath =
> n.local_relpath) "
>
> which is this statement in svn_wc__db_externals_gather_definitions:
>
> -- STMT_SELECT_EXTERNAL_PROPERTIES
> SELECT IFNULL((SELECT properties FROM actual_node a
> WHERE a.wc_id = ?1 AND A.local_relpath = n.local_relpath),
> properties),
> local_relpath, depth
> FROM nodes n
> WHERE wc_id = ?1
> AND (?2 = ''
> OR local_relpath = ?2
> OR IS_STRICT_DESCENDANT_OF(local_relpath, ?2))
> AND kind = 'dir' AND presence='normal'
> AND op_depth=(SELECT MAX(op_depth) FROM nodes o
> WHERE o.wc_id = ?1 AND o.local_relpath = n.local_relpath)
>
> Taking that query out reduces the read() calls back to 99. Do we need
> another SQLite index? Can we improve that query?
>
> What does your working copy look like? Do you have svn:externals? Do
> you have the sqlite3 tool? What do these commands show:
>
> sqlite3 .svn/wc.db "select count (*) from actual_node"
> sqlite3 .svn/wc.db "select count (*) from nodes"
>
Received on 2012-05-04 23:23:32 CEST

This is an archived mail posted to the Subversion Dev mailing list.