Paul Burba <paulb@softlanding.com> wrote on 10/27/2006 02:57:26 PM:
> Daniel Rall <dlr@collab.net> wrote on 10/25/2006 02:55:09 PM:
>
> > I've created a temporary branch, "ood-status-info", so we have a place
> > to work on refining these fixes.
> >
> > On Fri, 13 Oct 2006, Paul Burba wrote:
> <snip>
> > > Our remaining issues(?):
> > >
> > > A) The performance of svn_repos_deleted_rev()
> >
> > This is currently too slow to integrate into trunk. Let's write a
> > faster implementation -- hopefully we can speed things up by moving
> > the logic down inside the FS layer.
>
> I think I solved this performance problem, not in FS, but with a
different
> implmentation of the new function svn_repos_deleted_rev().
>
> A quick recap the existing performance problem (in r22118 of
> svn/branches/ood-status-info):
>
> * Start with a standard greek-tree test (fsfs) repos.
> * Delete /A/D/H/omega in r2.
> * Make a whole slew of (50,000+) of text mods to the other files.
> * Check out r1 via file://
> * Run "time svn st -u -v" in Cygwin
>
> r22118 of svn_repos_deleted_rev() implementation takes almost 5 minutes
on
> my machine:
>
> $ time svn st -v -u SVN/stat-tests-1/
> * 1 1 jrandom SVN\stat-tests-1\A\B\E\beta
> * 1 1 jrandom SVN\stat-tests-1\A\B\E\alpha
> 1 1 jrandom SVN\stat-tests-1\A\B\E
> * 1 1 jrandom SVN\stat-tests-1\A\B\lambda
> 1 1 jrandom SVN\stat-tests-1\A\B\F
> 1 1 jrandom SVN\stat-tests-1\A\B
> * 1 1 jrandom SVN\stat-tests-1\A\D\G\pi
> * 1 1 jrandom SVN\stat-tests-1\A\D\G\rho
> * 1 1 jrandom SVN\stat-tests-1\A\D\G\tau
> 1 1 jrandom SVN\stat-tests-1\A\D\G
> * 1 1 jrandom SVN\stat-tests-1\A\D\H\omega
> 1 1 jrandom SVN\stat-tests-1\A\D\H\psi
> * 1 1 jrandom SVN\stat-tests-1\A\D\H\chi
> 1 1 jrandom SVN\stat-tests-1\A\D\H
> * 1 1 jrandom SVN\stat-tests-1\A\D\gamma
> 1 1 jrandom SVN\stat-tests-1\A\D
> * 1 1 jrandom SVN\stat-tests-1\A\mu
> 1 1 jrandom SVN\stat-tests-1\A\C
> 1 1 jrandom SVN\stat-tests-1\A
> * 1 1 jrandom SVN\stat-tests-1\iota
> 1 1 jrandom SVN\stat-tests-1
> Status against revision: 53600
>
> real 4m58.004s
> user 0m0.015s
> sys 0m0.031s
>
> The slowness is due to svn_repos_deleted_rev's linear search that starts
> at the last rev in the search range and works backwards, looking for the
> last time a path existed. In the above scenario this takes a long time
> because a 50k rev files need to be opened. At the time I'd thought the
> linear was the only way to go because a path might be deleted, re-added
> and deleted many times, so a binary search wouldn't work. But as Mark
> pointed out to me, we don't really care if a path of the same name was
> later added then deleted, that is for the path's parent to care about in
> it's ood info. Even if a path is moved within the same directory this
> still holds.
>
> Looking at the problem in this new light, I used svn_fs_node_id() and
> svn_fs_compare_ids() to reimplement svn_repos_deleted_rev() as a
> non-recursive binary search and committed it to the ood branch in
r22139.
>
> The change from linear to logarithmic running time has the expected
> results, the 5 minute operation now takes less than a second:
>
> $ time svn st -u -v SVN/stat-tests-1
> * 1 1 jrandom SVN\stat-tests-1\A\B\E\beta
> * 1 1 jrandom SVN\stat-tests-1\A\B\E\alpha
> 1 1 jrandom SVN\stat-tests-1\A\B\E
> * 1 1 jrandom SVN\stat-tests-1\A\B\lambda
> 1 1 jrandom SVN\stat-tests-1\A\B\F
> 1 1 jrandom SVN\stat-tests-1\A\B
> * 1 1 jrandom SVN\stat-tests-1\A\D\G\pi
> * 1 1 jrandom SVN\stat-tests-1\A\D\G\rho
> * 1 1 jrandom SVN\stat-tests-1\A\D\G\tau
> 1 1 jrandom SVN\stat-tests-1\A\D\G
> * 1 1 jrandom SVN\stat-tests-1\A\D\H\omega
> 1 1 jrandom SVN\stat-tests-1\A\D\H\psi
> * 1 1 jrandom SVN\stat-tests-1\A\D\H\chi
> 1 1 jrandom SVN\stat-tests-1\A\D\H
> * 1 1 jrandom SVN\stat-tests-1\A\D\gamma
> 1 1 jrandom SVN\stat-tests-1\A\D
> * 1 1 jrandom SVN\stat-tests-1\A\mu
> 1 1 jrandom SVN\stat-tests-1\A\C
> 1 1 jrandom SVN\stat-tests-1\A
> * 1 1 jrandom SVN\stat-tests-1\iota
> 1 1 jrandom SVN\stat-tests-1
> Status against revision: 53600
>
> real 0m0.340s
> user 0m0.015s
> sys 0m0.015s
>
> Anyone see any problems with this approach?
>
> Paul B.
For anyone who missed it, we discussed this patch at length on IRC on
10/27:
Essentially the conclusion was that a binary search would work to find the
*first* time a path was deleted. Problem was, my binary search algorithm
in r22139 made incorrect assumptions about how svn_fs_compare_ids()
behaved when comparing copied/modified nodes. I addressed these problems
in r22234 of svn/branches/ood-status-info. I also made an effort to
clearly document how the binary search works in the comments for
svn_repos_deleted_rev()...it might be worth look at those comments first
if you have any questions on this.
Paul B.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Wed Nov 8 16:20:16 2006