On Wed, Sep 17, 2014 at 05:02:33PM -0000, stsp_at_apache.org wrote:
> Author: stsp
> Date: Wed Sep 17 17:02:33 2014
> New Revision: 1625674
> URL: http://svn.apache.org/r1625674
> Fix a big scalability problem in the implementation of svnpredumpfilter.py.
> The script kept re-computing the set of additional include paths while
> mining the log history for copied paths. Each re-computation involved
> a full iteration of the set of copies accumulated so far, which made
> the run time explode on large repositories.
> Instead, we can gather all copies first, and then iterate them at once.
> In my testing this change reduces the runtime of svnpredumpfilter.py on
> a 64GB large dump file of the FreeBSD repository (up to r271458) from
> several days(!) to 1.5 minutes.
> * tools/server-side/svnpredumpfilter.py
> (svn_log_stream_get_dependencies): Run dt.handle_changes() once the log
> history has been fully scanned, not for each revision.
This change may introduce a slight regression. The script now detects
only the direct copy sources of the to-be-included paths, not the copy
sources of those copy sources.
I'm working on a fix that doesn't involve reverting this change and
still lets the script finish within a reasonable amount of time.
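To illustrate the gap, here is a rough sketch of following copy sources transitively once all copies have been gathered. It is not the actual fix and does not use the script's real data structures: the function name, the (source, destination) pair format, and the exact-path matching are all assumptions made for the example (the real script also has to account for copies of parent directories).

```python
from collections import defaultdict, deque


def transitive_copy_sources(included_paths, copies):
    """Return included_paths plus all direct and indirect copy sources.

    Hypothetical helper: `copies` is an iterable of
    (copy_source, copy_destination) pairs collected from the whole log.
    Paths are matched exactly, which is a simplification.
    """
    # Index the copies once so that each lookup is cheap.
    sources_of = defaultdict(set)
    for src, dst in copies:
        sources_of[dst].add(src)

    needed = set(included_paths)
    queue = deque(needed)
    while queue:
        path = queue.popleft()
        for src in sources_of.get(path, ()):
            # Visit each path at most once; this also guards against cycles.
            if src not in needed:
                needed.add(src)
                queue.append(src)
    return needed


if __name__ == "__main__":
    copies = [
        ("/branches/a", "/trunk/x"),   # direct copy source of /trunk/x
        ("/tags/b", "/branches/a"),    # copy source of a copy source
        ("/other", "/unrelated"),      # not needed
    ]
    print(sorted(transitive_copy_sources(["/trunk/x"], copies)))
```

Because the copies are indexed once and every path is queued at most once, the work stays roughly linear in the number of copies, so this kind of pass should not bring back the run time problem the commit fixed.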
> Modified: subversion/trunk/tools/server-side/svnpredumpfilter.py
> URL: http://svn.apache.org/viewvc/subversion/trunk/tools/server-side/svnpredumpfilter.py?rev=1625674&r1=1625673&r2=1625674&view=diff
> --- subversion/trunk/tools/server-side/svnpredumpfilter.py (original)
> +++ subversion/trunk/tools/server-side/svnpredumpfilter.py Wed Sep 17 17:02:33 2014
> @@ -204,7 +204,6 @@ def svn_log_stream_get_dependencies(stre
> - dt.handle_changes(path_copies)
> # Finally, skip any log message lines. (If there are none,
> # remember the last line we read, because it probably has
> @@ -221,6 +220,7 @@ def svn_log_stream_get_dependencies(stre
> "'svn log' with the --verbose (-v) option when "
> "generating the input to this script?")
> + dt.handle_changes(path_copies)
> return dt
> def analyze_logs(included_paths):
Received on 2014-09-17 21:59:12 CEST