Re: svn2cvsgraph, how to best handle merges?

From: Henrik Carlqvist <hc528_at_poolhem.se>
Date: Wed, 9 Apr 2014 20:42:24 +0200

On Wed, 26 Mar 2014 19:41:38 +0000
Philip Martin <philip.martin_at_wandisco.com> wrote:

> Henrik Carlqvist <hc94_at_poolhem.se> writes:
> > Would people hosting public svn repositories think that it would be
> > nice if some people using my tool would make one svn connection for
> > each revision in the repository?
>
> It's a user problem as well since making a request per revision doesn't
> scale well and will be very slow for large projects.

Since the merge information you get from "svn log -g" is somewhat
recursive, the run time seems to grow exponentially with the number of
revisions (or maybe rather with the number of merges).

However, my own test version of svn2cvsgraph, which calls svn once for
each revision, does a pclose on the svn call after reading the first log
entry and the second log entry (which might be a merge). With such a
solution the time grows linearly with the number of revisions, but svn
older than 1.7 will print some "svn: Write error: Broken pipe" messages
to stderr.
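
As an illustration of the stop-after-two-entries idea, here is a minimal
sketch in Python (not the actual svn2cvsgraph code). The function names
first_entries and log_for_revision are made up for this example, and the
line formats matched by the two regular expressions are assumptions about
what "svn log -q -g" prints. Closing the pipe early plays the role of
pclose here.

```python
import re
import subprocess

# Assumed header line of an entry from "svn log -q", for example
# "r123 | alice | 2014-04-09 20:42:24 +0200 (Wed, 09 Apr 2014)".
ENTRY_RE = re.compile(r"^r(\d+) \|")
# Assumed marker line that "svn log -g" prints for an entry reached via a merge.
MERGED_VIA_RE = re.compile(r"^(?:Reverse merged|Merged) via: (.*)$")


def first_entries(lines, limit=2):
    """Collect at most `limit` log entries, then stop consuming `lines`."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("-----"):
            # A separator after the last wanted entry means we are done.
            if len(entries) >= limit:
                break
            continue
        m = ENTRY_RE.match(line)
        if m:
            entries.append({"rev": int(m.group(1)), "merged_via": None})
            continue
        m = MERGED_VIA_RE.match(line)
        if m and entries:
            entries[-1]["merged_via"] = m.group(1)
    return entries


def log_for_revision(url, rev):
    """Run svn for one revision and abandon its output after two entries."""
    cmd = ["svn", "log", "-q", "-g", "-r", str(rev), url]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        return first_entries(proc.stdout)
    finally:
        # Closing the read end early is what can make older svn clients
        # complain about a broken pipe on stderr.
        proc.stdout.close()
        proc.wait()
```

A call such as log_for_revision("file:///path/to/repo", 1234) would then
return the entry for the revision itself and, if there is one, the first
entry that follows it. The URL and revision number here are placeholders.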

I did a benchmark comparing a box running Slackware 14.1 with svn 1.7.16
and another box running Slackware 13.1 with svn 1.6.16. On these machines
I tested three versions of svn2cvsgraph:

svn2cvsgraph 1.2:     makes a single call to "svn log -q -g" on the
                      subversion repository root.

svn2cvsgraph 2.0:     makes one call to "svn log -q -g" for each branch
                      (and trunk).

svn2cvsgraph 2.1beta: makes one call to "svn log -q -g" for each
                      revision; the call is aborted with pclose to avoid
                      wasting time on redundant information.

The benchmarks were run on a test subversion repository which was loaded
from a 2.9 GB subversion dump file of an actual project repository.
The repository contains 13570 revisions and 160 branches. 206 merges have
been logged in the repository since it was upgraded to version 1.5 of
subversion. The test repository was accessed as file:/// on an NFS
server. Times were measured with the /usr/bin/time command.

These are the results:

subversion  svn2cvsgraph  elapsed    CPU  result
    1.7.16  1.2           6:13.70    17%  No merges found
    1.7.16  2.1beta       7:20.73    55%  All merges found
    1.7.16  2.0           13:49.48   45%  23 merges lost

    1.6.16  2.1beta       52:53.63   81%  All merges found
    1.6.16  1.2           134:55:22  41%  All merges found
    1.6.16  2.0           135:14:04  41%  All merges found
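
To compare the rows, the elapsed times can be converted to seconds. The
small Python sketch below does that for the 1.6.16 runs. It assumes that
a value such as 134:55:22 means hours:minutes:seconds and that a value
such as 52:53.63 means minutes:seconds, which matches the roughly 135
hours mentioned further down. The helper name elapsed_seconds is made up
for this example.

```python
def elapsed_seconds(text):
    """Convert '[h:]m:s' (seconds may carry a fraction) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        # Each further field is 60 times finer than the one before it.
        seconds = seconds * 60 + float(part)
    return seconds


# Elapsed times of the svn 1.6.16 runs, copied from the results above.
runs_1_6_16 = {"2.1beta": "52:53.63", "1.2": "134:55:22", "2.0": "135:14:04"}

baseline = elapsed_seconds(runs_1_6_16["2.1beta"])
for version, text in runs_1_6_16.items():
    # Print how many times slower each version is than 2.1beta.
    print(version, round(elapsed_seconds(text) / baseline, 1))
```

Under these assumptions, versions 1.2 and 2.0 come out roughly 150 times
slower than 2.1beta with svn 1.6.16.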

Subversion 1.7.16 seems a lot faster than 1.6.16. Even though the tests
were run on different machines, and the Slackware 14.1 machine is slightly
faster than the Slackware 13.1 machine, I think that most of the
difference is because 1.7.16 gives less recursive merge information to
wade through.

No merges are found when only doing "svn log -q -g" on the repository root
with version 1.7.16. This is expected, since the behavior of "svn log -g"
changed with version 1.6.17.

23 merges were lost with "svn log -q -g" on every branch with 1.7.16. This
is most likely because of issue 4477.

Doing "svn log -q -g" for each revision and aborting the output with
pclose is the fastest way to get correct results with both version 1.6.16
and 1.7.16. However, this assumes that the repository is accessed with
file://. Previously I have instead been using svn+ssh:// with svn 1.6.16,
and with one call for each branch, or only one for the repository root,
that takes about 24 hours (compared with about 135 hours above). Using
svn+ssh:// instead of file:// when doing one call for each revision would
however be a lot slower.

regards Henrik
Received on 2014-04-09 20:43:10 CEST
