
svn log -g performance for large commits

From: Becker, Thomas <Thomas.Becker_at_torex.com>
Date: Thu, 22 Apr 2010 16:12:43 +0100

This is from an email conversation with C. Michael Pilato about the
degradation of 'svn log -g' involving revisions with plenty of changed
paths.

BTW: as a workaround, would it help to split a large commit into several
smaller ones so that mergeinfo examination can be skipped for revisions
that do not contain the log target?


Email conversation follows:

Thomas Becker wrote:

> Is it true that svn log -g will examine all paths affected in a
> particular revision for mergeinfo? From my layman's point of view it
> would suffice to examine just the path (and parent paths) for which
> the log was requested. Could you shed some light on this, or is there
> any workaround?

C. Michael Pilato wrote:

> 'svn log' is by its very nature recursive. When you request the logs
> for a directory, you are necessarily requesting them for all the
> children of that directory, too. 'svn log -g' is no different,
> extending its search for mergeinfo changes into any of the changed
> paths that are at or under the log target itself. It's a bummer, to be
> sure, especially when big commits are in the history.
> One thing I've considered in the past is the implementation of [...]
> support for 'svn log', which in the default case (and when run against
> a directory target, of course) would show only revisions in which the
> directory's properties changed (svn:ignore, svn:mergeinfo, etc.). In
> the 'svn log -g' case, it would have the effect of doing the
> examination and recursion based on mergeinfo changes on the target
> only, regardless of what might have changed "under" the directory.
> What do you think? Would this be useful?

Thomas Becker wrote:

> OK, I understand that a change in a path is also attributed to the
> parent of that path. For this scenario it sure would be useful to be
> able to limit the scope of the log command.
> In our case, however, I think it's a bit different: the performance of
> the log of a path is affected by the number of "siblings" in any
> revision involved in the log, e.g.
> r123455
> M /x/y/0000.txt
> r123456
> M /x/y/0000.txt
> ...
> M /x/y/9999.txt
> r123457
> M /x/y/0000.txt
> When requesting the log excluding revision 123456 (the one with many
> paths) it performs fast (e.g. 'svn log -g -r 0:123455 <url>' or
> 'svn log -g -r 123457:HEAD <url>' where <url> is
> file:///C:/Repos/XY/x/y/0000.txt). Whenever revision 123456 is
> included, performance degrades for 'svn log -g'. This leads me to the
> conclusion that 'svn log -g' examines all paths in r123456, even the
> "siblings" of 0000.txt, which should have no impact on the collection
> of mergeinfo for 0000.txt. But maybe I'm totally wrong here?
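
The comparison described above could be reproduced with a small timing
script along these lines. This is only a sketch: the repository URL and
the first two revision ranges are taken from the message, the third range
(123455:123457, which includes the large revision) is an assumption added
for contrast, and the helper names build_log_command, time_command and
main are hypothetical. It assumes an 'svn' client is on the PATH.

```python
import subprocess
import time


def build_log_command(url, rev_range):
    """Return the argv for 'svn log -g -r <rev_range> <url>'."""
    return ["svn", "log", "-g", "-r", rev_range, url]


def time_command(argv):
    """Run argv, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def main():
    # URL as given in the message; adjust for your own repository.
    url = "file:///C:/Repos/XY/x/y/0000.txt"
    # The last range is an assumed example that includes r123456.
    for rev_range in ("0:123455", "123457:HEAD", "123455:123457"):
        elapsed = time_command(build_log_command(url, rev_range))
        print("%-15s %.2f s" % (rev_range, elapsed))
```

Calling main() prints one timing per range; if the observation in the
message holds, only the range that includes r123456 should be slow.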

C. Michael Pilato wrote:

> Ah! It is the case that the FS API for asking "Which paths were
> changed in this revision?" is exhaustive -- you can't ask for just the
> changed paths in and under some level. And generally speaking you have
> to iterate over those paths, ruling out the ones that don't apply to
> find the ones that do.
> Now ... it occurs to me that there might be some optimization possible
> here. I mean, if you're running log against a single file, and you
> haven't passed -v, then perhaps we can do a more direct mergeinfo
> comparison.
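
The cost described above can be illustrated with a toy model. This is not
Subversion's implementation and uses none of its APIs; the function names
is_at_or_under and relevant_changed_paths are hypothetical, and the list
standing in for r123456 is synthetic. The point is only that when the
changed-path list for a revision can be obtained solely as a whole, finding
the one relevant path still means visiting every entry.

```python
def is_at_or_under(target, path):
    """True if path equals target or lies beneath it in the tree."""
    target = target.rstrip("/")
    return path == target or path.startswith(target + "/")


def relevant_changed_paths(changed_paths, target):
    """Linear scan: every changed path in the revision is visited once."""
    visited = 0
    relevant = []
    for path in changed_paths:
        visited += 1
        if is_at_or_under(target, path):
            relevant.append(path)
    return relevant, visited


# Synthetic stand-in for r123456 from the thread: 10,000 sibling files.
r123456 = ["/x/y/%04d.txt" % i for i in range(10000)]
relevant, visited = relevant_changed_paths(r123456, "/x/y/0000.txt")
print(relevant, visited)  # ['/x/y/0000.txt'] 10000
```

For a single-file target only one entry matters, yet all 10,000 are
inspected, which is the situation the suggested optimization would avoid.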

Thomas Becker wrote:

> This optimisation would be a great improvement for us as I suspect it
> was not the last time that we applied a change to a lot of files in
> one revision (e.g. change of file header comments or svn:keywords
> property).

Software Development, Torex
T: +49 (0)30 49901-0 E: thomas.becker_at_torex.com
Torex Retail Solutions GmbH, Salzufer 8, D-10587 Berlin
T: +49 (0)30 49901-0  F: +49 (0)30 49901-139  www.torex.de
Received on 2010-04-22 18:21:29 CEST

This is an archived mail posted to the Subversion Dev mailing list.