Getting raw blame information

From: Hugh Gibson <hgibson_at_cix.co.uk>
Date: 2005-01-11 14:04:15 CET


I'm looking at ways of speeding up the "blame" output. For example, one
of our files, which is about 300k with around 1750 total revisions in
the repository, takes around 2 minutes.

Looking at the source code (blame.c) it appears that it gets all the
revisions, collecting information about changed lines, then gets the final
text and matches that up with all the blame chunks.

I've noticed that performance drops as later revisions are retrieved, and
I suspect the linked list of blame chunks is the problem. blame_find()
does a linear search through the chunks, and blame_adjust() likewise has
to iterate through all the chunks to apply an offset. Each call is O(n)
in the number of chunks and is repeated for every change, so there are
two O(n^2) processes here.
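
To make the cost concrete, here is a minimal Python model of a linked
list of blame chunks. It is a sketch only, not the code in blame.c: the
names Chunk, find, adjust and build are hypothetical, and the chunk
layout (a revision plus a starting offset) is an assumption made for
illustration.

```python
class Chunk:
    """One list node: lines from `start` up to the next chunk's start
    are attributed to `rev`. (Hypothetical layout, for illustration.)"""

    def __init__(self, rev, start, next=None):
        self.rev = rev
        self.start = start
        self.next = next


def find(head, off):
    """Return the last chunk whose start <= off, or None.
    Walks the list from `head`, so the cost is linear in chunk count."""
    prev = None
    node = head
    while node is not None and node.start <= off:
        prev = node
        node = node.next
    return prev


def adjust(node, delta):
    """Shift `node` and every chunk after it by `delta`.
    Also linear: every remaining node is visited."""
    while node is not None:
        node.start += delta
        node = node.next


def build(starts):
    """Build a list whose i-th chunk has rev i and the i-th start offset."""
    head = None
    for rev, start in reversed(list(enumerate(starts))):
        head = Chunk(rev, start, head)
    return head


head = build([0, 10, 25])
hit = find(head, 12)      # chunk starting at 10
adjust(hit.next, 5)       # chunks after it move down by 5 lines
```

If each applied change needs one find and one adjust, and the number of
chunks grows with the number of changes, the total work grows
quadratically.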

A quick win would be to make the search for *last* in blame_delete_range()
use *start* as the starting point, not db->blame.
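
A sketch of that idea in Python, again as a simplified model rather than
the actual blame.c code. The function names are hypothetical, and the
model assumes the first chunk starts at offset 0 and that the range
length is not negative, so the end of the range can never lie before
the chunk found for its start.

```python
class Chunk:
    """Hypothetical list node: a revision and a starting offset."""

    def __init__(self, rev, start, next=None):
        self.rev = rev
        self.start = start
        self.next = next


def find(node, off):
    """Return the last chunk at or after `node` whose start <= off."""
    prev = None
    while node is not None and node.start <= off:
        prev = node
        node = node.next
    return prev


def range_from_head(head, start, length):
    """Both lookups walk the list from its head."""
    first = find(head, start)
    last = find(head, start + length)
    return first, last


def range_from_first(head, start, length):
    """The second lookup resumes from the chunk the first one found."""
    first = find(head, start)
    resume = first if first is not None else head
    last = find(resume, start + length)
    return first, last


# Build 1000 chunks starting at offsets 0, 10, 20, ...
head = None
for i in reversed(range(1000)):
    head = Chunk(i, i * 10, head)

a = range_from_head(head, 9000, 50)
b = range_from_first(head, 9000, 50)
same = a[0] is b[0] and a[1] is b[1]   # both variants find the same chunks
```

For a range near the end of a long list, the second variant skips the
repeated walk over all the chunks before the range. It does not change
the overall complexity, since the first lookup is still linear.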

I wondered if a tree structure would be a lot faster (though just as bad
in the degenerate case). Was this considered at all?
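
As a rough illustration of what an ordered structure buys, the lookup
can be done by binary search if the chunk start offsets are kept in a
sorted Python list. This is a stand-in for a tree, not a proposal for
blame.c, and find_index is a hypothetical name.

```python
import bisect


def find_index(starts, off):
    """Index of the last chunk whose start <= off, or -1 if none.
    `starts` must be sorted ascending; the search is O(log n)."""
    return bisect.bisect_right(starts, off) - 1


starts = [0, 10, 25]          # start offset of each chunk
revs = [101, 102, 103]        # revision owning each chunk (made-up values)
owner = revs[find_index(starts, 12)]
```

This only speeds up the search. Shifting every later offset is still
linear with a flat list; making that step logarithmic as well would
need something like a balanced tree that stores offsets relative to the
parent node.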

I normally work in Python these days and found blame.py in tools/examples.
It appears to use a different strategy - doing a full diff between each
version, which will be a lot slower. But if I can cache information
locally then maybe I could get away without having to obtain all the log
information. Is there anything like this around?

Hugh Gibson

Received on Tue Jan 11 14:05:39 2005

This is an archived mail posted to the Subversion Dev mailing list.
