[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Enhancing svn blame (Was: Case study: Mono switches to Subversion)

From: Mark Benedetto King <mbk_at_lowlatency.com>
Date: 2004-11-18 02:19:54 CET

On Wed, Nov 17, 2004 at 07:15:04PM +0100, Peter N. Lundblad wrote:
> On Wed, 17 Nov 2004, C. Michael Pilato wrote:
>
> > Ben Collins-Sussman <sussman@collab.net> writes:
> >
> > > In svn 1.0, 'blame' was about 100x slower than CVS. In svn 1.1, it's
> > > about 10x slower. I really can't think of any way to make it faster,
> > > other than doing what CVS does: keeping a cache of contextual diffs on

CVS doesn't cache this; the RCS file is already that.

> > > the server, so the server can instantly generate annotation. We
> > > should probably start a different thread on this, to discuss (A) if we
> > > want to open an issue for this, (B) if/how we want to prioritize this
> > > enhancement at all.
> >
> > CVS does keep contextual diffs, but we wouldn't need to store all of
> > that. All we'd need to store is what we actually want -- annotation
> > information. The username, date, and revision are stored as
> > revprops. All we lack are the line numbers, really.
> >
> You mean at each commit, a list of changed lines is stored?
>
> I and sussman is discussing another way on IRC right now. That's doing the
> blame from the newest file and backwards. Then you could ask for ranges of
> lines and stop early, get feedback earlier (think GUI editor) and stop
> early if the whole file was changed sometimes (not sure if this owuld be
> common, though).
>

I suppose if you're only interested in a range of lines, then it could
be a win, but I think that most users will blame whole files.

For the whole file, I suspect the early termination case is rare (think
about copyright headers that exist at add time, for example). Because
of this, and because reverse-blame calculation seemed harder, I decided
to go forward in time.

A line-numbers-changed string (something like the rcs diff, but without
the actual text) generated at commit time would increase the commit
work-load, but would significantly reduce the blame CPU requirements
(we wouldn't have to reconstruct all of the fulltexts and then
re-diff them).

At that point, it might even be worth optimizing the blame algorithm
itself (it is currently worst-case O(n^2) in the number of
blame-blocks). If we were willing to make it require space
proportional to the number of lines in the in the blamed revision,
it could be trivially converted to O(n*m), where n is the number
of changes and m is their average size. Also, that datastructure
would be more amenable to the backwards-in-time approach.

Another option (scarier) would be to keep skip-deltas of complete
blame computations. This would make the blame computation extremely
fast, at the cost of even more commit-time work and additional
storage.

--ben

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org
Received on Thu Nov 18 02:18:35 2004

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.