On Wed, Nov 17, 2004 at 07:15:04PM +0100, Peter N. Lundblad wrote:
> On Wed, 17 Nov 2004, C. Michael Pilato wrote:
> > Ben Collins-Sussman <email@example.com> writes:
> > > In svn 1.0, 'blame' was about 100x slower than CVS. In svn 1.1, it's
> > > about 10x slower. I really can't think of any way to make it faster,
> > > other than doing what CVS does: keeping a cache of contextual diffs on
CVS doesn't cache this; the RCS file is already that.
> > > the server, so the server can instantly generate annotation. We
> > > should probably start a different thread on this, to discuss (A) if we
> > > want to open an issue for this, (B) if/how we want to prioritize this
> > > enhancement at all.
> > CVS does keep contextual diffs, but we wouldn't need to store all of
> > that. All we'd need to store is what we actually want -- annotation
> > information. The username, date, and revision are stored as
> > revprops. All we lack are the line numbers, really.
> You mean at each commit, a list of changed lines is stored?
> I and sussman is discussing another way on IRC right now. That's doing the
> blame from the newest file and backwards. Then you could ask for ranges of
> lines and stop early, get feedback earlier (think GUI editor) and stop
> early if the whole file was changed sometimes (not sure if this owuld be
> common, though).
I suppose if you're only interested in a range of lines, then it could
be a win, but I think that most users will blame whole files.
For the whole file, I suspect the early termination case is rare (think
about copyright headers that exist at add time, for example). Because
of this, and because reverse-blame calculation seemed harder, I decided
to go forward in time.
A line-numbers-changed string (something like the rcs diff, but without
the actual text) generated at commit time would increase the commit
work-load, but would significantly reduce the blame CPU requirements
(we wouldn't have to reconstruct all of the fulltexts and then
At that point, it might even be worth optimizing the blame algorithm
itself (it is currently worst-case O(n^2) in the number of
blame-blocks). If we were willing to make it require space
proportional to the number of lines in the in the blamed revision,
it could be trivially converted to O(n*m), where n is the number
of changes and m is their average size. Also, that datastructure
would be more amenable to the backwards-in-time approach.
Another option (scarier) would be to keep skip-deltas of complete
blame computations. This would make the blame computation extremely
fast, at the cost of even more commit-time work and additional
To unsubscribe, e-mail: firstname.lastname@example.org
For additional commands, e-mail: email@example.com
Received on Thu Nov 18 02:18:35 2004