On Wed, 17 Nov 2004, Mark Benedetto King wrote:
> > You mean at each commit, a list of changed lines is stored?
> > sussman and I are discussing another way on IRC right now: doing the
> > blame from the newest revision and going backwards. Then you could ask
> > for ranges of lines and stop early, get feedback earlier (think GUI
> > editor), and also stop early if the whole file was changed at some point
> > (not sure if this would be common, though).
> I suppose if you're only interested in a range of lines, then it could
> be a win, but I think that most users will blame whole files.
If you know that limiting the line range may save a lot of time, you may
take the time to just ask for the few specific lines you're interested in.
> For the whole file, I suspect the early termination case is rare (think
> about copyright headers that exist at add time, for example). Because
> of this, and because reverse-blame calculation seemed harder, I decided
> to go forward in time.
I mentioned that copyright statement problem on IRC, but not in the mail.
But this is specific to the command line. If you allow the callback to be
called for lines in the order the blame information is ready, you'd get
feedback much faster. I'm imagining an IDE or editor with this support
built-in. There you would just select the lines you're interested in and
the view would get updated as information is available. I'd say this could
even be implemented in Emacs.
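To make the idea concrete, here is a rough sketch in Python of what I mean by going backwards with a line range and a callback. Everything in it is made up for illustration: `blame_backwards`, the `(revnum, lines)` list, and the use of Python's difflib as the diff engine are assumptions, not anything in the Subversion API.

```python
from difflib import SequenceMatcher

def blame_backwards(revisions, first, last, on_line):
    """Blame lines [first, last) of the newest revision, newest to oldest.

    revisions: hypothetical list of (revnum, lines), oldest first.
    on_line(line_index, revnum) is called as soon as a line is attributed.
    Returns how many revision texts had to be looked at.
    """
    cur_rev, current = revisions[-1]
    # Maps a line index in `current` to its index in the newest revision.
    pending = {i: i for i in range(first, last)}
    examined = 1
    for older_rev, older in reversed(revisions[:-1]):
        if not pending:
            break  # every requested line is attributed: stop early
        examined += 1
        matcher = SequenceMatcher(None, older, current, autojunk=False)
        carried = {}
        for a, b, size in matcher.get_matching_blocks():
            for k in range(size):
                if b + k in pending:
                    # Line already existed in the older text; keep tracing it.
                    carried[a + k] = pending.pop(b + k)
        # Lines with no counterpart in the older text were introduced here.
        for target in sorted(pending.values()):
            on_line(target, cur_rev)
        pending = carried
        cur_rev, current = older_rev, older
    # Whatever survives down to the oldest revision belongs to it.
    for target in sorted(pending.values()):
        on_line(target, cur_rev)
    return examined
```

If you ask only for a line that changed in the newest revision, the loop stops after a single diff. If you ask for the whole file, the recently changed lines are reported first, and long-lived lines such as a copyright header only come out at the very end.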
> A line-numbers-changed string (something like the rcs diff, but without
> the actual text) generated at commit time would increase the commit
> work-load, but would significantly reduce the blame CPU requirements
> (we wouldn't have to reconstruct all of the fulltexts and then
> re-diff them).
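As I read it, such a string could look roughly like the output of the sketch below. This is only my guess at the shape: the function name `line_change_summary` and the exact `d<line> <count>` / `a<line> <count>` layout (borrowed from the RCS diff command style, with the text left out) are assumptions, and Python's difflib stands in for whatever diff would really run at commit time.

```python
from difflib import SequenceMatcher

def line_change_summary(old_lines, new_lines):
    """Return an RCS-like summary of changed line ranges, without the text.

    "dN C" means C lines were deleted starting at old line N (1-based).
    "aN C" means C lines were added after old line N.
    """
    commands = []
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            commands.append(f"d{i1 + 1} {i2 - i1}")
        if tag in ("insert", "replace"):
            commands.append(f"a{i2} {j2 - j1}")
    return " ".join(commands)
```

A blame pass that has these summaries for every revision only needs to shuffle line numbers around; it never has to rebuild or compare the fulltexts.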
> At that point, it might even be worth optimizing the blame algorithm
> itself (it is currently worst-case O(n^2) in the number of
> blame-blocks). If we were willing to make it require space
> proportional to the number of lines in the blamed revision,
> it could be trivially converted to O(n*m), where n is the number
> of changes and m is their average size. Also, that data structure
> would be more amenable to the backwards-in-time approach.
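I take the data structure here to be one blame entry per line of the file, updated revision by revision. A minimal sketch of that reading, assuming each revision comes with a list of hypothetical `(start, deleted, added)` hunks (`blame_forward` and the hunk layout are my invention, not existing code):

```python
def blame_forward(changes_per_revision):
    """Build a per-line blame array by replaying changed ranges forward.

    changes_per_revision: list of (revnum, hunks), oldest first. Each hunk
    is (start, deleted, added), where start is a 0-based line index in the
    previous revision. The returned list holds one revnum per line of the
    final revision, so space is proportional to the file length.
    """
    blame = []
    for revnum, hunks in changes_per_revision:
        # Apply hunks bottom-up so earlier start offsets stay valid.
        for start, deleted, added in sorted(hunks, reverse=True):
            blame[start:start + deleted] = [revnum] * added
    return blame
```

Note that splicing a Python list is itself linear in the length of the list, so this shows the shape of the data structure rather than the O(n*m) bound; a structure with cheaper range updates would be needed for that.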
> Another option (scarier) would be to keep skip-deltas of complete
> blame computations. This would make the blame computation extremely
> fast, at the cost of even more commit-time work and additional
I know you and others have been discussing blame caching on the server. I
would say this is the usual tradeoff between API flexibility and speed:-)
The problem with server support is that we hardcode the diffing algorithm,
or at least remove the possibility of having other diff programs provide
the blame information. We're not doing this today, but I'd rather not
move this to the server if we can avoid it. Also, we should consider other
queries that are similar to blame, such as functionality to ask "who
deleted this line?" (I've actually seen this requirement somewhere.) The
problem with moving specialized functions to the server is that they have
to be supported by all three (though one is trivial) access methods.
Received on Fri Nov 19 19:57:14 2004