
Re: How to do annotate

From: Karl Fogel <kfogel_at_newton.ch.collab.net>
Date: 2002-08-13 18:41:17 CEST

Daniel Berlin <dberlin@dberlin.org> writes:
> But there is little that can be done that isn't going to be very fragile.
> The reality is that we need not be perfect in our annotate output.

Whoa. Wait a second, here :-).

There is no reason our annotate output can't be perfect like CVS's.
All the information is there. We shouldn't settle for anything less
than perfect correctness.

We've been spending a lot of effort here trying to get "svn blame"
data cheaply from our svndiff delta format. If we can do it that way,
then great. But if we can't, the answer isn't to settle for
inaccuracy -- it's to implement blame in some other way.

If we must resort to manual diffing and counting lines, then so be it.
Below is a loose description of such a system. I'm not saying it has
to be this way (I haven't been closely following the svndiff-centric
discussion, just closely enough to see that it's non-trivial).

Doing Blame the Brute Force Way:
================================

There is a new table, `blame', mapping NodeChangeIDs to lists of the
form "((RANGE1 REV1) (RANGE2 REV2) ...)".

Each RANGE indicates a range of lines "(offset len)" in the node's
content, and the REV indicates the revision in which that range was
introduced. (Variant: we could store ranges of bytes instead of
ranges of lines.)
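A minimal sketch of what a `blame' row might look like and how a reader would answer "which revision introduced line N?", in Python purely for illustration. The names `BlameEntry`, `rev_for_line`, `per_line_revs`, and the string key standing in for a NodeChangeID are hypothetical, and offsets are assumed to be 0-based line numbers.

```python
from typing import Dict, List, Tuple

# One annotation entry: ((offset, len), rev). ASSUMPTION: offset is a
# 0-based line number and the ranges tile the file without gaps.
BlameEntry = Tuple[Tuple[int, int], int]

def rev_for_line(blame: List[BlameEntry], line: int) -> int:
    """Return the revision that introduced `line` (0-based)."""
    for (offset, length), rev in blame:
        if offset <= line < offset + length:
            return rev
    raise IndexError(f"line {line} is not covered by any range")

def per_line_revs(blame: List[BlameEntry]) -> List[int]:
    """Expand the range list into one revision number per line."""
    revs: List[int] = []
    for (offset, length), rev in sorted(blame):
        if offset != len(revs):
            raise ValueError("ranges must be contiguous")
        revs.extend([rev] * length)
    return revs

# Hypothetical `blame' table; the key stands in for a NodeChangeID.
blame_table: Dict[str, List[BlameEntry]] = {
    "node-17.change-3": [((0, 4), 12), ((4, 2), 15), ((6, 10), 12)],
}
```

For example, `rev_for_line(blame_table["node-17.change-3"], 5)` returns 15, because line 5 falls inside the range (4 2).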

After a new revision is committed, it is added to a list of revs whose
annotations need to be updated. An asynchronous process, or an
internal post-commit thread, runs over that list. For each revision
in the list, it finds all the changed file paths; for each path, it
calculates and stores the blame information. This may involve
re-diffing the fulltexts (variant: use a non-compressing svndiff
instead of line-based diff, if we're storing byte ranges instead of
line ranges).
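A sketch of the per-path step the post-commit worker would run, assuming line ranges. It uses Python's difflib.SequenceMatcher as a stand-in for whatever line-based diff a real implementation would use (that substitution is mine, not part of the proposal), and `update_blame`, `expand`, and `compress` are hypothetical names. Lines the diff reports as unchanged keep their old revision; every other line of the new fulltext is attributed to the revision just committed.

```python
import difflib
from typing import List, Tuple

BlameEntry = Tuple[Tuple[int, int], int]  # ((offset, len), rev)

def expand(blame: List[BlameEntry]) -> List[int]:
    """Turn ((offset, len), rev) ranges into one rev per line."""
    revs: List[int] = []
    for (_offset, length), rev in sorted(blame):
        revs.extend([rev] * length)
    return revs

def compress(revs: List[int]) -> List[BlameEntry]:
    """Collapse runs of equal revs back into ((offset, len), rev) ranges."""
    blame: List[BlameEntry] = []
    start = 0
    for i in range(1, len(revs) + 1):
        if i == len(revs) or revs[i] != revs[start]:
            blame.append(((start, i - start), revs[start]))
            start = i
    return blame

def update_blame(old_blame: List[BlameEntry], old_text: str,
                 new_text: str, new_rev: int) -> List[BlameEntry]:
    """Re-diff two fulltexts and carry old attributions forward."""
    old_revs = expand(old_blame)
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    if len(old_revs) != len(old_lines):
        raise ValueError("old blame does not match old fulltext")
    # Default: every line in the new text was introduced by new_rev.
    new_revs = [new_rev] * len(new_lines)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines keep the revision that introduced them.
            new_revs[j1:j2] = old_revs[i1:i2]
    return compress(new_revs)
```

For instance, changing "b" to "X" and appending "d" in a three-line file first committed in r1 yields ranges (0 1) r1, (1 1) r2, (2 1) r1, (3 1) r2 when the change is committed as r2.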

Is this horrendously inefficient? Well, it's certainly inefficient,
but not horrendously so because it doesn't delay anything. All the
work happens after the commit has already succeeded. It'd be nice to
find a more efficient way, just not at the price of correctness.

Backwards compatibility and crash handling are both covered by the
same rule: if you go to fetch an annotation and it isn't there, then
calculate it on the spot, recursively. (There will have to be some
mechanism to make sure you don't get two processes calculating the
same annotations at the same time, but I'm going to hand-wave on that
as we all know it's solvable).
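A sketch of the fetch-or-compute rule. Everything here is hypothetical: an in-memory `history` dict stands in for the node-revision tables, `blame_cache` for the `blame' table, and a per-node threading.Lock for the "don't calculate the same annotations twice" mechanism. That lock only helps within one process; separate processes would need a lock held in the database itself.

```python
import difflib
import threading
from collections import defaultdict

# HYPOTHETICAL stand-ins for repository tables.
# history: node id -> (revision, fulltext, predecessor node id or None)
history = {}
blame_cache = {}                      # node id -> [((offset, len), rev)]
_locks = defaultdict(threading.Lock)  # one lock per node id
_locks_guard = threading.Lock()       # protects creation of _locks entries

def _lock_for(node_id):
    with _locks_guard:
        return _locks[node_id]

def _expand(blame):
    revs = []
    for (_offset, length), rev in sorted(blame):
        revs.extend([rev] * length)
    return revs

def _compress(revs):
    blame, start = [], 0
    for i in range(1, len(revs) + 1):
        if i == len(revs) or revs[i] != revs[start]:
            blame.append(((start, i - start), revs[start]))
            start = i
    return blame

def get_blame(node_id):
    """Return cached blame, computing it (and missing ancestors) if absent."""
    cached = blame_cache.get(node_id)
    if cached is not None:
        return cached
    with _lock_for(node_id):
        # Re-check: another thread may have finished while we waited.
        cached = blame_cache.get(node_id)
        if cached is not None:
            return cached
        rev, text, pred = history[node_id]
        new_lines = text.splitlines(keepends=True)
        revs = [rev] * len(new_lines)
        if pred is not None:
            # Recursive step: the predecessor's annotation may be missing too.
            old_revs = _expand(get_blame(pred))
            old_lines = history[pred][1].splitlines(keepends=True)
            sm = difflib.SequenceMatcher(None, old_lines, new_lines,
                                         autojunk=False)
            for tag, i1, i2, j1, j2 in sm.get_opcodes():
                if tag == "equal":
                    revs[j1:j2] = old_revs[i1:i2]
        blame = _compress(revs)
        blame_cache[node_id] = blame
        return blame
```

The recursion follows the description literally. Locks are always taken from a node toward its ancestors, so two threads cannot deadlock on each other. For very long histories, though, a Python version would hit the default recursion limit, so an iterative walk back to the nearest cached ancestor would be safer.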

-K

Received on Tue Aug 13 18:58:11 2002

This is an archived mail posted to the Subversion Dev mailing list.
