
Re: How to do annotate

From: Daniel Berlin <dberlin_at_dberlin.org>
Date: 2002-08-13 16:48:22 CEST

On 13 Aug 2002, Greg Hudson wrote:

> On Tue, 2002-08-13 at 09:58, Daniel Berlin wrote:
> > So should we just point out the problems that we know we can't solve, or
> > should we do the best we can?
>
> In this case, I think our lemonade is going to taste like rotten
> tomatoes if we make it this way. Let me raise a third concern, related
> to windows: if I insert just one line at the beginning of a file, then
> it will look like there is a bit of new data every 128K through.
>
> Bill has noted that if we went back to having source windows larger than
> target windows, this wouldn't be an issue. But we can't really do that
> and have a delta combiner. If byte 0 of rev 1 can depend on byte 150K
> of rev 2, and byte 150K of rev 2 can depend on byte 300K of rev 3, and
> so on, then byte 0 of rev 1 can depend on pretty much any byte of rev
> 1000. We lose streaminess of delta application.
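
To make the window concern concrete, here is a toy sketch. It is not
svndiff: elements are unique integers standing in for bytes, Python's
difflib stands in for the delta generator, and the function name is
made up for illustration. Each target window may only copy from the
source window at the same offset, so after an insert at the front the
shifted units at every window boundary have nothing to copy from.

```python
from difflib import SequenceMatcher


def new_units_per_window(source, target, window):
    """Count, per target window, the units that cannot be copied from the
    source window at the same offset (a rough proxy for "looks new")."""
    counts = []
    for start in range(0, len(target), window):
        src = source[start:start + window]
        tgt = target[start:start + window]
        matcher = SequenceMatcher(None, src, tgt, autojunk=False)
        copied = sum(block.size for block in matcher.get_matching_blocks())
        counts.append(len(tgt) - copied)
    return counts


# 400 unique units, then the same data with 3 new units inserted at the front.
source = list(range(400))
target = [-1, -2, -3] + source

print(new_units_per_window(source, target, 100))          # [3, 3, 3, 3, 3]
print(new_units_per_window(source, target, len(target)))  # [3]
```

With one window covering the whole file only the three inserted units
look new; with aligned 100-unit windows every window reports three.
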
>
> Ignoring windows: a line-based diff does a pretty damn good job of
> telling you what has changed, most of the time.

Correct.

> This is because it's
> only allowed to insert source data from the current point into the
> source file. We, on the other hand, are allowed to insert data from
> anywhere in the source window which happens to look good, or even from
> earlier parts of the target window. So we do a miserable job of
> reflecting what has changed.
>
> So, what are our options? I listed these before, in passing:
>
> 1. Don't implement "svn blame", noting that it happens to be easy in
> the CVS design and not in ours. See how much people complain.

I'm not sure this is a good idea, if only because svn blame isn't hard to
implement; it's just going to be slow if we do it line-based.

>
> 2. Implement "svn blame" slowly, by regenerating each rev of the file
> and doing a line-based diff between adjacent pairs. This won't
> actually be too bad until your repository starts to look like
> gcc's, with hundreds of revs per file.

Even then, skip-deltas should allow us to generate the fulltexts fast, so
the slowdown is completely dependent on the speed of the diff algorithm.
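
As a rough illustration of why the fulltexts stay cheap: assume a
skip-delta layout where version N of a file is stored as a delta
against N with its lowest set bit cleared (that layout is my
assumption here, and delta_chain is a made-up name). The number of
deltas to apply then grows with log N rather than N.

```python
def delta_chain(version: int) -> list[int]:
    """Return the versions whose deltas must be applied to rebuild `version`,
    assuming each version's delta base is the version number with its
    lowest set bit cleared, and version 0 is the empty base."""
    chain = []
    while version > 0:
        chain.append(version)
        version &= version - 1  # clear the lowest set bit
    return chain


print(delta_chain(1000))       # [1000, 992, 960, 896, 768, 512]
print(len(delta_chain(1000)))  # 6 deltas instead of 1000
```
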

>
> 3. Add annotation data to the repository. This could be done in a
> number of ways: each rev could be annotated as it arrives
> (fast, heavy space penalty, but some people have lots of space);
> we could annotate every N revisions (a little slower, space penalty
> drops by a factor of N); there could be an "annotation cache" where
> we store annotations when they are requested (unpredictable speed,
> but space penalty is fixed at the size of the cache); we could
> store annotation diffs alongside the binary deltas (skip deltas
> mean it won't take long to reconstruct any annotation).

>
> 4. We could decide that for text files, we will store line-based diffs
> instead of svndiffs. It's not clear whether this would be a win or
> a loss for space in general, though it's certainly more
> complicated.
>
> 5. Or we could simply make rotten tomato lemonade.
>
> I would suggest that we go with #2 for a while because it's relatively
> easy and will suffice for the common case. The last variation of #3
> might produce the best results overall, though it's distasteful that the
> filesystem layer should be performing line-based diffs.

I'm of the theory that annotate/blame is not used often enough to be
worth the penalties. Let's just do it slowly.
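
For reference, "slowly" (option 2) could look roughly like the sketch
below. It assumes the fulltext of every rev is already in hand as a
list of lines, and it uses Python's difflib as a stand-in for whatever
line-based diff we would really use; the blame function and its input
shape are hypothetical, not an existing API.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """revisions: list of (rev_number, lines), oldest first.
    Returns a list of (rev_number, line) for the newest revision."""
    annotated = []   # (rev, line) pairs for the previous revision
    prev_lines = []  # plain lines of the previous revision
    for rev, lines in revisions:
        matcher = SequenceMatcher(None, prev_lines, lines, autojunk=False)
        next_annotated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision that introduced them.
                next_annotated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision;
                # for a pure delete, lines[j1:j2] is empty.
                next_annotated.extend((rev, line) for line in lines[j1:j2])
        annotated = next_annotated
        prev_lines = lines
    return annotated


history = [
    (1, ["a", "b", "c"]),
    (2, ["a", "x", "c"]),
    (3, ["z", "a", "x", "c"]),
]
for rev, line in blame(history):
    print(rev, line)
```

The cost is one line diff per adjacent pair of revs, which is the part
that hurts for files with hundreds of revs.
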
--Dan

Received on Tue Aug 13 16:48:52 2002
