Re: How to do annotate

From: Greg Hudson <ghudson_at_MIT.EDU>
Date: 2002-08-13 16:38:10 CEST

On Tue, 2002-08-13 at 09:58, Daniel Berlin wrote:
> So should we just point out the problems that we know we can't solve, or
> should we do the best we can?

In this case, I think our lemonade is going to taste like rotten
tomatoes if we make it this way. Let me raise a third concern, related
to windows: if I insert just one line at the beginning of a file, then
it will look like there is a bit of new data every 128K through the file.
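
To make the window concern concrete, here is a rough model. It assumes fixed-size windows where target window k may only copy from the source window at the same offset, which is a simplification and not svndiff's actual encoder. If the only change is `inserted` bytes at offset 0, target byte i came from old byte i - inserted, and every byte whose origin falls before its aligned source window must be sent as new data. The function name and parameters are made up for illustration.

```python
def apparent_new_bytes(file_size, inserted, window=128 * 1024):
    """Hypothetical model: per target window, count the bytes that cannot be
    copied from the aligned source window after `inserted` bytes are added
    at the start of a `file_size`-byte file."""
    target_size = file_size + inserted
    counts = []
    for start in range(0, target_size, window):
        end = min(start + window, target_size)
        new = 0
        for i in range(start, end):
            origin = i - inserted  # offset this byte had in the old file
            # Only the aligned source window [start, start + window) is usable.
            if not (start <= origin < min(start + window, file_size)):
                new += 1
        counts.append(new)
    return counts


# A 1 MB file with one 40-byte line inserted at the top:
print(apparent_new_bytes(1024 * 1024, 40))  # nine windows, each reporting 40
```

In this model a one-line insertion shows up as one line's worth of "new" data at the start of every window, which is exactly the spurious change a blame built on the deltas would report.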

Bill has noted that if we went back to having source windows larger than
target windows, this wouldn't be an issue. But we can't really do that
and have a delta combiner. If byte 0 of rev 1 can depend on byte 150K
of rev 2, and byte 150K of rev 2 can depend on byte 300K of rev 3, and
so on, then byte 0 of rev 1 can depend on pretty much any byte of rev
1000. We lose streaminess of delta application.
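
A toy model of the streaminess problem, assuming target window k covers bytes [kW, (k+1)W) and its source window covers [kW, kW+S) of the next revision in the chain, with S >= W. To produce the first target window of the first revision, each delta in the chain widens the range of the next text that must be read. `W`, `S` and the function name are illustrative, not Subversion parameters.

```python
def source_bytes_needed(chain_length, target_window, source_window):
    """Toy model: how far into the text at the end of a delta chain we must
    read to produce the first target window of the first revision.

    Target window k covers [k*W, (k+1)*W); its source window is assumed to
    cover [k*W, k*W + S) of the next revision, with S >= W.
    """
    hi = target_window  # we want bytes [0, W) of the first revision
    for _ in range(chain_length):
        # Index of the last target window overlapping [0, hi).
        last = -(-hi // target_window) - 1
        # Union of those windows' source ranges in the next revision.
        hi = last * target_window + source_window
    return hi


KB = 1024
for s in (128 * KB, 256 * KB):
    print(s // KB, [source_bytes_needed(n, 128 * KB, s) // KB for n in (1, 10, 1000)])
# 128 [128, 128, 128]
# 256 [256, 1408, 128128]
```

With equal windows the required range never grows; with larger source windows it grows by one target window per delta, so over a 1000-rev chain the first output window can depend on roughly the whole file.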

Ignoring windows: a line-based diff does a pretty damn good job of
telling you what has changed, most of the time. This is because it's
only allowed to copy source data from the current point onward in the
source file. We, on the other hand, are allowed to copy data from
anywhere in the source window which happens to look good, or even from
earlier parts of the target window. So we do a miserable job of
reflecting what has changed.
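
A small illustration of the difference, using Python's `difflib` for the line-based side and a deliberately naive greedy copy-anywhere delta as a stand-in for a binary delta encoder (it is not Subversion's algorithm). Changing `b = 2` to `b = 1` is reported by the line diff as a replaced line, while the toy delta rebuilds the new text entirely from copies.

```python
import difflib


def line_changes(old: str, new: str):
    """Line-based diff: matches must occur in order, so the non-equal
    opcodes read directly as 'these lines changed'."""
    sm = difflib.SequenceMatcher(a=old.splitlines(), b=new.splitlines(),
                                 autojunk=False)
    return [op for op in sm.get_opcodes() if op[0] != "equal"]


def copy_anywhere_delta(source: bytes, target: bytes, min_match: int = 2):
    """Toy greedy delta: at each target offset, copy the longest match found
    anywhere in `source` or earlier in `target`; else emit one literal byte."""
    ops, i = [], 0
    while i < len(target):
        best, best_len = ("lit", target[i:i + 1]), 0
        for kind, buf, starts in (("src", source, range(len(source))),
                                  ("tgt", target, range(i))):
            for j in starts:
                n = 0
                while (i + n < len(target) and j + n < len(buf)
                       and buf[j + n] == target[i + n]):
                    n += 1
                if n >= min_match and n > best_len:
                    best, best_len = (kind, j, n), n
        ops.append(best)
        i += best_len or 1
    return ops


def apply_delta(source: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "lit":
            out += op[1]
        elif op[0] == "src":
            out += source[op[1]:op[1] + op[2]]
        else:  # "tgt": byte-by-byte so overlapping self-copies work
            for k in range(op[2]):
                out.append(out[op[1] + k])
    return bytes(out)


old, new = "a = 1\nb = 2\n", "a = 1\nb = 1\n"
print(line_changes(old, new))     # [('replace', 1, 2, 1, 2)]
ops = copy_anywhere_delta(old.encode(), new.encode())
print(ops)                        # [('src', 0, 10), ('src', 4, 2)]
assert apply_delta(old.encode(), ops) == new.encode()
```

The delta contains no literal bytes at all, so an annotation read off it would credit the changed `b = 1` line entirely to the older revision.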

So, what are our options? I listed these before, in passing:

  1. Don't implement "svn blame", noting that it happens to be easy in
     the CVS design and not in ours. See how much people complain.

  2. Implement "svn blame" slowly, by regenerating each rev of the file
     and doing a line-based diff between adjacent pairs. This won't
     actually be too bad until your repository starts to look like
     gcc's, with hundreds of revs per file.

  3. Add annotation data to the repository. This could be done in a
     number of ways: each revision could be annotated as it arrives
     (fast, heavy space penalty, but some people have lots of space);
     we could annotate every N revisions (a little slower, space penalty
     drops by a factor of N); there could be an "annotation cache" where
     we store annotations when they are requested (unpredictable speed,
     but space penalty is fixed at the size of the cache); we could
     store annotation diffs alongside the binary deltas (skip deltas
     mean it won't take long to reconstruct any annotation).

  4. We could decide that for text files, we will store line-based diffs
     instead of svndiffs. It's not clear whether this would be a win or
     a loss for space in general, though it's certainly more
     complicated.

  5. Or we could simply make rotten tomato lemonade.
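
Option 2 is simple enough to sketch with Python's `difflib`, assuming the caller has already regenerated every revision's full text, oldest first (how those texts come out of the filesystem is left out). Each step carries the previous annotation across unchanged lines and stamps inserted or replaced lines with the current revision. The `blame` name and its input format are hypothetical.

```python
import difflib


def blame(revisions):
    """revisions: list of (rev_number, full_text), oldest first.
    Returns (rev_number, line) pairs for the newest text."""
    annotated, prev_lines = [], []
    for rev, text in revisions:
        lines = text.splitlines()
        sm = difflib.SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        current = []
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep the revision that introduced them.
                current.extend(annotated[i1:i2])
            else:
                # "insert"/"replace" lines are new here; "delete" adds nothing.
                current.extend((rev, line) for line in lines[j1:j2])
        annotated, prev_lines = current, lines
    return annotated


revs = [(1, "a\nb\n"), (2, "a\nx\nb\n"), (3, "a\nx\nc\n")]
for rev, line in blame(revs):
    print(rev, line)
# 1 a
# 2 x
# 3 c
```

The cost is one full-text reconstruction and one diff per revision, which is why it only starts to hurt once a file has hundreds of revs.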

I would suggest that we go with #2 for a while because it's relatively
easy and will suffice for the common case. The last variation of #3
might produce the best results overall, though it's distasteful that the
filesystem layer should be performing line-based diffs.
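
Why skip deltas keep the last variation of #3 cheap: delta chains stay logarithmic. A sketch, assuming the common skip-delta formulation in which revision n of a node is stored as a delta against revision n with its lowest set bit cleared (the repository's exact scheme and delta direction may differ). If an annotation diff were stored next to each delta, rebuilding any annotation would mean applying the same short chain.

```python
def skip_delta_chain(n):
    """Revisions whose deltas are applied, starting from the base at 0,
    to rebuild revision n under the clear-lowest-set-bit scheme (assumed)."""
    chain = [n]
    while n:
        n &= n - 1  # delta base: n with its lowest set bit cleared
        chain.append(n)
    return chain[::-1]


for n in (13, 1000):
    chain = skip_delta_chain(n)
    print(n, "->", len(chain) - 1, "deltas", chain)
# 13 -> 3 deltas [0, 8, 12, 13]
# 1000 -> 6 deltas [0, 512, 768, 896, 960, 992, 1000]
```

The number of deltas equals the number of set bits in n, so it never exceeds about log2(n).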

Received on Tue Aug 13 16:41:01 2002
