[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Looking to improve performance of svn annotate

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Mon, 22 Mar 2010 12:54:13 +0000

On Mon, 2010-03-22 at 12:29 +0100, Philipp Marek wrote:
> Hello Julian,
> Hello Johan,
>
> On Montag, 22. März 2010, Julian Foad wrote:
> ...
> > My basic idea is an algorithm on the client that takes, as its input, a
> > sequence of binary diffs, one for each revision in which the file
> > changed, representing that incremental change.
> ...
> > The algorithm applies one of these diffs at a time to an in-memory data
> > structure that represents which revision is associated with each byte in
> > the file.
> ...
> > Then, at the end, we read in the blamed file's text (just that one
> > revision of it), match up the byte ranges to the lines in the file, and
> > for each line we display the highest recorded revision number that falls
> > within that line.
> please forgive me for intruding with barely some knowledge to share, but that
> sounds awful because of the round-trips (and latencies) involved.

Hi Philipp. What do you mean exactly? I wonder if you misunderstood
when I said, "we read in the blamed file's text (just that one revision
of it)". I meant just the working revision of the file (or whichever
revision the blame command specified as the operative version). Not
every revision.

Or, if your concern is the requesting of the successive diffs for every
revision in which the file changed, well, that's the same number of
round trips as what the current code does. (It doesn't request every
revision number in the range, only the revision numbers where the file
content changed.) Reducing the number of round-trips would be an
additional improvement for other users, but Johan's problem is
client-side processing, not the network.

[...]
> > As regards reversing the current algorithm, first you might want to find
> > out what is the oldest revision number that shows up in the blame of
> > your typical file, and compare that with all the changes in the file's
> > history. Is it only reporting the most recent 1% of the file's changes,
> > in which case the reversed algorithm could achieve a 100-fold speed-up,
> > or 20%, or 80%?
> Well, how about using an iterative approach? Just do the algorithm you
> described for the revisions [N-100:N]; if there's any line that's still at
> N-100 fetch the blame information for [N-200:N-100], and substitute the
> 'unknown' lines; repeat as necessary.

It sounds like you are assuming the blame is being calculated on the
server, but at present the client calculates it and so the client can
stop at whichever revision fills in the last gap.

If we were to pipeline the client-server traffic, then we might request
diffs in batches like you suggest.

- Julian

> Sounds simpler than a full reverse run, and might be efficient enough - only a
> few revisions more than absolutely necessary are fetched and analyzed.
Received on 2010-03-22 13:54:50 CET

This is an archived mail posted to the Subversion Dev mailing list.