Re: Looking to improve performance of svn annotate

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Mon, 22 Mar 2010 22:25:24 +0100

On Mon, Mar 22, 2010 at 12:29 PM, Philipp Marek
<philipp.marek_at_emerion.com> wrote:
> Hello Julian,
> Hello Johan,

Hi Philipp,

Thanks for joining the discussion. All input is really welcome. Julian
has already responded to your suggestions, but it got me thinking ...
see below.

> On Monday, 22 March 2010, Julian Foad wrote:
> ...
>> My basic idea is an algorithm on the client that takes, as its input, a
>> sequence of binary diffs, one for each revision in which the file
>> changed, representing that incremental change.
> ...
>> The algorithm applies one of these diffs at a time to an in-memory data
>> structure that represents which revision is associated with each byte in
>> the file.
> ...
>> Then, at the end, we read in the blamed file's text (just that one
>> revision of it), match up the byte ranges to the lines in the file, and
>> for each line we display the highest recorded revision number that falls
>> within that line.
> please forgive me for intruding with barely some knowledge to share, but that
> sounds awful because of the round-trips (and latencies) involved.
>
> I'd suggest trying to keep the (current) version of the file line-wise in
> memory (stored as line-number, revision, start byte, length, and actual
> bytes); an incoming binary delta would just change the data and revision,
> until the deltas are exhausted.
>
> Then the blame output can be given directly, without any more round-trips.
>
> Of course, that means splitting xdelta operations into line-wise bits; but, as
> an improvement, you could also print the source line number of a change (by
> just "counting" the lines from top to bottom when applying a delta).

As Julian already mentioned in his further response, currently the
blame is calculated on the client side. And it's true that in my case,
network is not the issue (everything is on a LAN), but client-side
processing is.
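
To make the quoted idea concrete, here is a minimal sketch of the
byte-level blame Julian describes: keep one revision number per byte,
rewrite that map as each delta is applied, and only map bytes to lines
at the very end. The delta format used here (tuples of "copy" and
"insert" operations) and the names apply_delta, blame_lines and blame
are assumptions made up for illustration; this is not Subversion's
actual svndiff encoding or API, and a real implementation would likely
store byte ranges rather than one entry per byte.

```python
def apply_delta(byte_revs, delta, rev):
    """Return the per-byte revision map after applying one delta.

    byte_revs: list of revision numbers, one per byte of the old text.
    delta: list of hypothetical ops, either ("copy", offset, length),
           which reuses bytes of the old text, or ("insert", length),
           which adds new bytes introduced in revision `rev`.
    """
    new_revs = []
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            # Copied bytes keep the revision they already had.
            new_revs.extend(byte_revs[offset:offset + length])
        elif op[0] == "insert":
            _, length = op
            # New bytes are attributed to the revision being applied.
            new_revs.extend([rev] * length)
        else:
            raise ValueError("unknown delta op: %r" % (op,))
    return new_revs


def blame_lines(byte_revs, text):
    """Match byte ranges to lines of the final text (bytes).

    Each line is reported with the highest revision found among its
    bytes, as in the quoted description.
    """
    if len(byte_revs) != len(text):
        raise ValueError("revision map does not match text length")
    result = []
    pos = 0
    for line in text.splitlines(keepends=True):
        end = pos + len(line)
        result.append((max(byte_revs[pos:end]), line))
        pos = end
    return result


def blame(deltas, final_text):
    """deltas: list of (rev, delta) pairs in increasing revision order."""
    byte_revs = []
    for rev, delta in deltas:
        byte_revs = apply_delta(byte_revs, delta, rev)
    return blame_lines(byte_revs, final_text)


if __name__ == "__main__":
    # r1 creates "a\nb\n"; r2 inserts "X\n" between the two lines.
    history = [
        (1, [("insert", 4)]),
        (2, [("copy", 0, 2), ("insert", 2), ("copy", 2, 2)]),
    ]
    for rev, line in blame(history, b"a\nX\nb\n"):
        print(rev, line)
```

Note that only the final text is ever read; the intermediate revisions
are represented purely by their deltas, which is where the savings in
client-side processing would come from.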

However, looking at Julian's binary blame algorithm, I can't help but
wonder why this binary structure couldn't be calculated on the server
just as well. This would save a lot of network round-trips (5999 in my
case :-)). Like I said, network is not an issue in our setup, but I
appreciate that there are other environments out there.

Or is there a reason this can't be calculated by the server (does it
perhaps not have enough information)? I have always found it quite
strange that the blame is calculated on the client side ...

In any case, I'll probably implement the binary blame on the client
first, since that is more consistent with how it currently works.
Making it "server calculated" (if that's at all possible) will probably
be a much bigger change, touching more parts of the code (I'm just
guessing here).

Johan
Received on 2010-03-22 22:25:52 CET

This is an archived mail posted to the Subversion Dev mailing list.