Re: Enhancing svn blame (Was: Case study: Mono switches to Subversion)

From: Peter N. Lundblad <peter_at_famlundblad.se>
Date: 2004-11-21 21:20:06 CET

On Sun, 21 Nov 2004, Branko Čibej wrote:

> Peter N. Lundblad wrote:
>
> >I'm still not convinced, but then I'm just a simple programmer.
> >
> Heh. Aren't we all. :-)
>
:-) More or less theoretical, though.

> No, we already know at least two ways to get the information we want.
> One way is already being used by "svn blame" -- it interprets context
> diffs. Another way would be to extract the info from svndiffs (which is
> harder). A third way is to use the algorithm in libsvn_diff, but to feed
> it different tokens (e.g., bytes instead of lines).
>
Heh. It would be interesting to watch the performance of svndiff with
bytes as tokens.
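
To make the "different tokens" idea concrete, here is a rough sketch. It does not use libsvn_diff or svndiff; Python's difflib stands in for the diff algorithm, and the helper name changed_ranges is made up for illustration. The same diff routine is fed lines in one call and bytes in the other.

```python
import difflib

def changed_ranges(old_tokens, new_tokens):
    """Return (tag, old_start, old_end, new_start, new_end) for each changed span.

    The tokens can be any sequence of hashable items: lines, bytes, words...
    difflib is only a stand-in here, not the algorithm Subversion uses.
    """
    matcher = difflib.SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

old = b"int x = 1;\nint y = 2;\n"
new = b"int x = 1;\nint y = 3;\n"

# Lines as tokens: the whole second line is reported as changed.
line_changes = changed_ranges(old.splitlines(keepends=True),
                              new.splitlines(keepends=True))

# Bytes as tokens: only the single differing byte is reported.
byte_changes = changed_ranges(old, new)

print(line_changes)
print(byte_changes)
```

With bytes as tokens the result is finer grained, but the matcher has many more tokens to compare, which is where the performance question comes from.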

> What we need to figure out is how to encode this information in the
> repository so that a) it is compact, b) can be used to calculate the
> blame info backwards in time instead of forwards, and c) can be
> interpreted by the client without knowledge about the generating
> algorithm. Storing a list of added and deleted byte ranges seems like a
> logical choice.
>
This would make the algorithm O(n). Storing, for each revision, the range
added in revision X would make the algorithm O(1), so there is room for
discussion.
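
A minimal sketch of blaming backwards from stored added/deleted byte ranges follows. The data layout is an assumption made for illustration, not anything Subversion stores: each revision maps to a pair (added, deleted), where added ranges are (start, length) in that revision's text and deleted ranges are (start, length) in the previous revision's text. The names to_old and blame_backwards are hypothetical.

```python
def to_old(pos, added, deleted):
    """Map a byte position in revision r (not inside an added range)
    to the corresponding position in revision r-1."""
    # Drop the bytes this revision added before pos.
    kept = pos - sum(length for start, length in added if start + length <= pos)
    # Re-insert the bytes this revision deleted at or before that point.
    old = kept
    for start, length in sorted(deleted):
        if start <= old:
            old += length
        else:
            break
    return old

def blame_backwards(head_len, deltas):
    """Return, for each byte of the head text, the revision that added it.

    deltas maps revision number -> (added, deleted); the first revision
    must list the whole initial text as added.
    """
    result = [None] * head_len
    pending = {i: i for i in range(head_len)}  # head position -> position in current rev
    for rev in sorted(deltas, reverse=True):
        added, deleted = deltas[rev]
        still_pending = {}
        for head_pos, pos in pending.items():
            if any(s <= pos < s + l for s, l in added):
                result[head_pos] = rev  # this revision introduced the byte
            else:
                still_pending[head_pos] = to_old(pos, added, deleted)
        pending = still_pending
        if not pending:  # everything attributed, older revisions not needed
            break
    return result

# r1: "hello"   r2: "hello world"   r3: "jello world"
deltas = {
    1: ([(0, 5)], []),
    2: ([(5, 6)], []),
    3: ([(0, 1)], [(0, 1)]),
}
blame = blame_backwards(11, deltas)
print(blame)
```

The walk can stop as soon as every byte is attributed, but in the worst case it still visits every revision. A precomputed per-revision table would avoid the walk, at the cost of more storage.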

> That is, if we want to cache this info on the server at all. Maybe just
> calculating it on the server would be enough, as it would reduce the
> network turnaround quite a bit.
>
I'm not so sure that it is the network traffic that's the problem.

I've been blaming branches/1.1.x/STATUS. It has about 800 revisions and
the blame caused about 200K to be transmitted. The same applies for a file
in libsvn_wc with about 400 revisions. This is a problem if you are on a
modem (yes, I know:-).

Maybe we should take a step back and do some more testing before rushing
to implement something new on the server. During my tests, I found that
nearly half of the time that the blame requires passes before the first
revision arrives. I think this is due to the history traversal that's
required since we go forwards. If we go the other direction, the server
and client might be able to work more in parallel. Maybe we can get
acceptable performance without doing very much on the server side, keeping
the client flexibility. This needs some more testing.

Of course, there is the question of what acceptable performance is.

Regards,
//Peter

Received on Sun Nov 21 21:10:08 2004

This is an archived mail posted to the Subversion Dev mailing list.