Re: Enhancing svn blame (Was: Case study: Mono switches to Subversion)

From: Branko Čibej <brane_at_xbc.nu>
Date: 2004-11-21 19:19:19 CET

Peter N. Lundblad wrote:

>I'm still not convinced, but then I'm just a simple programmer.
>
Heh. Aren't we all. :-)

>I created
>a little program that outputs the instructions of a svndiff and played
>with it. I don't see how to generate the ranges from that output.
>
>
Every byte in the target stream comes either directly from the source,
or is new data. If it's new data, then it was added by the author of the
revision. If it's from source, then it gets more complicated because you
have to decide which bits are simply unchanged, and which bits were
added by the author but just happen to be similar to bits from the
source. This requires some careful tuning of heuristics in choosing the
size of the context that defines a single "change". Certainly you can
guess wrong, but if you think about it, a context diff is also just
guesswork. For example, if you move the first 5 lines in a 500-line file
to the end, are you actually the author of those 5 lines now, or not?
Today, "svn blame" will say that you are.
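
To make the source-versus-new distinction concrete, here is a minimal sketch. It assumes a simplified, hypothetical instruction format with only two operations, ("source", (offset, length)) and ("new", data); real svndiff windows can also copy from the target itself, which is left out here. The function name and tuple layout are illustrative only and are not part of Subversion.

```python
def classify_target(instructions):
    """Split the target stream into (start, end, origin) ranges.

    `instructions` is a hypothetical, simplified delta: a list of
    ("source", (offset, length)) and ("new", data) tuples.
    """
    ranges = []
    pos = 0  # current offset in the target stream
    for op, arg in instructions:
        if op == "source":
            _offset, length = arg
        elif op == "new":
            length = len(arg)
        else:
            raise ValueError(f"unknown instruction: {op!r}")
        if length == 0:
            continue
        if ranges and ranges[-1][2] == op:
            # Same origin as the previous range: extend it.
            ranges[-1] = (ranges[-1][0], pos + length, op)
        else:
            ranges.append((pos, pos + length, op))
        pos += length
    return ranges


# Moving the first 5 bytes of a 500-byte file to the end needs no new data.
moved = [("source", (5, 495)), ("source", (0, 5))]
# A small edit: 3 new bytes spliced in between two source copies.
edited = [("source", (0, 10)), ("new", b"abc"), ("source", (13, 7))]
```

In this sketch a pure move comes out as one big "source" range, so nothing is attributed to the mover. Whether that is the right answer is exactly the heuristic question above, and a delta generator may also encode genuinely new text as a source copy when it happens to match.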

>We can talk about exactly how to store this information later. First I
>need to be convinced that we can get the information we want. :-)
>
>
No, we already know at least two ways to get the information we want.
One way is already being used by "svn blame" -- it interprets context
diffs. Another way would be to extract the info from svndiffs (which is
harder). A third way is to use the algorithm in libsvn_diff, but to feed
it different tokens (e.g., bytes instead of lines).
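
The effect of changing the token type can be sketched with Python's difflib standing in for libsvn_diff. This is a substitute chosen only for illustration; it is not the libsvn_diff algorithm or API, and the helper name added_ranges is hypothetical. The same function is fed lines in one call and single characters in the other.

```python
from difflib import SequenceMatcher


def added_ranges(old_tokens, new_tokens):
    """Return (start, end) token ranges of the new sequence that are not
    matched against the old one, i.e. inserted or replaced tokens."""
    matcher = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
    return [
        (j1, j2)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("insert", "replace")
    ]


old = "int x = 1;\nint y = 2;\n"
new = "int x = 1;\nint y = 3;\n"

# Line tokens: the whole second line is attributed to the change.
by_line = added_ranges(old.splitlines(keepends=True),
                       new.splitlines(keepends=True))
# Character tokens: only the single changed character is.
by_char = added_ranges(old, new)
```

With line tokens the result covers all of line 2; with character tokens it covers only the changed digit. Finer tokens give tighter ranges, at the price of more accidental matches to tune away.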

What we need to figure out is how to encode this information in the
repository so that a) it is compact, b) can be used to calculate the
blame info backwards in time instead of forwards, and c) can be
interpreted by the client without knowledge about the generating
algorithm. Storing a list of added and deleted byte ranges seems like a
logical choice.
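
As a rough sketch of how such ranges could drive a backwards blame, assume each revision stores an ordered list of ("keep", n), ("add", n) and ("del", n) operations that turn its parent's text into its own. This encoding is a hypothetical stand-in for "added and deleted byte ranges", not an existing repository format, and the per-byte lists are used for clarity rather than compactness.

```python
def blame_backwards(head_len, revisions):
    """Attribute each byte of the newest text to a revision.

    `revisions` is a list of (rev, ops) pairs, newest first, where `ops`
    describes how the parent text became the text of `rev`.
    """
    blame = [None] * head_len
    # live[i] is the head offset of byte i of the revision currently being
    # examined, or None if that byte did not survive to the head.
    live = list(range(head_len))
    for rev, ops in revisions:
        parent = []
        pos = 0  # offset in this revision's text
        for op, n in ops:
            if op == "keep":
                parent.extend(live[pos:pos + n])
                pos += n
            elif op == "add":
                # Bytes introduced here that still exist in the head.
                for head_offset in live[pos:pos + n]:
                    if head_offset is not None:
                        blame[head_offset] = rev
                pos += n
            elif op == "del":
                # Present in the parent only, so never visible in the head.
                parent.extend([None] * n)
            else:
                raise ValueError(f"unknown op: {op!r}")
        live = parent
    return blame


# r1 creates "abcdef"; r2 deletes "cd" and appends "XY"; r3 prepends "Z".
history = [
    (3, [("add", 1), ("keep", 6)]),
    (2, [("keep", 2), ("del", 2), ("keep", 2), ("add", 2)]),
    (1, [("add", 6)]),
]
result = blame_backwards(7, history)
```

The walk starts at the newest revision and never needs to know how the ranges were generated, which is the point of requirements b) and c). A real implementation would presumably track ranges instead of individual bytes and stop once every byte is attributed.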

That is, if we want to cache this info on the server at all. Maybe just
calculating it on the server would be enough, as it would reduce the
network turnaround quite a bit.

-- Brane

Received on Sun Nov 21 19:20:28 2004

This is an archived mail posted to the Subversion Dev mailing list.