Daniel Berlin wrote:
>On Thu, 2005-02-10 at 03:57 +0100, Branko Čibej wrote:
>>Daniel Berlin wrote:
>>>1. For a simple byte level blame, we should be able to get the answers
>>>from the diff format alone, without actually expanding it. You can
>>>simply mark the affected byte ranges, and stop when the whole file has
>>>been affected. What the output looks like is up for grabs.
>>>2. You could actually do a similar thing with line level blame.
>>>Discover the line byte ranges of the current revision, throw it into an
>>>interval tree, and then do queries to see which line a given diff
>>>affects, based on it's byte ranges. Stop when every line has been
>>>2 seems a bit harder than 1.
>>>I've been out of it for a while, so maybe there is a something about
>>>implementing #1 i have missed.
>>>Have i missed anything about #1 that makes it hard?
>>If by "diff format" you mean the binary delta, then... There was a time
>>when I thought this would be possible. Now I'm not so sure. The trouble
>>is that the vdelta doesn't generate an edit stream, it generates a
>>compressed block-copy stream. Which means that can never be 100% sure,
>>just from looking at the delta, which bytes in the target are really new
>>and which are just (offset) copies from the source. The only blocks you
>>can really be sure about are those that are represented by new data in
>>the delta (either NEW blocks or target copies that take data from NEW
>>blocks). This problem is made worse by our use of skip deltas in (at
>>least) the BDB back-end.
>>I agree that it would be nice if the server could generate some sort of
>>byte-range oriented info, but I don't think it can be done just by
>>looking at the deltas. It's sad, I know...
>After looking at the real slow,i'm not actually worried about this so
>For now it's actually not the main source of slowdown for blame.
>See my followup mail on the delta combiner.
>The delta combiner falls down badly when you have a even a moderate
>number of target ops in the window. It goes n^2 and takes up a large
>amount of the time
You probably mean "target ops in the first window" or "source ops in the
second window", yes?
I think you're right... every single bit in the combiner is linear or
logarithmic, except the tiny little recursion in copy_source_ops which
can, indeed, go quadratic with the right mix of data. The problem seems
to be that the right mix of data is far too easily acheivable, at least
/me bangs head against wall
Problem is I have no idea how to get rid of that recursion in
copy_source_ops and still combine deltas correctly.
The more I look at your numbers, the more it seems that xdelta is a
better choice. The worse compression seems to be balanced by better
performance, which makes sense. Unless I've done something really stupid
in vdelta or the combiner, we should probably look switching to xdelta
To unsubscribe, e-mail: email@example.com
For additional commands, e-mail: firstname.lastname@example.org
Received on Thu Feb 10 11:24:34 2005