[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [RFC] Diff (blame) optimization: how to go forward

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Fri, 8 Oct 2010 22:24:02 +0200

On Fri, Oct 8, 2010 at 3:08 PM, Daniel Shahaf <d.s_at_daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Fri, 8 Oct 2010 at 01:44 -0000:
>> With suffix-lines-to-keep=50, you'd need to insert a
>> block of text that has its last 50 lines identical to the 50 lines
>> preceding the insertion point, to mess up the diff result.
>>
>> - If really necessary, we could say default=50, but it can be
>> overridden by the user with another -x option.
>>
>> So the main question is: is such a change in behavior (unlikely to
>> have an impact in most realistic cases) acceptable for this kind of
>> performance improvement?
>
> Just to clarify: are you asking whether it's acceptable to have a "not
> nice" (but still semantically correct) diff output in case the user adds
> a block whose last 50 lines match the 50 lines prior to the first
> difference?
>
> And 'not nice' just means "the preceding, rather than trailing, instance
> of the repeated block would be grabbed by the /^+/ lines in the diff",
> right?

Yep, that's what I meant.

> In this case, I think I'm going to answer your long mail with just...
>
>    Yes.

Ok, great.

> or actually...
>
>    Yes, and let's move on to deeper problems (if any).

What deeper problems? :-)

Seriously: I saw this as the biggest problem in the scope of this
optimization. If the result is not good the entire optimization would
be worthless, so any other problem with my implementation would be
irrelevant.

Good to hear that it's not that big of a problem.

(I realize this optimization is just a small thing compared to more
important/fundamental stuff going on right now in subversion. But it's
important to me personally. And it's the most I can do with my
abilities/knowledge/time right now, so ...)

> (I'm not /particularly/ worried about a "different-than-expected, but
> still semantically correct, diff" --- though, yes, it's nice to output
> the "expected" diff if that's possible without too much effort and
> without breaking other use cases.)
>
> :-),
>
> Daniel
> (p.s. I don't have as much time to look into this as I'd like to...)

No problem. Your feedback is very much appreciated.

On Fri, Oct 8, 2010 at 8:46 PM, Hyrum K. Wright
<hyrum_wright_at_mail.utexas.edu> wrote:
> My not-having-looked-deeply-at-the-problem reaction is this: if we can
> introduce significant speed-ups in the common case, while still
> maintaining functionality, let's do it. There may be corner cases
> where somebody gets an ugly diff, but in those corner cases there are
> likely to be *other* issues as play as well. Let's not penalize
> everybody for the sake of a couple of corner cases.

Ok, perfect, we seem to have a consensus then :-). I'll cook up a
version of the patch which keeps 50 suffix lines.

Thanks for the feedback, Daniel and Hyrum.

Just one more thing: as I mentioned in my rather long mail, blame
would benefit the most from my optimization if the server were fast
enough. So if anyone could spare some review cycles (ok, I know they
are scarce these days), reviewing Stefan^2's integrate-cache-membuffer
branch would really help my cause as well ...

Cheers,

-- 
Johan
Received on 2010-10-08 22:24:39 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.