[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: diff --summarize

From: Neels J Hofmeyr <neels_at_elego.de>
Date: Wed, 24 Aug 2011 01:32:15 +0200

Hi Julian,

I've examined your patch and did add something to wc_wc_diff_summarize,
though it's not as plain as I first thought.

I'll give you the elephant in the room first:

Currently, 'svn diff --summarize <wc-path>' does *nothing else* than what a
plain 'svn status' does. There is no way to get other wc-wc-diffs than the
diff between BASE and WORKING/ACTUAL. So at this point in time, we could
redirect diff_wc_wc_summarize to client_status and basically no-one would
notice anything.

This whole local --summarize only even *begins* to make sense when we can
compare one subpath of a WC to another subpath, or even one checkout with a
different checkout (i.e. locally compare between completely different WCs).

Now back to pretending diff --summarize <wc-path> *is* in fact useful in
itself:

So I thought, during wc-wc diff, it would make sense to also have a
text_deltas flag, so we don't need to do the keywords/nl translations. So I
coded that. While I did, I hit this comment:

    /* Note that this might be the _second_ time we translate
       the file, as svn_wc__internal_file_modified_p() might have used a
       tmp translated copy too. But what the heck, diff is
       already expensive, translating twice for the sake of code
       modularity is liveable. */

So there's this thing where diff translates pristine&working *anyway* to
determine if they are different. That's just the check whether to skip the file.

Then it translates them *again* to do the diff. And I think the "is it
modified?" test uses streamy translation and the diff-producing part uses
translation to a tempfile.

So there's room for optimization there. Translate them only once, for
comparison, if that's needed, and keep the result for producing the diff.

So all my part of the patch does is skip the second translation if
text_deltas is false. The first translation can't generally be skipped since
it determines the text-modified status. It has its own skip semantics.

Looking at svn_wc__internal_file_modified_p()'s (the first translation's)
heuristics to shortcut the decision whether a file was modified:
- shortcut to "yes, modified" when
   - there is no pristine
   - the WC status gives away a modification
- shortcut to "no, not modified" when
   - filesize and timestamp unchanged
- else translate and compare

So if it shortcuts to "yes, modified", the later diff-translation is the
only translation. (1)
If it shortcuts to "no, not modified", the diff-translation is always
skipped as there is no diff.
Else, basically if there are only text mods (which can only be verified
after translation) there are always two translations. (2)

So this patch makes local-diff translation-free in (1), and removes one of
the two translations, i.e. removes the tempfile-based diff-producing
translation, in (2).

Considering that actually, in most default cases (with the lack of
keywords/nl props), there is *no* translation at all and the callbacks get
the direct pristine and working copy paths, my patch *is* a bit overkill.

But when a user is coming from the other side, say if translating their
files takes a really really long time, well, this might be a noticeable
speedup seen on --summarize. Same of which is obtained by running 'status'.

...If I haven't confused you too much by now, do you think it's worth
touching wc_wc diff like my patch does?

If you said yes, you do probably also think that we shall support diffing
separate wc-paths at some point?

[[[
$ svn diff --summarize --old=wc/foo/ --new=wc/bar/
svn: E200004: Sorry, svn_client_diff5 was called in a way that is not yet
supported
svn: E200004: Only diffs between a path's text-base and its working files
are supported at this time
]]]

(The attached file is your patch and my questionable one combined. And I did
add two printf()s next to DBG()s to show the file_mod paths (whether and how
they're empty), sorry about that)

~Neels

On 08/22/2011 02:21 PM, Julian Foad wrote:
> Hi Neels.
>
> The new diff --summarize code presently still downloads the full text of
> each deleted file to supply to the diff callbacks. I have a patch in
> progress (attached) to eliminate this, but I shan't be working on it for
> a few days. With this patch, 'diff --summarize' still works but I
> haven't checked carefully (either by testing or analysis) that this is
> actually eliminating all the unnecessary downloads, and it could do with
> some documentation updates in the diff callbacks to say that this usage
> is allowed.
>
> Feel free to progress this if you wish, otherwise I'll get back to it
> later.
>
> - Julian
>

Received on 2011-08-24 01:32:50 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.