[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: diff --summarize

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Wed, 24 Aug 2011 16:38:28 +0100

Neels J Hofmeyr wrote:
> I've examined your patch and did add something to wc_wc_diff_summarize,
> though it's not as plain as I first thought.
>
> I'll give you the elephant in the room first:
>
> Currently, 'svn diff --summarize <wc-path>' does *nothing else* than what a
> plain 'svn status' does. There is no way to get other wc-wc-diffs than the
> diff between BASE and WORKING/ACTUAL. So at this point in time, we could
> redirect diff_wc_wc_summarize to client_status and basically no-one would
> notice anything.

Yes, that's true: it provides little or no added value over 'status'.
It's also inconsistent with 'status' in minor ways, e.g. it prints 'A'
and 'D' for a replacement whereas 'status' prints 'R'. I noticed that,
and briefly wondered if there could be a way to combine the 'diff
--summarize' and 'status' code paths, but that's not the way to go.

The alternative and attractive option is (one day) to extend 'diff' to
be able to diff two arbitrary trees (in the repos and/or WC and/or
unversioned), and then 'diff --summarize' automatically gets that
ability too.

> This whole local --summarize only even *begins* to make sense when we can
> compare one subpath of a WC to another subpath, or even one checkout with a
> different checkout (i.e. locally compare between completely different WCs).

That stuff is lovely, I'd really like to see it. In some ways the local
summarize implementation restricted to a BASE-WORKING diff is totally
pointless because it just does what 'status' can do, but from an
architectural POV it is really important.

Even from a practical POV, the orthogonality is useful. I have a script
that generates a patch file, basically by running "svn diff", but not
quite. I want the files to appear in alphabetical order of their paths,
and I want to output a list of their paths. So my script runs "svn
status" to find out what paths will appear in the diff, and then runs
"svn diff" on one at a time. If it can run "diff --summarize" instead
of "status", then it can be trivially extended to work on repos paths,
where at the moment it only supports WC paths.

> Now back to pretending diff --summarize <wc-path> *is* in fact useful in
> itself:
>
> So I thought, during wc-wc diff, it would make sense to also have a
> text_deltas flag, so we don't need to do the keywords/nl translations. So I
> coded that. While I did, I hit this comment:
>
> /* Note that this might be the _second_ time we translate
> the file, as svn_wc__internal_file_modified_p() might have used a
> tmp translated copy too. But what the heck, diff is
> already expensive, translating twice for the sake of code
> modularity is liveable. */
>
> So there's this thing where diff translates pristine&working *anyway* to
> determine if they are different. That's just the check whether to skip the file.
>
> Then it translates them *again* to do the diff. And I think the "is it
> modified?" test uses streamy translation and the diff-producing part uses
> translation to a tempfile.
>
> So there's room for optimization there. Translate them only once, for
> comparison, if that's needed, and keep the result for producing the diff.

Ugh. Good catch. Room for improvement there, for sure.

> So all my part of the patch does is skip the second translation if
> text_deltas is false. The first translation can't generally be skipped since
> it determines the text-modified status. It has its own skip semantics.
>
> Looking at svn_wc__internal_file_modified_p()'s (the first translation's)
> heuristics to shortcut the decision whether a file was modified:
> - shortcut to "yes, modified" when
> - there is no pristine
> - the WC status gives away a modification
> - shortcut to "no, not modified" when
> - filesize and timestamp unchanged
> - else translate and compare
>
> So if it shortcuts to "yes, modified", the later diff-translation is the
> only translation. (1)
> If it shortcuts to "no, not modified", the diff-translation is always
> skipped as there is no diff.
> Else, basically if there are only text mods (which can only be verified
> after translation) there are always two translations. (2)
>
> So this patch makes local-diff translation-free in (1), and removes one of
> the two translations, i.e. removes the tempfile-based diff-producing
> translation, in (2).
>
> Considering that actually, in most default cases (with the lack of
> keywords/nl props), there is *no* translation at all and the callbacks get
> the direct pristine and working copy paths, my patch *is* a bit overkill.
>
> But when a user is coming from the other side, say if translating their
> files takes a really really long time, well, this might be a noticeable
> speedup seen on --summarize. Same of which is obtained by running 'status'.
>
> ...If I haven't confused you too much by now, do you think it's worth
> touching wc_wc diff like my patch does?

OK, I think this optimization is certainly worth considering. I haven't
looked at your patch for it yet.

> If you said yes, you do probably also think that we shall support diffing
> separate wc-paths at some point?
>
> [[[
> $ svn diff --summarize --old=wc/foo/ --new=wc/bar/
> svn: E200004: Sorry, svn_client_diff5 was called in a way that is not yet
> supported
> svn: E200004: Only diffs between a path's text-base and its working files
> are supported at this time
> ]]]

Yes, I want us to support that.

> (The attached file is your patch and my questionable one combined. And I did
> add two printf()s next to DBG()s to show the file_mod paths (whether and how
> they're empty), sorry about that)

I'll take a look.

- Julian

> On 08/22/2011 02:21 PM, Julian Foad wrote:
> > Hi Neels.
> >
> > The new diff --summarize code presently still downloads the full text of
> > each deleted file to supply to the diff callbacks. I have a patch in
> > progress (attached) to eliminate this, but I shan't be working on it for
> > a few days. With this patch, 'diff --summarize' still works but I
> > haven't checked carefully (either by testing or analysis) that this is
> > actually eliminating all the unnecessary downloads, and it could do with
> > some documentation updates in the diff callbacks to say that this usage
> > is allowed.
> >
> > Feel free to progress this if you wish, otherwise I'll get back to it
> > later.
> >
> > - Julian
> >
Received on 2011-08-24 17:39:05 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.