[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

RFC: 'svn diff' and renamed items (issue #2543)

From: Stefan Sperling <stsp_at_elego.de>
Date: Fri, 2 Oct 2009 15:56:09 +0100

I am looking into issue #2543, which is about making 'svn diff'
show the full content of files after the file was renamed.

I though I'd share my current progress and ideas to get my approach
checked for sanity and gather any feedback you might have.

Below, I'll describe what svn diff did before my changes, what I would
like it to do instead, what I have done so far, and what problems I
am now facing.

== What svn originally did ==

When renaming a file locally in the WC, svn diff would show the
file deleted at the old path in its entirety, but not show it
at the new path:

  $ svn move alpha alpha2
  A alpha2
  D alpha
  $ svn diff
  Index: alpha
  ===================================================================
  --- alpha (revision 2)
  +++ alpha (working copy)
  @@ -1 +0,0 @@
  -alpha
  $

Only if the renamed file was also modified, it would appear in the diff
output, showing how it was modified relative to the move source:

  $ echo aaa >> alpha2
  $ svn diff
  Index: alpha
  ===================================================================
  --- alpha (revision 2)
  +++ alpha (working copy)
  @@ -1 +0,0 @@
  -alpha
  Index: alpha2
  ===================================================================
  --- alpha2 (revision 2)
  +++ alpha2 (working copy)
  @@ -1 +1,2 @@
  alpha
  +aaa
  $

The behaviour when diffing entire directories was similar, i.e. only
files which were modified in addition to being moved (or copied) with
their parent directory were shown in the diff output.

== What I'd like svn diff to do instead ==

There is a good reason for the above behaviour. It allows users to see
which local modifications they have made to a file after moving or
copying it. But this is not always what the user wants to know.

For example, when creating a patch which is supposed to be applied back
with UNIX patch (or svn patch, more on that later), it is impossible to
tell svn diff to include moved or copied files in their entirety. This
means that 'svn diff' is absolutely useless for creating patches which
add copied or renamed files or directories.

It should be possible to get either behaviour from 'svn diff'.
One of these behaviours will be the default, the other will be toggled
by a command line flag.

Note that files which were newly added without history are already shown
in their entirety, so there's no problem with added files or directories.

== What I've done so far ==

There is a flag which has a suitable name, --notice-ancestry.
It is documented in the book as follows:

  By default, svn diff ignores the ancestry of files and merely compares the
  contents of the two files being compared. If you use --notice-ancestry,
  the ancestry of the paths in question will be taken into consideration when
  comparing revisions (i.e., if you run svn diff on two files with identical
  contents but different ancestry, you will see the entire contents of the
  file as having been removed and added again).

I decided to generalise the meaning of the --notice-ancestry flag (this may
have been a wrong decision), and modified svn diff on trunk so that it now
displays copied files just like newly added files by default:

  $ svn diff
  Index: alpha
  ===================================================================
  --- alpha (revision 2)
  +++ alpha (working copy)
  @@ -1 +0,0 @@
  -alpha
  Index: alpha2
  ===================================================================
  --- alpha2 (revision 0)
  +++ alpha2 (revision 2)
  @@ -0,0 +1,2 @@
  +alpha
  +aaa

And the old behaviour is available with --notice-ancestry:

$ svn diff --notice-ancestry
  Index: alpha
  ===================================================================
  --- alpha (revision 2)
  +++ alpha (working copy)
  @@ -1 +0,0 @@
  -alpha
  Index: alpha2
  ===================================================================
  --- alpha2 (revision 2)
  +++ alpha2 (working copy)
  @@ -1 +1,2 @@
   alpha
  +aaa

This was very easy to implement, since all it took was to add a
check for the --notice-ancestry flag at the point where we tweak
a copied file's schedule from 'add' to 'normal' to force comparison
to its text base, which we end up finding via copyfrom information.
See r39680.

I also got this to work when comparing the working copy to the repository:

  $ svn diff -r2 alpha2
  Index: alpha2
  ===================================================================
  --- alpha2 (revision 0)
  +++ alpha2 (revision 2)
  @@ -0,0 +1,2 @@
  +alpha
  +aaa
  $ svn diff --notice-ancestry -r1 alpha2
  Index: alpha2
  ===================================================================
  --- alpha2 (.../alpha) (revision 1)
  +++ alpha2 (.../alpha2) (working copy)
  @@ -1 +1,2 @@
   alpha
  +aaa

The change for this repos->wc diff case was a little more involved.
What I did was to make resolving of the pegged URL diff target via
svn_client__repos_locations() conditional on the --notice-ancestry option,
and passing entry->URL of the copied file unmodified if the flag is not set.
Since the file was locally copied and does (usually ;) not yet exist in the
repository, the resulting diff only shows added lines. See r39720.

Diffing with future revisions works as expected.
In the case where the file does exist in the left-diff revision,
the user is shown a diff between the locally copied file and the
file that exists in the repository.
For example, in a WC based on r2, we copied 'alpha' to 'foo',
and someone else already added a new file 'foo' added in r3:

  $ svn diff foo
  Index: foo
  ===================================================================
  --- foo (revision 0)
  +++ foo (revision 2)
  @@ -0,0 +1 @@
  +alpha
  $ svn diff -r2 foo # foo does not exist in r2
  Index: foo
  ===================================================================
  --- foo (revision 0)
  +++ foo (revision 2)
  @@ -0,0 +1 @@
  +alpha
  $ svn diff -r3 foo # foo does exist in r3, however
  Index: foo
  ===================================================================
  --- foo (revision 3)
  +++ foo (working copy)
  @@ -1 +1 @@
  -foo
  +alpha

What does not work yet is 'svn diff -rX URL'.
While a WC path triggers the desired behaviour...

  $ svn diff -r1 alpha2
  Index: alpha2
  ===================================================================
  --- alpha2 (revision 0)
  +++ alpha2 (revision 3)
  @@ -0,0 +1,2 @@
  +alpha
  +aaa

... a URL does not:

  $ svn diff -r1 ^/trunk/alpha2
  Index: alpha
  ===================================================================
  --- alpha (.../alpha) (revision 1)
  +++ alpha (.../alpha2) (revision 3)
  @@ -1 +1,2 @@
   alpha
  +aaa

Also, so far my changes work for files, but not for directories.
I see no reason why this should not work for directories,
but I have not yet looked into fixing that.

== Problems ==

Assuming it can be made to work with directories as well as files,
the current solution makes the impossible use case possible.
But I can see a couple of problems with it already, and there may be
even more problems which I am not seeing yet.

1) It is possible that diff targets sitting in the repository cannot be
   found with the new behaviour.
   So if we enable it by default, we enable such problems by default.

   E.g. diff_tests.py 33 has this problem, it errors out during a simple
   'svn diff -r1' unless --notice-ancestry is passed:
     svn: Try using the --notice-ancestry option
     svn: Target path '/A_prime' does not exist

   It's easy to reproduce the problem:
    $ svn mkdir dir
    A dir
    $ svn ci -m"add dir"
    Adding dir
    
    Committed revision 3.
    $ cd dir
    $ svn diff -r2
    svn: Try using the --notice-ancestry option
    svn: Target path '/trunk/dir' does not exist

   This gets worse, e.g. svn does not complain if we do:
    $ cd ..
    $ svn diff -r2 dir
    $
   Whoops...

   Also, it looks like there is still a bug related to peg revision parsing:
    $ cd dir
    $ svn diff -r2 ._at_3
    svn: '@3' is just a peg revision. Maybe try '@3@' instead?

2) When we add support for git extensions, the default will have to
   change again, since copies are represented as:

     diff --git a/{oldname} b/{newname}
     copy from {oldname}
     copy to {newname}
     {normal unidiff data here if there was a difference, only for how new
     is different from old}

   Which means that as soon as we support git extensions, any intelligent
   patching tool (and most importantly svn patch) can do the right thing
   with just the delta between copy or move source and target. Changing the
   default now just to change it again later is certainly not ideal.

   Still, we will need new behaviour anyway to facilitate creation of
   patches which less intelligent patching tools (such as UNIX patch)
   can deal with. So I want to keep it, just not make it the default.
   This implies adding a new command line flag to 'svn diff', because
   --notice-ancestry cannot be used for this purpose. Any ideas for a
   suitable name? --full-copies? --add-copies? --ignore-copyfrom?
   --treat-copies-as-added-without-history?

3) I'd like this new 'svn diff' command line flag to behave consistently
   across all possible use cases for 'svn diff'.

   When doing pure repos->repos diffs, we currently always expect the
   files being diffed to exist at the specified peg revisions (defaulting
   to OLDREV and NEWREV with the second synopsis). If they don't exist
   we currently error out. This would need to change.

   If we allow either of the targets to be non-existent, it is possible
   to diff any item in the repository against non-existent paths in the
   repository, producing a diff that fully adds (or deletes) the non-missing
   diff target. I have a local patch (see below) which results in the
   following behaviour:

    $ svn diff --old=^/does_not_exist --new=^/trunk/beta
    Index: does_not_exist
    ===================================================================
    --- does_not_exist (.../does_not_exist) (revision 0)
    +++ does_not_exist (.../trunk/beta) (revision 3)
    @@ -0,0 +1 @@
    +beta
    $ svn diff --old=^/trunk/beta --new=^/does_not_exist
    Index: beta
    ===================================================================
    --- beta (.../trunk/beta) (revision 3)
    +++ beta (.../does_not_exist) (revision 3)
    @@ -1 +0,0 @@
    -beta
    $ svn diff --notice-ancestry --old=^/trunk/beta \
        --new=^/trunk/does_not_exist
    svn: 'file:///tmp/svn-sandbox/repos/does_not_exist' was not found \
    in the repository at revision 3
  
   (Note how the diff uses the *old* name in the +++ diff header line,
    meaning UNIX patch will create or delete a file with the wrong name.
    This also happens with valid URLs and is probably a bug we need to fix.)
   
   Since a simple typo can potentially lead to huge diff output with the
   new behaviour, this is another reason why it should not be the default.

Am I on the right track?

Thanks,
Stefan

Index: subversion/libsvn_client/diff.c
===================================================================
--- subversion/libsvn_client/diff.c (revision 39749)
+++ subversion/libsvn_client/diff.c (working copy)
@@ -1094,7 +1094,7 @@ diff_prepare_repos_repos(const struct diff_paramet
            (params->path2 == drr->url2) ? NULL : params_path2_abspath,
            ra_session, params->revision2, pool));
   SVN_ERR(svn_ra_check_path(ra_session, "", drr->rev2, &kind2, pool));
- if (kind2 == svn_node_none)
+ if (! params->ignore_ancestry && kind2 == svn_node_none)
     return svn_error_createf
       (SVN_ERR_FS_NOT_FOUND, NULL,
        _("'%s' was not found in the repository at revision %ld"),
@@ -1106,7 +1106,7 @@ diff_prepare_repos_repos(const struct diff_paramet
            (params->path1 == drr->url1) ? NULL : params_path1_abspath,
            ra_session, params->revision1, pool));
   SVN_ERR(svn_ra_check_path(ra_session, "", drr->rev1, &kind1, pool));
- if (kind1 == svn_node_none)
+ if (! params->ignore_ancestry && kind1 == svn_node_none)
     return svn_error_createf
       (SVN_ERR_FS_NOT_FOUND, NULL,
        _("'%s' was not found in the repository at revision %ld"),

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2402958
Received on 2009-10-02 16:56:29 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.