[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Ideas for handling copies/moves in 'svn diff'

From: Daniel Näslund <daniel_at_longitudo.com>
Date: Sat, 29 May 2010 14:34:06 +0200

Hi!

The Google Summer of Code period has started. Posting about what I'm
currently working on as part of implementing the 'git unidiff' format
for wc-wc diffs. Perhaps someone sees an obvious solution to any of the
three problems I present below.

The callchain:
===============
do_diff()
  svn_wc_diff_wc6()
    /* Calls svn_wc__db_read_children() and recurses into child dirs but
     * calls file_diff() on files. */
    directory_elements_diff()
      file_diff()
        /* Will be called for copies and moves add-part. */
        eb->file_changed()
        /* If called with '--show-copies-as-adds' the add-part will be
         * reported with this func. */
        eb->file_added()
          /* Will eventually be called for both added, changed and
           * deleted paths. From here we call into the diff library to
           * create headers and actual diff content. */
          diff_content_changed()

Problem 1
==============
We're processing the delete-half instead of the add-half for a
copy/move. At present, copy/move add-half is not shown by 'svn diff'
unless invoked with '--show-copies-as-adds'. But with the 'git unidiff'
format we would only show a diff for the add-part, e.g. we would change
the behaviour from reporting the delete-half to reporting the add-half.
Suggestion on solutions:

1) Call file_added() for all paths with db_status copied/moved
      instead of using file_changed().
      + Allows us to easily detect what files are adds and the
        svn_wc_diff_callback already has a copyfrom arg.
      - We still have to distinguish between copied and moved paths in
        the callback.
2) Introduce file_copied() and file_moved() callbacks. Those could
      have a 'delete_half' arg or similar, e.g. we would pass both the
      add-half and del-half to that function and separate them with an
      parameter.
      + Might enhance readability.
      - We're introducing a concept that may be viewed as inconsistent
        with the rest of the code. svn handles 'add+del', is it
        acceptable to introduce copy and move at this level of the code?
        Further, the diff_file_added() and diff_file_deleted() does very
        little as is, wouldn't it be better to introduce some extra
        complexity there instead?
3) ...

Problem 2
=============
We need to process the add-half to know the delete-half. The add-half
knows about the delete-half through the copyfrom information, but the
other way around does not work - there's no 'copyto' information. Since
we don't want to present a diff for the delete-half, we need to be able
to know that a deleted file actually is part of a delete. Suggestion on
solutions:

1) Create a custom svn_wc__db_read_children() func that can has an out
    parameter callled 'copyto', e.g. the func would traverse the entire
    wc looking if there is any other paths that has the current path as
    copyfrom.
    + We can make the detecting operation less expensive than a file
      traversal since we have SQL and the db file will already be in
      memory. (Will involve more reads before we move to one db).
    - It feels a bit awkvard to fetch 'copyto'. There must have been a
      design decision to only only make the copyfrom relation
      unidirected. How does other code parts detect the copyto?

2) Store added and deleted paths as keys in two hashs with the diff
    contents inside diff_content_changed().
    + We don't have to change directory_elements_diff().
    - The memory consumption would grow with the number of deleted and
      added paths. If we choose to save them to tmp-files, we could
      avoid that but would suffer in performance instead.
3) ...

Problem 3
==========
If we do changes in the callchain presented earlier, we would affect
the 'url->wc' use case too. directory_elements_diff() is called from
close_directory() and close_edit(). We would have git diffs for -r
BASE:WORKING but no git diff headers for the revisions preceeding BASE.
I'm assuming that it's not a problem since the current 'svn diff'
command shows copies for the changes in the wc but not the ones before.

Cheers,
Daniel
Received on 2010-05-29 14:51:24 CEST

This is an archived mail posted to the Subversion Dev mailing list.