[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Ideas for handling copies/moves in 'svn diff'

From: Daniel Näslund <daniel_at_longitudo.com>
Date: Tue, 1 Jun 2010 22:04:02 +0200

On Sun, May 30, 2010 at 11:16:55AM +0200, Stefan Sperling wrote:
> On Sat, May 29, 2010 at 02:34:06PM +0200, Daniel Näslund wrote:
> > Problem 1
> > ==============
> > We're processing the delete-half instead of the add-half for a
> > copy/move. At present, copy/move add-half is not shown by 'svn diff'
> > unless invoked with '--show-copies-as-adds'. But with the 'git unidiff'
> > format we would only show a diff for the add-part, e.g. we would change
> > the behaviour from reporting the delete-half to reporting the add-half.
> Why do we need to change the behaviour?

My main concern was that we didn't use file_added() for all the cases
involving copies and moves. That was unfortunately far from obvious from
my last mail. :(

> Right now, svn diff always shows deleted paths as deleted in the diff,
> unless you pass --no-diff-deleted. svn patch already knows how to deal
> with those files (delete them).
> For copies, we don't show the add unless you pass --show-copies-as-adds.
> If the copy has local mods, you see a diff between the copy source and
> the modified copy. With --show-copies-as-adds, you see every copy as added.
> Note that --show-copies-as-adds was added so that svn patch can see
> copied files (without knowing that they are copies, it treats them
> just like any other added file). Once svn patch understand copy/move
> information in the diff header this option will be needed much less.
> I think it's fine for now to make copy/move info in the diff header behave
> according to the above:
> - Print just a copy header but no content diff for copied without mods:
> Index: somefile
> ===================================================================
> copy from <copyfrom path>
> copy to <somefile>
> Index: nextfile
> - Print a diff as well if the file was modified after copying:
> Index: somefile
> ===================================================================
> copy from <copyfrom path>
> copy to <somefile>
> --- [...]
> +++ [...]
> @@ [...] @@
> [...]
> (I've used [...] to denote an omission).
> - With --show-copies-as-adds, print copy/move info and also show
> the copy as a diff that only adds lines:
> Index: somefile
> ===================================================================
> copy from <copyfrom path>
> copy to <somefile>
> --- [...]
> +++ [...]
> @@ [...] @@
> + [...]
> + [...]
> + [...]

Ok, we keep the current behaviour but add 'git diff' headers.

> I think you can handle all the above cases in the file_added() callback
> in the diff editor.

Nah, I will need some way to determine if a file was modified after
copying. I probably will introduce a copy_from field to file_changed().

> > Suggestion on solutions:
> >
> > 1) Call file_added() for all paths with db_status copied/moved
> > instead of using file_changed().
> > + Allows us to easily detect what files are adds and the
> > svn_wc_diff_callback already has a copyfrom arg.
> > - We still have to distinguish between copied and moved paths in
> > the callback.
> We could chose to only show copies in the git headers for now.
> And defer moves till wc-ng can handle them properly.
> svn patch will need to make sure to copy files before deleting the source.
> This will probably require a two-pass approach when patching...
> But again, once wc-ng can tell us whether a file was moved or copied,
> this will get much easier.

_scan_addition() can give us the copyfrom relpath and tell us if the path
was added or copied. 'svn mv' is currently implemented using a lock and
doing a svn_wc_copy3() and then svn_wc_delete4(). Thus, we can detect
copies/moves and we can perform them using wc-ng mechanisms. Or am I
missing something?

> > 2) Introduce file_copied() and file_moved() callbacks. Those could
> > have a 'delete_half' arg or similar, e.g. we would pass both the
> > add-half and del-half to that function and separate them with an
> > parameter.
> > + Might enhance readability.
> > - We're introducing a concept that may be viewed as inconsistent
> > with the rest of the code. svn handles 'add+del', is it
> > acceptable to introduce copy and move at this level of the code?
> > Further, the diff_file_added() and diff_file_deleted() does very
> > little as is, wouldn't it be better to introduce some extra
> > complexity there instead?
> Aren't you changing public diff-editor API by adding those functions?
> If so, I don't think it's a huge problem to rev the API. But if we
> can avoid doing so I'd prefer not to.

A sidetrack: I thought that all editors must implement all
svn_delta_editor_t funcs, but svn_wc_diff_callbacks4_t only does it for
a subset, i.e. there's no open_root() or close_edit(). We're calling it
a diff editor but is it really equal to something like the status
editor? To me, it feels like a bunch of callbacks created in the same
spirit as the delta_editor callbacks but not strictly following it's

I agree with you. I prefer to not change the API more than neccessary.
On the other hand it has already been revved to 1.7 so some small
modifications, e.g. possibly adding copyfrom to file_changed() won't be
too troublesome.

> > Problem 2
> > =============
> > We need to process the add-half to know the delete-half. The add-half
> > knows about the delete-half through the copyfrom information, but the
> > other way around does not work - there's no 'copyto' information. Since
> > we don't want to present a diff for the delete-half, we need to be able
> > to know that a deleted file actually is part of a delete. Suggestion on
> > solutions:
> >
> > 1) Create a custom svn_wc__db_read_children() func that can has an out
> > parameter callled 'copyto', e.g. the func would traverse the entire
> > wc looking if there is any other paths that has the current path as
> > copyfrom.
> > + We can make the detecting operation less expensive than a file
> > traversal since we have SQL and the db file will already be in
> > memory. (Will involve more reads before we move to one db).
> > - It feels a bit awkvard to fetch 'copyto'. There must have been a
> > design decision to only only make the copyfrom relation
> > unidirected. How does other code parts detect the copyto?
> In general, there is no copy-to information in svn at all at the moment.
> wc-ng will eventually provide this information. The DB schema supports
> copied-from/copied-to and moved-from/moved-to.
> But I don't think we record anything but copied-from yet.
> So I'd just wait until wc-ng records the extra information we need,
> and then generate extra information in the diff.

I grepped libsvn_wc without finding any references to copyto. Did you
mean that the DB schema supports adding those fields in the future? In
that case, it will be straightforward to detect deletes that is part of
a move. Will wc-ng 1.7 support copyto information?

> > 2) Store added and deleted paths as keys in two hashs with the diff
> > contents inside diff_content_changed().
> > + We don't have to change directory_elements_diff().
> > - The memory consumption would grow with the number of deleted and
> > added paths. If we choose to save them to tmp-files, we could
> > avoid that but would suffer in performance instead.
> We store deleted paths in batons in quite a few places. E.g. the merge
> code does this to tell apart path replacements from additions.
> If a similar scheme works here I'd say let's go for it. But again,
> waiting until wc-ng matures enough is probably a better idea.

I'll bid my time.

What I can do at the moment
1) Create a make_git_header() func inside #ifdef SVN_PATCH_EXPERIMENTAL
2) Create a test_git_header() func inside #ifdef SVN_PATCH_EXPERIMENTAL
3) Clean up the code paths for file_added() and file_changed().
4) While deciding on the best way of detecting copies/moves in the diff
  code, I can start on the parser in parse-diff.c and write C-tests for
  it. Will be using #ifdef's here too.

Thanks for your input,
Received on 2010-06-01 22:04:47 CEST

This is an archived mail posted to the Subversion Dev mailing list.