Hi!
The Google Summer of Code period has started. Posting about what I'm
currently working on as part of implementing the 'git unidiff' format
for wc-wc diffs. Perhaps someone sees an obvious solution to any of the
three problems I present below.
The callchain:
===============
do_diff()
svn_wc_diff_wc6()
/* Calls svn_wc__db_read_children() and recurses into child dirs but
* calls file_diff() on files. */
directory_elements_diff()
file_diff()
/* Will be called for copies and moves add-part. */
eb->file_changed()
/* If called with '--show-copies-as-adds' the add-part will be
* reported with this func. */
eb->file_added()
/* Will eventually be called for both added, changed and
* deleted paths. From here we call into the diff library to
* create headers and actual diff content. */
diff_content_changed()
Problem 1
==============
We're processing the delete-half instead of the add-half for a
copy/move. At present, copy/move add-half is not shown by 'svn diff'
unless invoked with '--show-copies-as-adds'. But with the 'git unidiff'
format we would only show a diff for the add-part, e.g. we would change
the behaviour from reporting the delete-half to reporting the add-half.
Suggestion on solutions:
1) Call file_added() for all paths with db_status copied/moved
instead of using file_changed().
+ Allows us to easily detect what files are adds and the
svn_wc_diff_callback already has a copyfrom arg.
- We still have to distinguish between copied and moved paths in
the callback.
2) Introduce file_copied() and file_moved() callbacks. Those could
have a 'delete_half' arg or similar, e.g. we would pass both the
add-half and del-half to that function and separate them with an
parameter.
+ Might enhance readability.
- We're introducing a concept that may be viewed as inconsistent
with the rest of the code. svn handles 'add+del', is it
acceptable to introduce copy and move at this level of the code?
Further, the diff_file_added() and diff_file_deleted() does very
little as is, wouldn't it be better to introduce some extra
complexity there instead?
3) ...
Problem 2
=============
We need to process the add-half to know the delete-half. The add-half
knows about the delete-half through the copyfrom information, but the
other way around does not work - there's no 'copyto' information. Since
we don't want to present a diff for the delete-half, we need to be able
to know that a deleted file actually is part of a delete. Suggestion on
solutions:
1) Create a custom svn_wc__db_read_children() func that can has an out
parameter callled 'copyto', e.g. the func would traverse the entire
wc looking if there is any other paths that has the current path as
copyfrom.
+ We can make the detecting operation less expensive than a file
traversal since we have SQL and the db file will already be in
memory. (Will involve more reads before we move to one db).
- It feels a bit awkvard to fetch 'copyto'. There must have been a
design decision to only only make the copyfrom relation
unidirected. How does other code parts detect the copyto?
2) Store added and deleted paths as keys in two hashs with the diff
contents inside diff_content_changed().
+ We don't have to change directory_elements_diff().
- The memory consumption would grow with the number of deleted and
added paths. If we choose to save them to tmp-files, we could
avoid that but would suffer in performance instead.
3) ...
Problem 3
==========
If we do changes in the callchain presented earlier, we would affect
the 'url->wc' use case too. directory_elements_diff() is called from
close_directory() and close_edit(). We would have git diffs for -r
BASE:WORKING but no git diff headers for the revisions preceeding BASE.
I'm assuming that it's not a problem since the current 'svn diff'
command shows copies for the changes in the wc but not the ones before.
Cheers,
Daniel
Received on 2010-05-29 14:51:24 CEST