OK, the third iteration of the patch is attached. It passes make check.
As discussed in [1], this version keeps 50 lines of the identical
suffix around, to give the algorithm a good chance to generate a diff
output of good quality (in all but the most extreme cases, this will
be the same as with the original svn_diff algorithm).
That's about the only difference with the previous iteration. So for
now, I'm submitting this for review. Any feedback is very welcome :-).
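
To make the idea concrete, here is a minimal sketch of prefix/suffix
trimming on two in-memory buffers. It is not the code from the patch
(which works chunk by chunk on files): trim_result_t and trim_identical
are hypothetical names, '\n' is assumed to be the only line terminator,
and the first line of the identical suffix is always given up so that
the skipped suffix starts on a line boundary in both buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of svn_diff. */
typedef struct trim_result_t {
  size_t prefix_len;   /* bytes of identical prefix skipped (whole lines) */
  size_t prefix_lines; /* number of lines in that prefix */
  size_t suffix_len;   /* bytes of identical suffix skipped (whole lines) */
} trim_result_t;

static trim_result_t
trim_identical(const char *a, size_t a_len,
               const char *b, size_t b_len,
               size_t keep_suffix_lines)
{
  trim_result_t r = { 0, 0, 0 };
  size_t min_len = a_len < b_len ? a_len : b_len;
  size_t i = 0, s = 0, rest, to_skip;
  const char *p;

  assert(a != NULL && b != NULL);

  /* Scan forward. Only count complete lines, so the prefix always
     ends right after a newline. */
  while (i < min_len && a[i] == b[i])
    {
      i++;
      if (a[i - 1] == '\n')
        {
          r.prefix_len = i;
          r.prefix_lines++;
        }
    }

  /* Scan backward, but never into the prefix found above. */
  rest = min_len - r.prefix_len;
  while (s < rest && a[a_len - 1 - s] == b[b_len - 1 - s])
    s++;

  /* Move the suffix start forward past (1 + keep_suffix_lines) newlines:
     one to land on a line boundary in both buffers, the others to leave
     some identical lines for the diff algorithm to work with. */
  to_skip = keep_suffix_lines + 1;
  p = a + a_len - s;
  while (s > 0 && to_skip > 0)
    {
      if (*p == '\n')
        to_skip--;
      p++;
      s--;
    }

  r.suffix_len = s;
  return r;
}
```

For "a\nb\nc\nd\ne\n" versus "a\nX\nc\nd\ne\n" with keep_suffix_lines = 1,
this skips a one-line prefix and a two-line suffix ("d\ne\n"), so only
the lines b/X and c are left for the diff algorithm proper.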
I still consider this a WIP, because of the following remaining TODOs
(which may have a lot of impact on the current implementation):
- Generalize for more than 2 datasources (for diff3 and diff4).
- Rev svn_diff_fns_t and maybe other stuff I've changed in the public API.
- Add support for -x-b, -x-w, and -x--ignore-eol-style options. Maybe
switch the implementation to read out entire lines before comparing
(like datasources_get_next_token does).
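
To illustrate why the last item matters: a byte-by-byte scan stops at
the first differing byte, even when the two lines would compare equal
under an ignore-whitespace option. The sketch below is only an
illustration with a hypothetical helper name; it is not how
libsvn_diff normalizes tokens. It compares two complete lines while
skipping every whitespace byte, which is roughly the comparison an
"ignore all whitespace" mode needs and which requires having the whole
line at hand.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if the two lines are equal once all
   whitespace bytes (including CR and LF) are ignored, else 0. */
static int
lines_equal_ignoring_space(const char *a, size_t a_len,
                           const char *b, size_t b_len)
{
  size_t i = 0, j = 0;

  assert(a != NULL && b != NULL);

  for (;;)
    {
      /* Skip whitespace on both sides before comparing. */
      while (i < a_len && isspace((unsigned char)a[i]))
        i++;
      while (j < b_len && isspace((unsigned char)b[j]))
        j++;

      /* Equal only if both lines are exhausted at the same time. */
      if (i == a_len || j == b_len)
        return i == a_len && j == b_len;

      if (a[i] != b[j])
        return 0;

      i++;
      j++;
    }
}
```

With this, "a  b\n" and "a b\n" compare equal, although a raw byte scan
would already stop at the third byte.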
Log message:
[[[
Make svn_diff_diff skip identical prefix and suffix to make diff and blame
faster.
* subversion/include/svn_diff.h
(svn_diff_fns_t): Added new function types datasources_open and
get_prefix_lines to the vtable.
* subversion/libsvn_diff/diff_memory.c
(datasources_open): New function (does nothing).
(get_prefix_lines): New function (does nothing).
(svn_diff__mem_vtable): Added new functions datasources_open and
get_prefix_lines.
* subversion/libsvn_diff/diff_file.c
(svn_diff__file_baton_t): Added members prefix_lines, suffix_start_chunk[4]
and suffix_offset_in_chunk.
(increment_pointer_or_chunk, decrement_pointer_or_chunk): New functions.
(find_identical_prefix, find_identical_suffix): New functions.
(datasources_open): New function, to open both datasources and find their
identical prefix and suffix. From the identical suffix, 50 lines are kept to
help the diff algorithm find the nicest possible diff representation
in case of ambiguity.
(get_prefix_lines): New function.
(datasource_get_next_token): Stop at start of identical suffix.
(svn_diff__file_vtable): Added new functions datasources_open and
get_prefix_lines.
* subversion/libsvn_diff/diff.h
(svn_diff__get_tokens): Added argument "datasource_opened", to indicate that
the datasource was already opened.
* subversion/libsvn_diff/token.c
(svn_diff__get_tokens): Added argument "datasource_opened". Only open the
datasource if datasource_opened is FALSE. Set the starting offset of the
position list to the number of prefix lines.
* subversion/libsvn_diff/lcs.c
(svn_diff__lcs): Added argument "prefix_lines". Use this to correctly set
the offset of the sentinel position for EOF, even if one of the files
became empty after eliminating the identical prefix.
* subversion/libsvn_diff/diff.c
(svn_diff__diff): Add a chunk of "common" diff for identical prefix.
(svn_diff_diff): Use new function datasources_open, to open original and
modified at once, and find their identical prefix and suffix. Pass
prefix_lines to svn_diff__lcs and to svn_diff__diff.
* subversion/libsvn_diff/diff3.c
(svn_diff_diff3): Pass datasource_opened = FALSE to svn_diff__get_tokens.
Pass prefix_lines = 0 to svn_diff__lcs.
* subversion/libsvn_diff/diff4.c
(svn_diff_diff4): Pass datasource_opened = FALSE to svn_diff__get_tokens.
Pass prefix_lines = 0 to svn_diff__lcs.
]]]
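
As a small illustration of the offset bookkeeping described for
token.c and lcs.c, here is a sketch under my own assumptions: the
helper names are hypothetical, and I assume 1-based line offsets. The
point is that every position is shifted by prefix_lines, and that the
EOF sentinel still gets a meaningful offset when no tokens remain
after the prefix was skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1-based line offset, in the whole file, of the
   token with 0-based index INDEX among the tokens read after the
   identical prefix was skipped. */
static size_t
token_offset(size_t prefix_lines, size_t index)
{
  assert(index < SIZE_MAX - prefix_lines);
  return prefix_lines + index + 1;
}

/* Hypothetical helper: offset of the EOF sentinel, one past the last
   token. With token_count == 0 (the file became empty after removing
   the prefix) this is prefix_lines + 1 rather than 1. */
static size_t
eof_sentinel_offset(size_t prefix_lines, size_t token_count)
{
  assert(token_count < SIZE_MAX - prefix_lines);
  return prefix_lines + token_count + 1;
}
```

For example, with 100 prefix lines and an otherwise empty file, the
sentinel sits at offset 101 in this scheme, which is where the other
file's first remaining token starts as well.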
Cheers,
--
Johan
[1] http://svn.haxx.se/dev/archive-2010-10/0141.shtml
Received on 2010-10-09 00:56:41 CEST