Hi,
Here is a second iteration of the patch. It now passes make check.
Differences from the previous version are:
- Support for \r eol-style (\n and \r\n were already OK).
- The number of prefix_lines is now passed to svn_diff__lcs, so it can
use that value to set the position offset of the "EOF" marker
correctly, in case one of the two files has become empty after skipping
the prefix. This fixes the crashes in blame_tests.py 2 and 7.
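To make the eol-style point concrete, here is a minimal sketch of
counting whole lines in an identical prefix when lines may end in \n,
\r or \r\n. It is not the code from the patch: it works on two complete
in-memory buffers rather than on file chunks, and the names
prefix_result_t and scan_identical_prefix are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch (not part of the patch). */
typedef struct prefix_result_t {
  size_t prefix_bytes;  /* bytes in the identical prefix, up to a line end */
  size_t prefix_lines;  /* number of complete identical lines */
} prefix_result_t;

/* Scan two in-memory buffers and count the complete lines they share at
   the start.  A line may end in "\n", "\r" or "\r\n".  Assumption: each
   buffer holds the whole file, so a "\r" in the last byte is a lone "\r". */
static prefix_result_t
scan_identical_prefix(const char *a, size_t a_len,
                      const char *b, size_t b_len)
{
  prefix_result_t res = { 0, 0 };
  size_t min_len = a_len < b_len ? a_len : b_len;
  size_t i = 0;

  assert(a != NULL && b != NULL);

  while (i < min_len && a[i] == b[i])
    {
      char c = a[i];
      i++;

      if (c == '\n')
        {
          /* Ends a "\n" line or completes a "\r\n" line. */
          res.prefix_lines++;
          res.prefix_bytes = i;
        }
      else if (c == '\r')
        {
          int a_lf = (i < a_len && a[i] == '\n');
          int b_lf = (i < b_len && b[i] == '\n');

          if (!a_lf && !b_lf)
            {
              /* Lone "\r" in both buffers: the line ends here. */
              res.prefix_lines++;
              res.prefix_bytes = i;
            }
          /* If both continue with "\n", the line is counted on the next
             iteration.  If only one does, the lines differ in eol-style
             and the loop stops at the next comparison. */
        }
    }

  return res;
}
```

In this sketch, prefix_lines plays the role of the value that gets
passed on: positions in the remaining part of each file start counting
from it, and so does the "EOF" marker when nothing remains of a file.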
The patch is pretty big, so please let me know if I should split it up
to make it more reviewable (I could easily split it up between the
prefix-finding and the suffix-finding, at the cost of losing the
overview of the entire algorithm).
Still to do:
- Think about why results are sometimes different (because of the
eliminated suffix, the LCS can sometimes be slightly different), and
what can be done about it.
- Generalize for more than 2 datasources (for diff3 and diff4).
- Rev svn_diff_fns_t and maybe other stuff I've changed in the public API.
- Add support for -x-b, -x-w, and -x--ignore-eol-style options.
But I'd like to do those things in follow-up patches, after this one
has been reviewed and digested a little bit. So at this point: review,
feedback, ... very welcome :-).
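As an illustration of the first to-do item, here is a small sketch of
how stripping an identical suffix can change which lines end up matched,
even though the LCS length stays the same. Lines are represented as
integer tokens. The tie-breaking rule in first_matched_line is an
assumption of this sketch and is not claimed to be what svn_diff__lcs
does; both function names are made up.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 16

/* Return the index in A of the first line matched by one particular LCS
   traversal of A and B, or -1 if they share no line.  On ties, a line of
   B is skipped first; that rule is an assumption of this sketch. */
static int
first_matched_line(const int *a, int na, const int *b, int nb)
{
  int len[MAX_TOKENS + 1][MAX_TOKENS + 1];
  int i, j;

  assert(na >= 0 && na <= MAX_TOKENS && nb >= 0 && nb <= MAX_TOKENS);
  memset(len, 0, sizeof(len));

  /* len[i][j] = LCS length of a[i..na) and b[j..nb). */
  for (i = na - 1; i >= 0; i--)
    for (j = nb - 1; j >= 0; j--)
      {
        if (a[i] == b[j])
          len[i][j] = 1 + len[i + 1][j + 1];
        else
          len[i][j] = len[i + 1][j] > len[i][j + 1]
                      ? len[i + 1][j] : len[i][j + 1];
      }

  /* Walk forward from the start until the first matched pair. */
  i = 0;
  j = 0;
  while (i < na && j < nb)
    {
      if (a[i] == b[j])
        return i;
      if (len[i][j + 1] >= len[i + 1][j])
        j++;
      else
        i++;
    }

  return -1;
}

/* Number of identical lines at the end of A and B. */
static int
common_suffix_lines(const int *a, int na, const int *b, int nb)
{
  int n = 0;

  while (n < na && n < nb && a[na - 1 - n] == b[nb - 1 - n])
    n++;

  return n;
}
```

Take original = {1, 2, 3, 2} and modified = {4, 2}. On the whole files,
this traversal matches the modified "2" with the second original line
(index 1). With the one-line suffix stripped first, the remainders
{1, 2, 3} and {4} share nothing, so the same "2" is matched with the
fourth original line instead. Both are valid alignments of length 1, but
the resulting diff hunks differ.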
Log message:
[[[
Make svn_diff_diff skip identical prefix and suffix to make diff and blame
faster.
* subversion/include/svn_diff.h
(svn_diff_fns_t): Added new function types datasources_open and
get_prefix_lines to the vtable.
* subversion/libsvn_diff/diff_memory.c
(datasources_open): New function (does nothing).
(get_prefix_lines): New function (does nothing).
(svn_diff__mem_vtable): Added new functions datasources_open and
get_prefix_lines.
* subversion/libsvn_diff/diff_file.c
(svn_diff__file_baton_t): Added members prefix_lines, suffix_start_chunk[4]
and suffix_offset_in_chunk.
(increment_pointer_or_chunk, decrement_pointer_or_chunk): New functions.
(find_identical_prefix, find_identical_suffix): New functions.
(datasources_open): New function, to open both datasources and find their
identical prefix and suffix.
(get_prefix_lines): New function.
(datasource_get_next_token): Stop at start of identical suffix.
(svn_diff__file_vtable): Added new functions datasources_open and
get_prefix_lines.
* subversion/libsvn_diff/diff.h
(svn_diff__get_tokens): Added argument "datasource_opened", to indicate that
the datasource was already opened.
* subversion/libsvn_diff/token.c
(svn_diff__get_tokens): Added argument "datasource_opened". Only open the
datasource if datasource_opened is FALSE. Set the starting offset of the
position list to the number of prefix lines.
* subversion/libsvn_diff/lcs.c
(svn_diff__lcs): Added argument "prefix_lines". Use this to correctly set
the offset of the sentinel position for EOF, even if one of the files
became empty after eliminating the identical prefix.
* subversion/libsvn_diff/diff.c
(svn_diff__diff): Add a chunk of "common" diff for identical prefix.
(svn_diff_diff): Use new function datasources_open, to open original and
modified at once, and find their identical prefix and suffix. Pass
prefix_lines to svn_diff__lcs and to svn_diff__diff.
* subversion/libsvn_diff/diff3.c
(svn_diff_diff3): Pass datasource_opened = FALSE to svn_diff__get_tokens.
Pass prefix_lines = 0 to svn_diff__lcs.
* subversion/libsvn_diff/diff4.c
(svn_diff_diff4): Pass datasource_opened = FALSE to svn_diff__get_tokens.
Pass prefix_lines = 0 to svn_diff__lcs.
]]]
Cheers,
--
Johan
Received on 2010-10-03 01:46:47 CEST