[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [WIP PATCH] Make svn_diff_diff skip identical prefix and suffix to make diff and blame faster

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Sat, 9 Oct 2010 00:56:03 +0200

Ok, third iteration of the patch in attachment. It passes make check.

As discussed in [1], this version keeps 50 lines of the identical
suffix around, to give the algorithm a good chance to generate a diff
output of good quality (in all but the most extreme cases, this will
be the same as with the original svn_diff algorithm).

That's about the only difference with the previous iteration. So for
now, I'm submitting this for review. Any feedback is very welcome :-).

I still consider this a WIP, because of the following remaining todo's
(which may have a lot of impact on the current implementation):
- Generalize for more than 2 datasources (for diff3 and diff4).
- revv svn_diff_fns_t and maybe other stuff I've changed in public API.
- Add support for -x-b, -x-w, and -x--ignore-eol-style options. Maybe
switch the implementation to read out entire lines before comparing
(like datasources_get_next_token does).

Log message:
[[[
Make svn_diff_diff skip identical prefix and suffix to make diff and blame
faster.

* subversion/include/svn_diff.h
  (svn_diff_fns_t): Added new function types datasources_open and
   get_prefix_lines to the vtable.

* subversion/libsvn_diff/diff_memory.c
  (datasources_open): New function (does nothing).
  (get_prefix_lines): New function (does nothing).
  (svn_diff__mem_vtable): Added new functions datasources_open and
   get_prefix_lines.

* subversion/libsvn_diff/diff_file.c
  (svn_diff__file_baton_t): Added members prefix_lines, suffix_start_chunk[4]
   and suffix_offset_in_chunk.
  (increment_pointer_or_chunk, decrement_pointer_or_chunk): New functions.
  (find_identical_prefix, find_identical_suffix): New functions.
  (datasources_open): New function, to open both datasources and find their
   identical prefix and suffix. From the identical suffix, 50 lines are kept to
   help the diff algorithm find the nicest possible diff representation
   in case of ambiguity.
  (get_prefix_lines): New function.
  (datasource_get_next_token): Stop at start of identical suffix.
  (svn_diff__file_vtable): Added new functions datasources_open and
   get_prefix_lines.

* subversion/libsvn_diff/diff.h
  (svn_diff__get_tokens): Added argument "datasource_opened", to indicate that
   the datasource was already opened.

* subversion/libsvn_diff/token.c
  (svn_diff__get_tokens): Added argument "datasource_opened". Only open the
   datasource if datasource_opened is FALSE. Set the starting offset of the
   position list to the number of prefix lines.

* subversion/libsvn_diff/lcs.c
  (svn_diff__lcs): Added argument "prefix_lines". Use this to correctly set
   the offset of the sentinel position for EOF, even if one of the files
   became empty after eliminating the identical prefix.

* subversion/libsvn_diff/diff.c
  (svn_diff__diff): Add a chunk of "common" diff for identical prefix.
  (svn_diff_diff): Use new function datasources_open, to open original and
   modified at once, and find their identical prefix and suffix. Pass
   prefix_lines to svn_diff__lcs and to svn_diff__diff.

* subversion/libsvn_diff/diff3.c
  (svn_diff_diff3): Pass datasource_opened = FALSE to svn_diff__get_tokens.
   Pass prefix_lines = 0 to svn_diff__lcs.

* subversion/libsvn_diff/diff4.c
  (svn_diff_diff4): Pass datasource_opened = FALSE to svn_diff__get_tokens.
   Pass prefix_lines = 0 to svn_diff__lcs.
]]]

Cheers,

-- 
Johan
[1] http://svn.haxx.se/dev/archive-2010-10/0141.shtml

Received on 2010-10-09 00:56:41 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.