[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Review results of diff-optimizations-bytes branch

From: Stefan Fuhrmann <stefanfuhrmann_at_alice-dsl.de>
Date: Sun, 06 Feb 2011 21:19:54 +0100

Hi Johan,

I just finished the review of the changes on your branch.
Except for a few minor hints, everything looks fine and
should be merged to /trunk. The two findings can be
addressed on /trunk directly.

-- Stefan^2.

> Index: BRANCH-README
> ===================================================================
> --- BRANCH-README (.../trunk) (revision 0)
> +++ BRANCH-README (.../branches/diff-optimizations-bytes)
> (revision 1064940)
<snip>
> +Rejected ideas:
> + * Make prefix scanning able to skip 1 or a couple of non-matching
> lines, if
> + it is able to find strictly more matching lines after that, to
> keep the
> + prefix scanning going.
> +
> + This is not a good idea, because it can lead to bad cases of
> + missynchronization (see [3]):
> +
> + bxxaxxbxx
> + || ||
> + axxbxx
> +
> + instead of (longest common subsequence):
> +
> + bxxaxxbxx
> + //////
> + axxbxx
> +
Interesting. I think the conditions to skip N lines would be

* is followed by at least 2*N exact matching lines
   (making one side of that section a "non-match" costs
   more than what we might gain for the previous lines)
* the matching section is not a prefix to itself, e.g. does
   not match to a later part of itself (or that is at least 2N
   lines away)

But I haven't done a formal proof on that.

> Index: subversion/libsvn_diff/diff_memory.c
> ===================================================================
> --- subversion/libsvn_diff/diff_memory.c (.../trunk) (revision
> 1064940)
> +++ subversion/libsvn_diff/diff_memory.c
> (.../branches/diff-optimizations-bytes) (revision 1064940)
> @@ -88,16 +88,19 @@
> }
>
>
> -/* Implements svn_diff_fns_t::datasource_open */
> +/* Implements svn_diff_fns2_t::datasources_open */
> static svn_error_t *
> -datasource_open(void *baton, svn_diff_datasource_e datasource)
> +datasources_open(void *baton, apr_off_t *prefix_lines,
> + svn_diff_datasource_e datasource[],
> + apr_size_t datasource_len)
> {
> /* Do nothing: everything is already there and initialized to 0 */
> + *prefix_lines = 0;
> return SVN_NO_ERROR;
> }
So, your optimizations don't apply to string comparisons?

I have no idea when this code will be run at all, but it should
not be too hard to apply the same prefix scanning code
to strings as well (they would appear as a single chunk of
"file data").

> Index: subversion/libsvn_diff/diff_file.c
> ===================================================================
> --- subversion/libsvn_diff/diff_file.c (.../trunk) (revision
> 1064940)
> +++ subversion/libsvn_diff/diff_file.c
> (.../branches/diff-optimizations-bytes) (revision 1064940)
<snip>
> +static svn_error_t *
> +datasources_open(void *baton, apr_off_t *prefix_lines,
> + svn_diff_datasource_e datasource[],
> + apr_size_t datasource_len)
> +{
> svn_diff__file_baton_t *file_baton = baton;
> - struct file_info *file =
> &file_baton->files[datasource_to_index(datasource)];
> - apr_finfo_t finfo;
> - apr_off_t length;
> - char *curp;
> - char *endp;
> + struct file_info files[4];
> + apr_finfo_t finfo[4];
> + apr_off_t length[4];
Maybe, the "magic" 4 should be replaced by some
constant / define. Also, we can afford for a check
at the start of this function comparing datasource_len
to 4 and returning an error upon overflow.
Received on 2011-02-06 21:20:50 CET

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.