[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: [PATCH] Speed-up of libsvn_diff using token counts

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Mon, 6 Jun 2011 03:17:36 +0200

On Wed, Jun 1, 2011 at 5:56 PM, Morten Kloster <morklo_at_gmail.com> wrote:
> On Wed, Jun 1, 2011 at 1:35 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> On Tue, May 31, 2011 at 12:44 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> ...
> []
>> I'll get into some more testing and reviewing tomorrow or the day
>> after (unless someone else beats me to it :-)).
>>
>> Cheers,
>> --
>> Johan
>>
>
> I had trouble getting any reliable time differences between the
> unpatched and patched versions. By putting the token counting
> in a x300 loop, I got an increase in running time of ~10% for my
> test file (a 40MB file - the output from all my merge.c diffs -
> against a copy where I have changed the first and last line, as
> you suggested)... that was with ignore eol, but not whitespace.
> So here it looks like the patch barely increases running time at
> all; almost all the time is spent reading the files and assigning
> tokens.

Hi Morten,

Sorry, it took me a little while longer than expected, but I finally
got around to it.

I did some more tests, and upon further investigation, I too can't
really find a significant performance decrease because of the token
counting. So that looks good.

I've reviewed it all, and it looks fine. I just have one small
question about the changes in lcs.c:

[[[
@@ -218,11 +228,17 @@ prepend_lcs(svn_diff__lcs_t *lcs, apr_off_t lines,
 svn_diff__lcs_t *
 svn_diff__lcs(svn_diff__position_t *position_list1, /* pointer to
tail (ring) */
               svn_diff__position_t *position_list2, /* pointer to
tail (ring) */
+ svn_diff__token_index_t *token_counts_list1, /* array
of counts */
+ svn_diff__token_index_t *token_counts_list2, /* array
of counts */
+ svn_diff__token_index_t num_tokens,
               apr_off_t prefix_lines,
               apr_off_t suffix_lines,
               apr_pool_t *pool)
 {
   apr_off_t length[2];
+ svn_diff__token_index_t *token_counts[2];
+ svn_diff__token_index_t unique_count[2];
+ svn_diff__token_index_t token_index;

...

@@ -281,16 +310,17 @@ svn_diff__lcs(svn_diff__position_t *position_list1
   sentinel_position[0].next = position_list1->next;
   position_list1->next = &sentinel_position[0];
   sentinel_position[0].offset = position_list1->offset + 1;
+ token_counts[0] = token_counts_list1;

   sentinel_position[1].next = position_list2->next;
   position_list2->next = &sentinel_position[1];
   sentinel_position[1].offset = position_list2->offset + 1;
+ token_counts[1] = token_counts_list2;

...

@@ -310,12 +340,12 @@ svn_diff__lcs(svn_diff__position_t *position_list1
       /* For k < 0, insertions are free */
       for (k = (d < 0 ? d : 0) - p; k < 0; k++)
         {
- svn_diff__snake(fp + k, &lcs_freelist, pool);
+ svn_diff__snake(fp + k, token_counts, &lcs_freelist, pool);
         }
          /* for k > 0, deletions are free */
       for (k = (d > 0 ? d : 0) + p; k >= 0; k--)
         {
- svn_diff__snake(fp + k, &lcs_freelist, pool);
+ svn_diff__snake(fp + k, token_counts, &lcs_freelist, pool);
         }

       p++;
]]]

Why do you copy the token_counts_list parameters into the array
token_counts[]? Is that to simplify the passing of data to __snake
(keeping number of arguments to a minimum)?

Can't you just pass the token_counts_list parameters 'as is' to
__snake as two extra arguments, and lose the token_counts[] variable?

But maybe you have a reason for this, because of compiler optimization
or inlining or something like that?

Apart from that I have no more remarks, so I think I'll be able to
commit this soon.

Thanks for your work.

-- 
Johan
Received on 2011-06-06 03:18:26 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.