On Wed, Jun 1, 2011 at 5:56 PM, Morten Kloster <morklo_at_gmail.com> wrote:
> On Wed, Jun 1, 2011 at 1:35 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> On Tue, May 31, 2011 at 12:44 PM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
>> ...
> [...]
>> I'll get into some more testing and reviewing tomorrow or the day
>> after (unless someone else beats me to it :-)).
>>
>> Cheers,
>> --
>> Johan
>>
>
> I had trouble getting any reliable time differences between the
> unpatched and patched versions. By putting the token counting
> in a 300x loop, I got an increase in running time of ~10% for my
> test file (a 40MB file - the output from all my merge.c diffs -
> against a copy where I changed the first and last line, as
> you suggested)... that was with ignore-eol, but not ignore-whitespace.
> So here it looks like the patch barely increases running time at
> all; almost all the time is spent reading the files and assigning
> tokens.
Hi Morten,

Sorry, it took me a little while longer than expected, but I finally
got around to it.

I did some more tests, and upon further investigation, I too can't
really find a significant performance decrease because of the token
counting. So that looks good.

I've reviewed it all, and it looks fine. I just have one small
question about the changes in lcs.c:
[[[
@@ -218,11 +228,17 @@ prepend_lcs(svn_diff__lcs_t *lcs, apr_off_t lines,
 svn_diff__lcs_t *
 svn_diff__lcs(svn_diff__position_t *position_list1, /* pointer to tail (ring) */
               svn_diff__position_t *position_list2, /* pointer to tail (ring) */
+              svn_diff__token_index_t *token_counts_list1, /* array of counts */
+              svn_diff__token_index_t *token_counts_list2, /* array of counts */
+              svn_diff__token_index_t num_tokens,
               apr_off_t prefix_lines,
               apr_off_t suffix_lines,
               apr_pool_t *pool)
 {
   apr_off_t length[2];
+  svn_diff__token_index_t *token_counts[2];
+  svn_diff__token_index_t unique_count[2];
+  svn_diff__token_index_t token_index;
...
@@ -281,16 +310,17 @@ svn_diff__lcs(svn_diff__position_t *position_list1
   sentinel_position[0].next = position_list1->next;
   position_list1->next = &sentinel_position[0];
   sentinel_position[0].offset = position_list1->offset + 1;
+  token_counts[0] = token_counts_list1;
   sentinel_position[1].next = position_list2->next;
   position_list2->next = &sentinel_position[1];
   sentinel_position[1].offset = position_list2->offset + 1;
+  token_counts[1] = token_counts_list2;
...
@@ -310,12 +340,12 @@ svn_diff__lcs(svn_diff__position_t *position_list1
       /* For k < 0, insertions are free */
       for (k = (d < 0 ? d : 0) - p; k < 0; k++)
         {
-          svn_diff__snake(fp + k, &lcs_freelist, pool);
+          svn_diff__snake(fp + k, token_counts, &lcs_freelist, pool);
         }
       /* for k > 0, deletions are free */
       for (k = (d > 0 ? d : 0) + p; k >= 0; k--)
         {
-          svn_diff__snake(fp + k, &lcs_freelist, pool);
+          svn_diff__snake(fp + k, token_counts, &lcs_freelist, pool);
         }
       p++;
]]]
Why do you copy the token_counts_list parameters into the
token_counts[] array? Is that to simplify passing the data down to
__snake (keeping the number of arguments to a minimum)? Couldn't you
just pass the token_counts_list parameters as-is to __snake, as two
extra arguments, and drop the token_counts[] variable? But maybe you
have a reason for this, because of compiler optimization or inlining
or something like that?
Apart from that, I have no more remarks, so I think I'll be able to
commit this soon.

Thanks for your work.
--
Johan
Received on 2011-06-06 03:18:26 CEST