On Wed, Jan 19, 2011 at 1:19 AM, Stefan Fuhrmann
<stefanfuhrmann_at_alice-dsl.de> wrote:
> On 18.01.2011 12:56, Johan Corveleyn wrote:
>>
>> On Mon, Jan 17, 2011 at 3:12 AM, Johan Corveleyn<jcorvel_at_gmail.com>
>> wrote:
>>>
>>> On Sat, Jan 8, 2011 at 6:50 AM, Stefan Fuhrmann
>>> <stefanfuhrmann_at_alice-dsl.de> wrote:
>>
>> [ ... snip ... ]
>>
>>>> But I think, the stack variable is certainly helpful
>>>> and easy to do.
>>
>> Ok, I've done this (locally, still have to clean up a little and then
>> commit). It gives a nice 10% speedup of prefix/suffix scanning, which
>> is great.
>>
>> I have a couple of small questions though, about other small
>> optimizations you did:
>>
>>> Index: subversion/libsvn_diff/diff_file.c
>>> ===================================================================
>>> --- subversion/libsvn_diff/diff_file.c (revision 1054044)
>>> +++ subversion/libsvn_diff/diff_file.c (working copy)
>>
>> ...
>>>
>>> @@ -258,10 +259,10 @@
>>>
>>> \
>>> for (i = 0; i< files_len; i++)
>>> \
>>> {
>>> \
>>> - if (all_files[i]->curp< all_files[i]->endp - 1)
>>> \
>>> - all_files[i]->curp++;
>>> \
>>> + if (all_files[i].curp + 1< all_files[i].endp)
>>> \
>>> + all_files[i].curp++;
>>> \
>>
>> Just curious: why did you change that condition (from 'X< Y - 1' to
>> 'X + 1< Y')?
>>
>> You mention in your log message: "minor re-arragements in arithmetic
>> expressions to maximize reuse of results (e.g. in
>> INCREMENT_POINTERS)".
>>
>> Why does this help, and what's the potential impact?
>
> There is a "hidden" common sub-expression.
> Variant 1 in pseudo-code:
>
> lhs = all_files[i].curp;
> temp1 = all_files[i].endp;
> rhs = temp1 - 1;
> if (lhs < rhs)
> {
> temp2 = lhs + 1;
> all_files[i].curp = temp2;
> }
>
> Variant 2:
>
> lhs = all_files[i].curp;
> temp = lhs + 1;
> rhs = all_files[i].endp;
> if (lhs < rhs)
> all_files[i].curp = temp;
>
> The problem is that the compiler cannot figure that out
> on its own because (x+1) and (y-1) don't overflow / wrap
> around for the same values. In particular 0 < (0-1) is true
> and (0+1) < 0 is false in unsigned arithmetic.
Thanks for the explanation. So, I understand why this should be more
efficient, except ... that it's not, for some reason. At least not
after testing it on my x86 machine.
It's about 1% slower, tested with two test files generated with the
gen-big-files.py script (sent in the other thread), with 1,000,000
prefix and suffix lines, and 500 mismatching lines in the middle:
$ ./gen-big-files.py -1 big1.txt -2 big2.txt -p 1000000 -s 1000000
So for now, I didn't commit this change. Maybe it's better on other
platforms, so it may still make sense to do it, but then again, at
around 1% it's probably not that important ...
>>
>> Second question:
>>
>>> +find_identical_prefix_slow(svn_boolean_t *reached_one_eof,
>>> + apr_off_t *prefix_lines,
>>> + struct file_info file[], apr_size_t file_len,
>>> + apr_pool_t *pool)
>>> +{
>>> + svn_boolean_t had_cr = FALSE;
>>> svn_boolean_t is_match, reached_all_eof;
>>> apr_size_t i;
>>> + apr_off_t lines = 0;
>>>
>>> *prefix_lines = 0;
>>> for (i = 1, is_match = TRUE; i< file_len; i++)
>>> - is_match = is_match&& *file[0]->curp == *file[i]->curp;
>>> + is_match = is_match&& *file[0].curp == *file[i].curp;
>>> +
>>> while (is_match)
>>> {
>>> /* ### TODO: see if we can take advantage of
>>> diff options like ignore_eol_style or ignore_space. */
>>> /* check for eol, and count */
>>> - if (*file[0]->curp == '\r')
>>> + if (*file[0].curp == '\r')
>>> {
>>> - (*prefix_lines)++;
>>> + lines++;
>>
>> Here you added a local variable 'lines', which is incremented in the
>> function, and copied to *prefix_lines at the end (you do the same in
>> fine_identical_prefix_fast). The original code just sets and
>> increments *prefix_lines directly. So, out of curiousness: why does
>> this help, and how much?
>
> Incrementing the temporary variable requires one
> indirection less than the original code and uses only
> one instead of two registers (in typical cases).
>
> The 'how much?' part depends on compiler / optimizer
> and even more on the target platform (x86 has very
> few registers to keep data in fly; x64 has about twice
> as many).
Ok, this apparently sped it up for around 1% on my machine. So I
committed this in r1062275.
Thanks.
Cheers,
--
Johan
Received on 2011-01-22 22:20:51 CET