On Wed, Dec 15, 2010 at 11:50 AM, Johan Corveleyn <jcorvel_at_gmail.com> wrote:
> On Wed, Dec 15, 2010 at 2:30 AM, Stefan Fuhrmann <eqfox_at_web.de> wrote:
>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>> Hi all,
>> Hi Johan ;)
> Hi Stefan, thanks for the input :). I suspected that you might have
> some ideas about this ...
>>> On the diff-optimizations-bytes branch, in diff_file.c, there are two
>>> functions which are called for every byte of the identical prefix and
>>> suffix: increment_pointers and decrement_pointers. These functions are
>>> actually equivalents of curp++ or curp--, reading the next/previous
>>> byte, but taking into account the chunked-ness of the data (file
>>> data is read into memory in chunks).
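For readers without the branch checked out: here's a simplified sketch
of what such a function does. The struct is reduced to the essential
fields, and read_next_chunk is a hypothetical stand-in for the actual
chunk-reading code on the branch:

    #include <apr_pools.h>
    #include "svn_error.h"

    struct file_info {
      char *curp;   /* current byte in the chunk buffer */
      char *endp;   /* one past the last byte in the buffer */
      int chunk;    /* chunk currently in memory, -1 = before BOF */
    };

    /* Hypothetical helper: reads the next chunk of FILE into its
       buffer and resets curp/endp accordingly. */
    static svn_error_t *
    read_next_chunk(struct file_info *file, apr_pool_t *pool);

    /* Advance the current pointer of each file by one byte,
       loading the next chunk when the current buffer runs out. */
    static svn_error_t *
    increment_pointers(struct file_info *file[], int file_len,
                       apr_pool_t *pool)
    {
      int i;

      for (i = 0; i < file_len; i++)
        {
          if (file[i]->chunk == -1)
            file[i]->chunk = 0;     /* chunk 0 is already in memory */
          else if (file[i]->curp == file[i]->endp - 1)
            SVN_ERR(read_next_chunk(file[i], pool)); /* hypothetical */
          else
            file[i]->curp++;
        }

      return SVN_NO_ERROR;
    }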
>>> As an experiment I changed these functions into macros, eliminating
>>> the function calls. This makes the diff algorithm another 10% - 15%
>>> faster (granted, this was measured with my "extreme" test case of a
>>> 1.5 MB file (60,000 lines), of which most lines are part of the
>>> identical prefix/suffix). However, having an entire function like
>>> that implemented as a macro isn't very pretty (see below for an
>>> example).
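To give an idea of why it isn't pretty: the macro version looks more
or less like this (same simplified fields as in the sketch above; not
the exact macro on the branch):

    /* Macro equivalent: expanded at every use site, so the per-byte
       function-call overhead disappears.  Relies on SVN_ERR returning
       from the enclosing function on error. */
    #define INCREMENT_POINTERS(file, file_len, pool)                 \
      do {                                                           \
        int i_;                                                      \
        for (i_ = 0; i_ < (file_len); i_++)                          \
          {                                                          \
            if ((file)[i_]->chunk == -1)                             \
              (file)[i_]->chunk = 0;                                 \
            else if ((file)[i_]->curp == (file)[i_]->endp - 1)       \
              SVN_ERR(read_next_chunk((file)[i_], (pool)));          \
            else                                                     \
              (file)[i_]->curp++;                                    \
          }                                                          \
      } while (0)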
>> I'm kind of surprised that the calling overhead is
>> actually noticeable for larger files, i.e. for larger
>> values of file_len it should loop many times.
> file_len is the size of the array of files, not the length of the
> files themselves. In the typical case (the only case that's currently
> supported) file_len is 2. That's because we're diffing just 2 files:
> the original one and the modified one. The implementation uses an
> array and a file_len (and "for" loops) because I wanted it to be
> generically useful for diff3 as well (with 3 files: original,
> modified and latest) and diff4 (with 4 files: original, modified,
> latest and ancestor).
> Also, I did my measurements with a blame operation on this very large
> file, which has ~2500 revisions. So that's 2500 diffs of a ~1.5 MB
> file, with, say, on average 1 MB of identical prefix/suffix every
> time. That's some 2,500,000,000 function calls.
That should be: 5,000,000,000 function calls, because for every diff
we advance prefix and suffix pointers in two files: the original one
and the modified one.
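(Spelled out: 2,500 diffs x ~1,000,000 bytes of identical
prefix/suffix x 2 files ~= 5,000,000,000 increment/decrement steps.)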
> For measurement, I counted the total time of the blame operation, as
> well as the cumulative time for the svn_diff_diff calls (with
> apr_time_now() before and after).
>>> Some considerations:
>>> - Maybe I can use APR_INLINE, with similar results?
>>> - Maybe I can put just the "critical section" into a macro (the curp++
>>> / curp-- part), and call a function only when a chunk boundary is
>>> encountered (~ once every 131072 iterations, since chunks are 128 KB
>>> large), to read in the new chunk, advance variables, ...
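That last idea would look something like this. A rough sketch for a
single file, ignoring the chunk == -1 bootstrap case; increment_chunk
is a hypothetical function doing the rare chunk-boundary work:

    /* Hot path in a macro, cold path in a function that is only
       called at a chunk boundary (~ once every 131072 bytes). */
    #define FAST_INCREMENT(finfo, pool)                              \
      do {                                                           \
        if ((finfo)->curp < (finfo)->endp - 1)                       \
          (finfo)->curp++;                        /* common case */  \
        else                                                         \
          SVN_ERR(increment_chunk((finfo), (pool)));    /* rare */   \
      } while (0)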
>> Prefer inlining over a macro, because the compiler is
>> free to ignore the hint where it wouldn't pay off.
>> Depending on the target architecture and the compiler,
>> it may be beneficial to "narrow" the scope of optimization:
>> In large functions, the optimizer may guess wrongly about
>> the hot spots.
>>> - Maybe it's not worth it?
>> Since inlining is for free from the maintenance POV,
>> any gain should justify it.
>> Two things you might try:
>> * introduce a local variable for afile[i]
>> * replace the for() loop with two nested ones, keeping
>> calls to other functions out of the hot spot:
>> for (i = 0; i < file_len;)
>>   {
>>     /* hot spot: */
>>     for (; i < file_len; i++)
>>       {
>>         curFile = afile[i];
>>         if (curFile->chunk == -1)
>>           curFile->chunk = 0;   /* first step: chunk 0 is in memory */
>>         else if (curFile->curp != curFile->endp - 1)
>>           curFile->curp++;      /* common case: stay in this chunk */
>>         else
>>           break;                /* chunk boundary: leave the hot loop */
>>       }
>>     if (i < file_len)
>>       {
>>         /* the complex, rarely used stuff goes here:
>>            read the next chunk, reset curp, i++, then loop again */
>>       }
>>   }
> Ok, when I have some time, I will experiment a little bit with
> "inline", and see if the compiler "gets it". I'll try your "nested
> for loop" example (the corrected one, which you just sent in the
> other mail) to help the compiler along. I should be able to easily
> compare that with the macro version.
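For reference, the inline variant I'll try first only changes the
declaration; APR_INLINE expands to the compiler's inline keyword where
available, and is just a hint the compiler is free to ignore:

    /* Body unchanged from the current function; the compiler
       decides whether inlining actually pays off. */
    static APR_INLINE svn_error_t *
    increment_pointers(struct file_info *file[], int file_len,
                       apr_pool_t *pool);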
Received on 2010-12-15 11:53:59 CET