On Wed, Dec 15, 2010 at 2:30 AM, Stefan Fuhrmann <eqfox_at_web.de> wrote:
> On 14.12.2010 23:35, Johan Corveleyn wrote:
>> Hi all,
> Hi Johan ;)
Hi Stefan, thanks for the input :). I suspected that you might have
some ideas about this ...
>> On the diff-optimizations-bytes branch, in diff_file.c, there are two
>> functions which are called for every byte of the identical prefix and
>> suffix: increment_pointers and decrement_pointers. These functions are
>> actually equivalents of curp++ or curp--, reading the next/previous
>> byte, but taking into account the chunked-ness of the data (file data
>> is read in memory in chunks).
>> As an experiment I changed these functions into macros, eliminating
>> the function calls. This makes the diff algorithm another 10% - 15%
>> faster (granted, this was measured with my "extreme" testcase of a 1,5
>> Mb file (60000 lines), of which most lines are identical
>> prefix/suffix). However, having an entire function like that
>> implemented as a macro isn't very pretty (see below for an example).
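[ For illustration, a minimal sketch of the function-vs-macro tradeoff
being discussed. The struct and field names below are simplified
stand-ins for the real ones in diff_file.c, and the chunk-boundary
slow path is omitted entirely: ]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified per-file state; the real struct in
 * subversion's diff_file.c has more fields. */
struct file_info {
  const char *curp;   /* current byte in the chunk buffer */
  const char *endp;   /* one past the last byte of the chunk */
};

/* Function version: one call per byte scanned. */
static void
increment_pointers(struct file_info *file, int file_len)
{
  int i;
  for (i = 0; i < file_len; i++)
    if (file[i].curp != file[i].endp - 1)
      file[i].curp++;
    /* else: a real implementation would read the next chunk here */
}

/* Macro version: same logic, expanded at every call site, so the
 * per-byte function-call overhead disappears. */
#define INCREMENT_POINTERS(file, file_len)            \
  do {                                                \
    int _i;                                           \
    for (_i = 0; _i < (file_len); _i++)               \
      if ((file)[_i].curp != (file)[_i].endp - 1)     \
        (file)[_i].curp++;                            \
  } while (0)
```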
> I'm kind of surprised that the calling overhead is
> actually noticeable for larger files, i.e. for larger
> values of file_len, where it should loop many times.
file_len is the size of the array of files, not the length of the
files themselves. In the typical case (the only case that's currently
supported) file_len is 2. That's because we're diffing just 2 files:
the original one and the modified one. The implementation uses an
array and a file_len (and "for" loops) because I wanted it to be
generically useful also for diff3 (with 3 files: original, modified
and latest) and diff4 (4 files: original, modified, latest and
ancestor).
Also, I did my measurements with a blame operation on this very large
file, which has ~2500 revisions. So that's 2500 diffs of a ~1,5 Mb
file, with say on average 1 Mb of identical prefix/suffix every time.
That's some 2,500,000,000 function calls.
For measurement, I counted the total time of the blame operation, as
well as the cumulative time for the svn_diff_diff calls (with
apr_time_now() before and after).
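[ The measurement pattern described above, accumulating wall-clock
time across calls, looks roughly like this. Standard clock() stands
in for APR's apr_time_now() here so the sketch is self-contained;
the workload inside is a dummy placeholder for the svn_diff_diff
call being measured: ]

```c
#include <time.h>

/* Accumulate time spent in the measured call across the whole
 * blame operation, as described in the mail. */
static double total_diff_seconds = 0.0;

static void
timed_work(void)
{
  clock_t start = clock();  /* apr_time_now() in the real code */

  /* ... svn_diff_diff(...) would go here; dummy work instead ... */
  volatile long dummy = 0;
  long i;
  for (i = 0; i < 100000; i++)
    dummy += i;

  clock_t end = clock();    /* apr_time_now() again */
  total_diff_seconds += (double)(end - start) / CLOCKS_PER_SEC;
}
```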
>> Some considerations:
>> - Maybe I can use APR_INLINE, with similar results?
>> - Maybe I can put just the "critical section" into a macro (the curp++
>> / curp-- part), and call a function when a chunk boundary is
>> encountered (~ once every 131072 iterations (chunks are 128 Kb
>> large)), to read in the new chunk, advancing variables, ...
> I'd prefer inlining over a macro, because the compiler is
> free to ignore the hint where inlining wouldn't pay off.
> Depending on the target architecture and the compiler,
> it may be beneficial to "narrow" the scope of optimization:
> In large functions, the optimizer may guess wrongly about
> the hot spots.
>> - Maybe it's not worth it?
> Since inlining comes for free from the maintenance POV,
> any gain should justify it.
> Two things you might try:
> * introduce a local variable for afile[i]
> * replace the for() loop with two nested ones, keeping
> calls to other functions out of the hot spot:
> for (i = 0; i < file_len;)
>   {
>     /* hot spot: */
>     for (; i < file_len; i++)
>       {
>         curFile = afile[i];
>         if (curFile->chunk == -1)
>           curFile->chunk = 0;
>         else if (curFile->curp != curFile->endp - 1)
>           curFile->curp++;
>         else
>           break;
>       }
>
>     if (i < file_len)
>       {
>         /* the complex, rarely used stuff goes here */
>       }
>   }
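[ The nested-loop sketch quoted above, made self-contained for
illustration. Struct and field names are hypothetical simplifications
of the real diff_file.c ones, and the rarely-used chunk-boundary
branch is only a stub: ]

```c
#include <assert.h>

/* Hypothetical simplified per-file state. */
struct file_info {
  const char *curp;  /* current byte */
  const char *endp;  /* one past the end of the current chunk */
  int chunk;         /* current chunk number, -1 before the first */
};

/* Advance every file by one byte: a tight inner loop for the common
 * case, and a separate slow path for chunk boundaries, so the
 * optimizer can focus on the hot spot. */
static void
increment_pointers(struct file_info *afile[], int file_len)
{
  int i;
  struct file_info *curFile;

  for (i = 0; i < file_len;)
    {
      /* hot spot: everything stays inside the current chunk */
      for (; i < file_len; i++)
        {
          curFile = afile[i];
          if (curFile->chunk == -1)               /* before first chunk */
            curFile->chunk = 0;
          else if (curFile->curp != curFile->endp - 1)
            curFile->curp++;                      /* common case */
          else
            break;                                /* chunk boundary */
        }

      if (i < file_len)
        {
          /* the complex, rarely used stuff goes here: read the next
           * chunk, reset curp/endp ... stubbed out in this sketch */
          afile[i]->chunk++;
          i++;
        }
    }
}
```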
Ok, when I have some time, I will experiment a little bit with
"inline", and see if the compiler "gets it" (I'll try your "nested for
loop" example (the corrected one, which you just sent in the other
mail) to help the compiler a little bit). I should be able to easily
compare that with the macro version.
Received on 2010-12-15 11:51:23 CET