On Wed, Dec 15, 2010 at 10:58 AM, Stefan Fuhrmann <eqfox_at_web.de> wrote:
> On 15.12.2010 02:30, Stefan Fuhrmann wrote:
>>
>> On 14.12.2010 23:35, Johan Corveleyn wrote:
>>
>>> Thoughts?
>>
>> Two things you might try:
>> * introduce a local variable for afile[i]
>> * replace the for() loop with two nested ones, keeping
>> calls to other functions out of the hot spot:
>>
>> for (i=0; i < file_len;)
>
> That should read:
> for (i=0; i < file_len; i++)
>>
>> {
>>   /* hot spot: */
>>   for(; i < file_len; i++)
>>   {
>>     curFile = afile[i];
>>     if (curFile->chunk==-1)
>>       curFile->chunk = 0;
>>     else if (curFile->curp != curFile->endp -1)
>>       curFile->curp++;
>>     else
>>       break;
>>   }
>>
>>   if (i < file_len)
>>   {
>>     /* the complex, rarely used stuff goes here */
>>   }
>> }
Ok, I tried this, but it didn't really help: it's still only about as
fast as before. The macro version, on the other hand, is about 10%
faster (I made a cleaner macro version, macro'ing only the hot-spot
code, with a function call for the complex, rarely used stuff).
For the inline version, I tried several variations (always with
APR_INLINE in the function signature, and with a local variable for
file[i]): with or without the nested for loop, with or without the
complex stuff factored out into a separate function, ... It made no
difference.
Maybe I'm still doing something wrong that keeps the compiler from
inlining it (I'm not really a compiler expert; maybe I have to add
some compiler options or something, ...). OTOH, I now have a macro
implementation which is quite readable (and which uses the do ...
while(0) construct - thanks Peter), and which does the trick. So maybe
I should simply stick with that option ...
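(For the record, if it really is the compiler refusing to inline, I
suppose a compiler-specific attribute could force its hand. Something
like the untested sketch below, where FORCE_INLINE is just a name I
made up, could replace the plain APR_INLINE in the signature:
[[[
/* Untested sketch: force inlining with compiler-specific keywords.
 * FORCE_INLINE is a made-up name, not something from APR. */
#if defined(__GNUC__)
#define FORCE_INLINE static __inline__ __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE static __forceinline
#else
#define FORCE_INLINE static APR_INLINE
#endif

FORCE_INLINE svn_error_t *
increment_pointers(struct file_info *file[], int file_len, apr_pool_t *pool);
]]]
I haven't tried this, though, so take it with a grain of salt.)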
FYI, attached are some patches for the different variations (to be
applied against HEAD of the diff-optimizations-bytes branch), and the
two main versions are included below in full for easy
review/discussion/context. At the bottom, some of my measurements. As
always, I welcome any feedback.
New macro version (increment only, decrement is similar):
[[[
/* For all files in the FILE array, increment the curp pointer. If a file
* points before the beginning of file, let it point at the first byte again.
* If the end of the current chunk is reached, read the next chunk in the
* buffer and point curp to the start of the chunk. If EOF is reached, set
* curp equal to endp to indicate EOF. */
#define increment_pointers(all_files, files_len, pool)                      \
  do {                                                                      \
    int i;                                                                  \
                                                                            \
    for (i = 0; i < files_len; i++)                                         \
      {                                                                     \
        if (all_files[i]->chunk == -1) /* indicates before beginning of file */\
          all_files[i]->chunk = 0;    /* point to beginning of file again */\
        else if (all_files[i]->curp != all_files[i]->endp - 1)              \
          all_files[i]->curp++;                                             \
        else                                                                \
          SVN_ERR(increment_chunk(all_files[i], pool));                     \
      }                                                                     \
  } while (0)

static svn_error_t *
increment_chunk(struct file_info *file, apr_pool_t *pool)
{
  apr_off_t length;
  apr_off_t last_chunk = offset_to_chunk(file->size);

  /* Unless this was the last chunk, we need to read a new chunk. */
  if (file->chunk == last_chunk)
    {
      file->curp++; /* curp == endp signals end of file */
    }
  else
    {
      file->chunk++;
      length = file->chunk == last_chunk ?
               offset_in_chunk(file->size) : CHUNK_SIZE;
      SVN_ERR(read_chunk(file->file, file->path, file->buffer,
                         length, chunk_to_offset(file->chunk),
                         pool));
      file->endp = file->buffer + length;
      file->curp = file->buffer;
    }

  return SVN_NO_ERROR;
}
]]]
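(A side note on the do ... while(0) construct, since it was new to me:
it makes the multi-statement macro body behave as a single statement,
so it composes safely with if/else at the call site. A contrived
illustration (the condition is just for show; decrement_pointers is
the similar decrement variant mentioned above):
[[[
/* Because of do { ... } while (0), this parses as intended; with a bare
 * { ... } block the ';' after the macro call would end the if statement
 * and orphan the else. */
if (some_condition)
  increment_pointers(file, file_len, pool);
else
  decrement_pointers(file, file_len, pool);
]]]
Note also that because SVN_ERR expands to a return on error, the macro
can only be used inside a function that itself returns svn_error_t *.)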
APR_INLINE version (with nested for loop, and separate function):
[[[
/* For all files in the FILE array, increment the curp pointer. If a file
* points before the beginning of file, let it point at the first byte again.
* If the end of the current chunk is reached, read the next chunk in the
* buffer and point curp to the start of the chunk. If EOF is reached, set
* curp equal to endp to indicate EOF. */
static APR_INLINE svn_error_t *
increment_pointers(struct file_info *file[], int file_len, apr_pool_t *pool)
{
  struct file_info *curFile;
  int i;

  for (i = 0; i < file_len; i++)
    {
      /* hot spot: */
      for (; i < file_len; i++)
        {
          curFile = file[i];
          if (curFile->chunk == -1)  /* indicates before beginning of file */
            curFile->chunk = 0;      /* point to beginning of file again */
          else if (curFile->curp != curFile->endp - 1)
            curFile->curp++;
          else
            break;
        }

      if (i < file_len)
        {
          SVN_ERR(increment_chunk(curFile, pool));
        }
    }

  return SVN_NO_ERROR;
}

static svn_error_t *
increment_chunk(struct file_info *file, apr_pool_t *pool)
{
  apr_off_t length;
  apr_off_t last_chunk = offset_to_chunk(file->size);

  /* Unless this was the last chunk, we need to read a new chunk. */
  if (file->chunk == last_chunk)
    {
      file->curp++; /* curp == endp signals end of file */
    }
  else
    {
      file->chunk++;
      length = file->chunk == last_chunk ?
               offset_in_chunk(file->size) : CHUNK_SIZE;
      SVN_ERR(read_chunk(file->file, file->path, file->buffer,
                         length, chunk_to_offset(file->chunk),
                         pool));
      file->endp = file->buffer + length;
      file->curp = file->buffer;
    }

  return SVN_NO_ERROR;
}
]]]
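One practical difference between the two versions at the call site:
the function version returns an svn_error_t *, so the caller wraps it
in SVN_ERR, while the macro version expands its own SVN_ERR directly
into the caller's body. A quick sketch (the caller name is made up):
[[[
/* Sketch only (hypothetical caller name); the enclosing function must
 * return svn_error_t * in both cases. */
static svn_error_t *
advance_files(struct file_info *file[], int file_len, apr_pool_t *pool)
{
  /* With the function version, the caller propagates errors itself: */
  SVN_ERR(increment_pointers(file, file_len, pool));

  /* With the macro version, the expansion contains its own SVN_ERR,
     so it is invoked as a bare statement:
       increment_pointers(file, file_len, pool);                      */

  return SVN_NO_ERROR;
}
]]]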
Finally, some measurements (with the help of some instrumentation code
in blame.c; see the patch in attachment). This is the timing of "svn
blame -x-b" of a ~1.5 MB file (61,000 lines) with 2272 revisions (each
time the average of the 2nd and 3rd run):
                                      Total blame time (s)   diff time (s)
Normal function (original version):            118                 80
Inline function (w/ nested for):               118                 79
Macro version:                                 115                 72
Ok, it's nothing huge, but it makes me feel a little bit better :-).
Cheers,
--
Johan