[RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)

From: Johan Corveleyn <jcorvel_at_gmail.com>
Date: Tue, 14 Dec 2010 23:35:47 +0100

Hi all,

On the diff-optimizations-bytes branch, in diff_file.c, there are two
functions which are called for every byte of the identical prefix and
suffix: increment_pointers and decrement_pointers. These functions are
actually equivalents of curp++ or curp--, reading the next/previous
byte, but taking into account the chunked-ness of the data (file data
is read in memory in chunks).

As an experiment I changed these functions into macro's, eliminating
the function calls. This makes the diff algorithm another 10% - 15%
faster (granted, this was measured with my "extreme" testcase of a 1,5
Mb file (60000 lines), of which most lines are identical
prefix/suffix). However, having an entire function like that
implemented as a macro isn't very pretty (see below for an example).

Some considerations:
- Maybe I can use APR_INLINE, with similar results?

- Maybe I can put just the "critical section" into a macro (the curp++
/ curp-- part), and call a function when a chunk boundary is
encountered (~ once every 131072 iterations (chunks are 128 Kb
large)), to read in the new chunk, advancing variables, ...

- Maybe it's not worth it?

Thoughts?

Just for kicks, here is an example of increment_pointers as a macro:
[[[
#define increment_pointers(afile, file_len, pool) { \
  int i; \
                                                                             \
  for (i = 0; i < file_len; i++) \
    if (afile[i]->chunk == -1) /* indicates before beginning of file */ \
      { \
        afile[i]->chunk = 0; /* point to beginning of file again */ \
      } \
    else if (afile[i]->curp == afile[i]->endp - 1) \
      { \
        apr_off_t last_chunk = offset_to_chunk(afile[i]->size); \
        if (afile[i]->chunk == last_chunk) \
          { \
            afile[i]->curp++; /* curp == endp signals end of file */ \
          } \
        else \
          { \
            apr_off_t length; \
            afile[i]->chunk++; \
            length = afile[i]->chunk == last_chunk ? \
              offset_in_chunk(afile[i]->size) : CHUNK_SIZE; \
            SVN_ERR(read_chunk(afile[i]->file, afile[i]->path,
afile[i]->buffer,\
                               length, chunk_to_offset(afile[i]->chunk), \
                               pool)); \
            afile[i]->endp = afile[i]->buffer + length; \
            afile[i]->curp = afile[i]->buffer; \
          } \
      } \
    else \
      { \
        afile[i]->curp++; \
      } \
  }
]]]

Cheers,

-- 
Johan

Received on 2010-12-14 23:36:44 CET

This message: [ Message body ]
Next message: Peter Samuelson: "Re: Testing help/advice needed."
Previous message: Stefan Sperling: "Re: Testing help/advice needed."
Next in thread: Peter Samuelson: "Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)"
Reply: Peter Samuelson: "Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)"
Reply: Stefan Fuhrmann: "Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)"
Reply: Philip Martin: "Re: [RFC] diff-optimizations-bytes branch: avoiding function call overhead (?)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ] [ by messages with attachments ]