Hi all,
On the diff-optimizations-bytes branch, in diff_file.c, there are two
functions which are called for every byte of the identical prefix and
suffix: increment_pointers and decrement_pointers. These functions are
actually equivalents of curp++ or curp--, reading the next/previous
byte, but taking into account the chunked-ness of the data (file data
is read in memory in chunks).
As an experiment I changed these functions into macros, eliminating
the function calls. This makes the diff algorithm another 10-15%
faster (granted, this was measured with my "extreme" testcase of a
1.5 MB file (60000 lines), of which most lines are identical
prefix/suffix). However, having an entire function like that
implemented as a macro isn't very pretty (see below for an example).
Some considerations:
- Maybe I can use APR_INLINE, with similar results?
- Maybe I can put just the "critical section" into a macro (the curp++
/ curp-- part), and call a function only when a chunk boundary is
encountered, to read in the new chunk, advance variables, ... That
would be about once every 131072 iterations, since chunks are 128 KB
large.
- Maybe it's not worth it?
Thoughts?
Just for kicks, here is an example of increment_pointers as a macro:
[[[
/* Advance curp by one byte in each of the FILE_LEN files in AFILE, \
 * reading in the next chunk when a chunk boundary is crossed. */
#define increment_pointers(afile, file_len, pool) \
  do { \
    int i; \
    \
    for (i = 0; i < (file_len); i++) \
      if ((afile)[i]->chunk == -1) /* indicates before beginning of file */ \
        { \
          (afile)[i]->chunk = 0; /* point to beginning of file again */ \
        } \
      else if ((afile)[i]->curp == (afile)[i]->endp - 1) \
        { \
          apr_off_t last_chunk = offset_to_chunk((afile)[i]->size); \
          if ((afile)[i]->chunk == last_chunk) \
            { \
              (afile)[i]->curp++; /* curp == endp signals end of file */ \
            } \
          else \
            { \
              apr_off_t length; \
              (afile)[i]->chunk++; \
              length = (afile)[i]->chunk == last_chunk ? \
                offset_in_chunk((afile)[i]->size) : CHUNK_SIZE; \
              /* SVN_ERR returns from the calling function on error. */ \
              SVN_ERR(read_chunk((afile)[i]->file, (afile)[i]->path, \
                                 (afile)[i]->buffer, \
                                 length, \
                                 chunk_to_offset((afile)[i]->chunk), \
                                 (pool))); \
              (afile)[i]->endp = (afile)[i]->buffer + length; \
              (afile)[i]->curp = (afile)[i]->buffer; \
            } \
        } \
      else \
        { \
          (afile)[i]->curp++; \
        } \
  } while (0)
]]]
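For comparison, here is a rough sketch of the second consideration (only
the curp++ fast path in a macro, with a real function for the chunk
boundary). It is a standalone toy, not the code on the branch: struct
file_info, load_chunk, next_chunk, INCREMENT_POINTER and
common_prefix_length are hypothetical names, the "file" is just a string
in memory, CHUNK_SIZE is shrunk to 4 bytes so the boundary is actually
hit, it handles one file per call instead of an array, and the error
handling (SVN_ERR / read_chunk) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4  /* assumption: tiny, for illustration only */

/* Hypothetical stand-in for the per-file state. */
struct file_info {
  const char *data;          /* the whole "file", held in memory here */
  size_t size;
  char buffer[CHUNK_SIZE];   /* the chunk currently loaded */
  char *curp;                /* current byte within buffer */
  char *endp;                /* one past the last valid byte in buffer */
  size_t chunk;              /* index of the loaded chunk */
};

/* Copy chunk number CHUNK into the buffer and reset curp/endp. */
static void load_chunk(struct file_info *f, size_t chunk)
{
  size_t offset = chunk * CHUNK_SIZE;
  size_t length = f->size - offset;

  if (length > CHUNK_SIZE)
    length = CHUNK_SIZE;
  memcpy(f->buffer, f->data + offset, length);
  f->chunk = chunk;
  f->curp = f->buffer;
  f->endp = f->buffer + length;
}

/* Slow path: only reached when curp is on the last byte of the chunk. */
static void next_chunk(struct file_info *f)
{
  size_t next_offset = (f->chunk + 1) * CHUNK_SIZE;

  assert(f->endp - f->curp <= 1);
  if (next_offset >= f->size)
    f->curp++;               /* curp == endp signals end of file */
  else
    load_chunk(f, f->chunk + 1);
}

/* Fast path: a plain pointer increment, no function call. */
#define INCREMENT_POINTER(f) \
  do { \
    if ((f)->endp - (f)->curp > 1) \
      (f)->curp++; \
    else \
      next_chunk(f); \
  } while (0)

/* Usage example: count the identical prefix of two in-memory "files". */
static size_t common_prefix_length(const char *a, size_t a_len,
                                   const char *b, size_t b_len)
{
  struct file_info fa = { a, a_len }, fb = { b, b_len };
  size_t n = 0;

  load_chunk(&fa, 0);
  load_chunk(&fb, 0);
  while (fa.curp != fa.endp && fb.curp != fb.endp && *fa.curp == *fb.curp)
    {
      INCREMENT_POINTER(&fa);
      INCREMENT_POINTER(&fb);
      n++;
    }
  return n;
}
```

With this split the macro stays a few lines long, and the function call
is only paid once per chunk rather than once per byte. Whether that
gets close to the all-macro numbers would have to be measured.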
Cheers,
--
Johan
Received on 2010-12-14 23:36:44 CET