[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: Performance optimization - svn_stringbuf_appendbyte()

From: Julian Foad <julian.foad_at_wandisco.com>
Date: Mon, 11 Oct 2010 16:07:02 +0100

On Sun, 2010-10-10, Stefan Fuhrmann wrote:
> On 07.10.2010 16:07, Julian Foad wrote:
> >> New Revision: 997203
> >>
> >> URL: http://svn.apache.org/viewvc?rev=997203&view=rev
> >> Log:
> >> Merge r985037, r985046, r995507 and r995603 from the performance branch.
> >>
> >> These changes introduce the svn_stringbuf_appendbyte() function, which has
> >> significantly less overhead than svn_stringbuf_appendbytes(), and can be
> >> used in a number of places within our codebase.
> > Hi Stefan2.
> >
> > I have been wondering about the actual benefit of such tightly
> > hand-optimized code. Today I got around to giving it a quick spin.
> >
> > In my first test, it made no difference whatsoever, because the
> > optimized code is never executed. The recent merge r1002898 of r1001413
> > from the performance branch introduced a bug:
> >
> > - if (str->blocksize> old_len + 1)
> > + if (str->blocksize< old_len + 1)
> WTF how did that happen?
> Well, that's what reviews are for ;)
> > which totally disables the optimized code path.
> >
> > Fixed in r1005437.
> Thanks.
> > That solved, a loop doing a million simple appendbyte calls runs more
> > than twice as fast as calling appendbytes(..., 1). That's fantastic.
> >
> > But what about that hand-optimization? I wrote a simple version of the
> > function, and called it 'appendbyte0':
> >
> > svn_stringbuf_appendbyte0(svn_stringbuf_t *str, char byte)
> > {
> > if (str->blocksize> str->len + 1)
> > {
> > str->data[str->len++] = byte;
> > str->data[str->len] = '\0';
> > }
> > else
> > svn_stringbuf_appendbytes(str,&byte, 1);
> > }
> >
> > Here are the results (see full patch attached):
> >
> > Times: appendbytes appendbyte0 appendbyte (in ms)
> > run: 89 31 34
> > run: 88 30 35
> > run: 88 31 34
> > run: 88 30 34
> > run: 88 31 34
> > min: 88 30 34
> >
> > This tells me that the hand-optimization is actually harmful and the
> > compiler does a 10% better job by itself.
> >
> > Have I made a mistake?
> My guess is that you might have made two mistakes actually.

Heh, OK.

> First, the number of operations was relatively low - everything
> in the low ms range could be due to secondary effects (and
> still be repeatable).

I can't think of any reason why. I ran the whole benchmark lots of
times. Occasionally some of the times were a big chunk longer due to
other system activity. Normally they were stable. I also ran it
several times with 10 million ops instead of 1 million, and the results
were exactly 10x longer with the same degree of variability.

> The most important problem would be compiler optimizations
> or lack thereof. Since the numbers are very large (50..100
> ticks per byte, depending on your CPU clock), I would assume
> that you used a normal debug build for testing. In that case,

You're right. I'll try again, with "--disable-debug -O2".

> the actual number of C statements has a large impact on
> the execution time. See my results below for details.
> > What are the results for your system?
> >
> > (I'm using GCC 4.4.1 on an Intel Centrino laptop CPU.)
> >
> Test code used (10^10 calls, newer re-allocate):
>
> int i;
> unsigned char c;
>
> svn_stringbuf_t* s = svn_stringbuf_create_ensure (255, pool);

OK so you're avoiding any calls to the "re-alloc". My tests included
re-allocs, but were long enough strings that this wasn't much overhead.
Nevertheless I'll avoid re-allocs so we can compare results fairly.

> for (i = 0; i < 50000000; ++i)
> {
> s->len = 0;
> for (c = 0; c < 200; ++c)
> svn_stringbuf_appendbyte (s, c);
> }
>
>
> XEON 55xx / Core i7, hyper-threading on, 3GHz peak
> 64 bits Linux, GCC 4.3.3; ./configure --disable-debug
>
> (1) 10,7s; IPC = 2.1
> (2) 8,11s; IPC = 1.4
> (3) 2,64s; IPC = 2.4
> (4) 2,43s; IPC = 2.3

> (1) use appendbytes gives 9 instructions in main, 59 in appbytes
> (2) handle count==1 directly in in appendbytes; 9 inst. in main, 26 in
> appbytes
> (3) appendbyte0 (same compiler output as the the non-handtuned code);
> 13 inst. in appbyte, 6 in main
> (4) trunk_at_HEAD appendbyte; 11 inst. in appbyte, 6 in main
>
> Core2 2.4GHz, Win32, VS2010
> (not using valgrind to count instructions here; also not testing (2))
>
> (1) 17,0s release, 20,2s debug
> (3) 10,6s release, 12,2s debug
> (4) 9,7s release, 13,0s debug

With a test harness more similar to yours, and built with
"--disable-debug -O2", here are my relative numbers (but only 1 million
outer loops, compared to your 50 million):

$ svn-c-test subr string 24
Times for 1000000 x 200 bytes, in seconds
  (1)appendbytes (3)appendbyte0 (4)trunk_at_HEAD (5)macro
run: 7.03 2.06 1.37 0.53
run: 6.62 1.76 1.55 0.53
run: 6.53 1.67 1.44 0.53
run: 6.54 1.60 1.44 0.53
run: 6.52 1.84 1.37 0.53
min: 6.52 1.60 1.37 0.53

This agrees with what you found: the hand-tuned code gives a 10%
improvement.

This also shows another variant, writing the function as a macro,
reported in the column "(5)macro". This gives a further two-and-a-half
times speed-up in this test.

[...]
> * your numbers seem very large in comparison
> -> probably something else going on

Can you use my test code or show me yours? Mine has the harness in
"string-test.c" and the code under test in libsvn_subr - see attachment
- and the default build on my platform loads libsvn_subr as a shared
library. Maybe yours is completely different.

Other observations:

- Some callers that use very tight loops are in fact simply copying a
string, up as far as some particular end marker. They may benefit from
being re-written to first determine how much they want to copy and then
call svn_stringbuf_appendbytyes().

- The stringbuf type initially allocates a 1-byte space, and doubles
this every time it re-allocates. Could we not could allocate at least a
few bytes (4? 8?) as a minimum, without actually consuming more memory
in practice? That would avoid two or three re-allocs in some usage
patterns.
 
- Julian

Received on 2010-10-11 17:07:49 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.