On Sat, Sep 5, 2015 at 11:19 PM, Branko Čibej <brane_at_wandisco.com> wrote:
> On 05.09.2015 18:53, Ivan Zhakov wrote:
> > On 5 September 2015 at 19:23, Branko Čibej <brane_at_wandisco.com> wrote:
> >> On 05.09.2015 02:53, Branko Čibej wrote:
> >>> On 04.09.2015 21:17, stefan2_at_apache.org wrote:
> >>>> Author: stefan2
> >>>> Date: Fri Sep 4 19:17:44 2015
> >>>> New Revision: 1701317
> >>>>
> >>>> URL: http://svn.apache.org/r1701317
> >>>> Log:
> >>>> Finally, make svn_ra_svn__list_t actually a fully typed,
> ra_svn-specific
> >>>> object. Update the creation functions; everything else already "just
> fits".
> >>> How is this code different from using APR arrays, except that the
> latter
> >>> needs a typecast on array item access? As far as I can see, you've
> >>> completely duplicated the APR array allocation strategy, including
> using
> >>> two allocations to create the array.
>
Not exactly: Quite some fraction of the arrays is empty
(20+%,greatly dependent on the command being run).
Those will only do a single allocation.
But you are right that this commit in and of itself does
not make much of a difference. With r1701318, however,
the average number of allocation drops below 1. This is
because re-allocations are extremely rare (1:2M in my
test case, YMMV). If a certain record has more than 4
sub-elements on the protocol side, it is easy to extend
the on-stack buffer size sufficiently to cover that. I'll go
over the *_parse_tuple calls to see where larger structs
are being used, so we get 0 resize except for unbounded
lists. In contrast to APR default array sizes, this will not
increase the amount of memory allocated from the pool.
As a bonus, the combination of r1701317,-8,-22 saves
some memory, slightly less than 100 bytes per array.
This is somewhat relevant for responses that contain
unbounded collections, e.g. a directory listing or a rev's
changed paths list.
> >>> The only significant difference is that capacity is being tracked
> >>> outside the svn_ra_svn__list_t structure during the construction of the
> >>> list.
>
There is also calling overhead for apr_array_push that
alone accounts for about 5% of the protocol processing
(no re-allocation taking place). It all comes down to lots
of piecemeal in a small section of code (one branch of
read_item).
> >>> Call me dense ... but can you please explain how exactly is this
> >>> better/faster than using APR arrays? (I'm not going to mention 'safer'
> >>> because it clearly isn't.) Code like this that is apparently meant be
> an
> >>> optimisation of something(?) really should have a bit of an explanatory
> >>> comment, IMO.
>
Sorry about that. I forgot to replicate the comment from
the original APR-array-based commit. After fixing that,
it now says the same things as I did above.
>> I played around with the apr_array_make implementation a bit and did
> >> some performance measurements with small array allocation and usage,
> >> with the following pattern:
> >>
> >> * in 60% of cases, the array does not get resized
>
And if that gets close to 100% the leverage becomes
even greater.
> >> * in 30% it gets resized once
> >> * in 10% it gets resized twice
> >>
> >> If I change apr_array_make to allocate the initial number of elements in
> >> the same block as the array header, I get a 15% speedup on this test
> >> case, compared to the default implementation. If I change it further to
> >> never set the element values to 0 in apr_array_make, I get an additional
> >> 10% speedup, for a total of 23%. So I'm guessing this is the number you
> >> were actually seeing.
>
Along those lines, yes. I'm getting about 20% for the
whole protocol handling (also see below) in a fully
optimized build. I have not tried to build APR with
similar settings and linking statically to it; I always
used the libs that came with the OS.
> > Is it speedup of overall Subversion over svn:// protocol operation or
> > just of APR array allocation?
>
> Stefan reported a 20% speedup of svn:// protocol processing. My test
> case doesn't measure that, of course.
>
In the case of 'svn diff --summary', protocol processing
is ~50% of the runtime, the remainder is UI processing.
Of that 50%, 20% were shaven off, i.e. a 10% saving
total in this case. If a client does something else with
the data, e.g. only print a total stat or filter it or store
parts of it in some data structure, the relative savings
could be higher.
The bottom line is that any longer response stream that
does not contain user file contents will see similar savings
in the protocol processing part. It is also fair to assume
that most clients won't do much more with the data than
our CL client does in the test case. That is list, log and
diff --summary should be faster now while having a lower
peak memory usage.
The underlying reason for the impact of array handling
in ra_svn is the structure of the protocol itself. Every C
struct and every optional element in it gets enclosed by
"( )", which become a "list" in the parser. More than 40%
of all items can be lists, or one list every 30 bytes or so.
Trunk is now able to send (server), receive and display
(client) such data at a sustained 1Gbps.
-- Stefan^2.
Received on 2015-09-06 15:51:34 CEST