On 29 April 2014 17:54, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com> wrote:
> On Mon, Apr 28, 2014 at 8:11 AM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
>> eOn 27 April 2014 19:27, <stefan2_at_apache.org> wrote:
>> > Author: stefan2
>> > Date: Sun Apr 27 15:27:46 2014
>> > New Revision: 1590405
>> > URL: http://svn.apache.org/r1590405
>> > Log:
>> > More 'svn log -g' memory usage reduction. We use a hash to keep track
>> > of all revisions reported so far, i.e. easily a million.
>> Hi Stefan,
>> Interesting findings, some comments below.
>> > That is 48 bytes / rev, allocated in small chunks. The first results
>> > in 10s of MB dynamic memory usage while the other results in many 8k
>> > blocks being mmap()ed risking reaching the pre-process limit on some
>> > systems.
>> I don't understand this argument: why small allocations result 10s of
>> memory usage? Does not pool allocator aggregates small memory
>> allocations to 8k blocks?
> 1M x 48 bytes = 10s of MB. There are two problems
> I'm addressing here for 'svn log -g' (log without -g does
> not have those issues):
> * --limit applies to "top-level" revisions, not the merged ones.
> If you log for some integration branch, it may show only a
> few top-level revs but, say, 100k merged revs. That is fine
> with 1.8 and even more so 1.9 as we deliver the info quickly.
> But the server memory usage should remain in check even
> for more extreme scenarios / repo sizes.
> * Some system provided APR (1.5+ in particular) uses mmap
> to allocate memory. I.e. for every block, e.g. 8k, there is a
> separate mmap call. The Linux default is 65530 (sic!) mmap
> regions per process. Slowly allocating pools can trigger OOM
> errors after only 512MB actual memory usage (sum across
> all threads). I already prepared a patch for that.
Ouch, I didn't know that. I thought the mmap-based APR pool allocator
was experimental and not enabled by default.
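For reference, the limit is easy to demonstrate with a toy program that
maps 8kB blocks one mmap() call at a time, the way a mmap-based pool
allocator would (a sketch, not APR code; the guard block just keeps the
kernel from merging adjacent mappings into a single region):

[[[
#include <stdio.h>
#include <sys/mman.h>

#define BLOCK_SIZE 8192

int main(void)
{
  unsigned long count = 0;

  for (;;)
    {
      /* Map one block plus a guard block, then unmap the guard.
         The resulting hole keeps the kernel from merging neighboring
         mappings into one region, so every block consumes one map
         region -- just like scattered pool blocks would. */
      char *block = mmap(NULL, 2 * BLOCK_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (block == MAP_FAILED)
        break;

      munmap(block + BLOCK_SIZE, BLOCK_SIZE);
      count++;
    }

  /* With vm.max_map_count = 65530: ~65500 blocks, i.e. ~512MB. */
  printf("%lu blocks (%lu MB) before mmap() failed\n",
         count, count * BLOCK_SIZE / (1024 * 1024));
  return 0;
}
]]]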
>> > We introduce a simple packed bit array data structure to replace
>> > the hash. For repos < 100M revs, the initialization overhead is less
>> > than 1ms and will amortize as soon as more than 1% of all revs are
>> > reported.
>> It may be worth implement the same trick like we done with
>> membuffer_cache: use array of bit arrays for every 100k of revisions
>> for example and initialize them lazy. I mean:
>> [0...99999] - bit array 0
>> [100000....199999] -- bit array 1
>> It should be easy to implement.
> I gave it a try and it turned out not too horribly complex.
> See r1590982.
But it may be worth keeping the original svn_bit_array and adding a new
svn_sparse_bit_array() built on an array of svn_bit_array objects, so
things are separated into two micro layers.
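Something along these lines, building on the bit_array_t sketch above
(hypothetical signatures, just to illustrate the two layers):

[[[
#define CHUNK_BITS 100000  /* revs covered per chunk */

/* Layer 2: a sparse bit array, i.e. an array of plain bit arrays
   where each chunk is created lazily on first write.  Chunk 0
   covers revs [0..99999], chunk 1 covers [100000..199999], etc. */
typedef struct sparse_bit_array_t
{
  bit_array_t **chunks;     /* NULL until the chunk is first written */
  apr_size_t chunk_count;
  apr_pool_t *pool;         /* used for lazy chunk creation */
} sparse_bit_array_t;

static sparse_bit_array_t *
sparse_bit_array_create(apr_size_t bit_count, apr_pool_t *pool)
{
  sparse_bit_array_t *array = apr_palloc(pool, sizeof(*array));
  array->chunk_count = (bit_count + CHUNK_BITS - 1) / CHUNK_BITS;
  array->chunks = apr_pcalloc(pool,
                              array->chunk_count * sizeof(*array->chunks));
  array->pool = pool;
  return array;
}

static void
sparse_bit_array_set(sparse_bit_array_t *array, apr_size_t idx)
{
  apr_size_t i = idx / CHUNK_BITS;
  if (array->chunks[i] == NULL)
    array->chunks[i] = bit_array_create(CHUNK_BITS, array->pool);
  bit_array_set(array->chunks[i], idx % CHUNK_BITS);
}

static int
sparse_bit_array_get(const sparse_bit_array_t *array, apr_size_t idx)
{
  apr_size_t i = idx / CHUNK_BITS;
  return array->chunks[i]
         ? bit_array_get(array->chunks[i], idx % CHUNK_BITS)
         : 0;
}
]]]

That way the dense structure stays trivial and all the sparseness
logic lives in one thin wrapper.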
CTO | VisualSVN | http://www.visualsvn.com