On Tue, Apr 29, 2014 at 4:29 PM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> On 29 April 2014 17:54, Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
> > On Mon, Apr 28, 2014 at 8:11 AM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:
> >> eOn 27 April 2014 19:27, <stefan2_at_apache.org> wrote:
> >> > Author: stefan2
> >> > Date: Sun Apr 27 15:27:46 2014
> >> > New Revision: 1590405
> >> >
> >> > URL: http://svn.apache.org/r1590405
> >> > Log:
> >> > More 'svn log -g' memory usage reduction. We use a hash to keep track
> >> > of all revisions reported so far, i.e. easily a million.
> >> >
> * Some system provided APR (1.5+ in particular) uses mmap
> > to allocate memory. I.e. for every block, e.g. 8k, there is a
> > separate mmap call. The Linux default is 65530 (sic!) mmap
> > regions per process. Slowly allocating pools can trigger OOM
> > errors after only 512MB actual memory usage (sum across
> > all threads). I already prepared a patch for that.
> Ouch, I didn't know that. I was thinking that MMAP APR pool allocator
> is experimental and is not enabled by default.
It is not enabled by default, I guess but the
package responsible decided to enable it anyway.
> >> > We introduce a simple packed bit array data structure to replace
> >> > the hash. For repos < 100M revs, the initialization overhead is less
> >> > than 1ms and will amortize as soon as more than 1% of all revs are
> >> > reported.
> >> >
> >> It may be worth implement the same trick like we done with
> >> membuffer_cache: use array of bit arrays for every 100k of revisions
> >> for example and initialize them lazy. I mean:
> >> [0...99999] - bit array 0
> >> [100000....199999] -- bit array 1
> >> ...
> >> It should be easy to implement.
> > I gave it a try and it turned out not too horribly complex.
> > See r1590982.
> But it may be worth to keep original svn_bit_array and add new
> svn_sparse_bit_array() with array of svn_bit_array() objects So things
> will be separated in two micro layers.
I think the new implementation is flexible enough
for a wide range of future usages. I'd prefer having
a single implementation for now as it is simply less
code that could break.
Received on 2014-05-05 01:14:31 CEST