Re: svn commit: r1590405 - in /subversion/trunk: build.conf subversion/include/private/svn_subr_private.h subversion/libsvn_repos/log.c subversion/libsvn_subr/bit_array.c

From: Stefan Fuhrmann <stefan.fuhrmann_at_wandisco.com>
Date: Tue, 29 Apr 2014 15:54:49 +0200

On Mon, Apr 28, 2014 at 8:11 AM, Ivan Zhakov <ivan_at_visualsvn.com> wrote:

> On 27 April 2014 19:27, <stefan2_at_apache.org> wrote:
> > Author: stefan2
> > Date: Sun Apr 27 15:27:46 2014
> > New Revision: 1590405
> >
> > URL: http://svn.apache.org/r1590405
> > Log:
> > More 'svn log -g' memory usage reduction. We use a hash to keep track
> > of all revisions reported so far, i.e. easily a million.
> >
> Hi Stefan,
>
> Interesting findings, some comments below.
>
> > That is 48 bytes / rev, allocated in small chunks. The first results
> > in 10s of MB dynamic memory usage while the other results in many 8k
> > blocks being mmap()ed, risking reaching the per-process limit on some
> > systems.
> I don't understand this argument: why do small allocations result in
> 10s of MB of memory usage? Doesn't the pool allocator aggregate small
> memory allocations into 8k blocks?
>

1M revs x 48 bytes = 48MB, i.e. 10s of MB. There are two
problems I'm addressing here for 'svn log -g' (log without
-g does not have those issues):

* --limit applies to "top-level" revisions, not the merged ones.
  If you log for some integration branch, it may show only a
  few top-level revs but, say, 100k merged revs. That is fine
  with 1.8 and even more so 1.9 as we deliver the info quickly.
  But the server memory usage should remain in check even
  for more extreme scenarios / repo sizes.

* Some system-provided APR builds (1.5+ in particular) use mmap
  to allocate memory, i.e. for every block, e.g. 8k, there is a
  separate mmap call. The Linux default is 65530 (sic!) mmap
  regions per process. Slowly allocating pools can therefore
  trigger OOM errors after only 512MB (65530 x 8k) of actual
  memory usage (sum across all threads). I already prepared a
  patch for that.
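
For illustration, the arithmetic behind both points can be written
down as a few lines of C. This is only a sketch: the constants come
from the figures quoted in this thread, and the helper names are
made up for the example (they are not Subversion or APR functions).

```c
#include <stdint.h>

/* Figures quoted in this thread; names are hypothetical. */
#define BYTES_PER_REV     48u     /* hash cost per reported revision */
#define MMAP_REGION_LIMIT 65530u  /* Linux default (vm.max_map_count) */
#define BLOCK_SIZE        8192u   /* 8k allocator block, one mmap each */

/* Memory used by a hash that tracks REV_COUNT reported revisions. */
uint64_t hash_bytes(uint64_t rev_count)
{
  return rev_count * BYTES_PER_REV;
}

/* Memory that can be handed out before one-mmap-per-block
 * allocation runs into the per-process region limit. */
uint64_t bytes_at_mmap_limit(void)
{
  return (uint64_t)MMAP_REGION_LIMIT * BLOCK_SIZE;
}
```

With these numbers, hash_bytes(1000000) is 48,000,000 bytes and
bytes_at_mmap_limit() is 536,821,760 bytes, i.e. just under 512MB.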

> > We introduce a simple packed bit array data structure to replace
> > the hash. For repos < 100M revs, the initialization overhead is less
> > than 1ms and will amortize as soon as more than 1% of all revs are
> > reported.
> >
>
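
To make the idea concrete, a packed bit array along these lines
could look roughly like the sketch below. It is not the code from
bit_array.c; the type and function names are invented for this
example, and it uses malloc/calloc instead of APR pools to stay
self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the committed bit array. */
typedef struct rev_bits_t
{
  unsigned char *data;  /* one bit per revision, zero-initialized */
  uint64_t max_rev;     /* highest revision that can be stored */
} rev_bits_t;

/* Allocate a zeroed array covering revisions 0..MAX_REV.
 * Returns NULL on allocation failure. */
rev_bits_t *rev_bits_create(uint64_t max_rev)
{
  rev_bits_t *bits = malloc(sizeof(*bits));
  if (bits == NULL)
    return NULL;

  bits->data = calloc((size_t)(max_rev / 8 + 1), 1);
  if (bits->data == NULL)
    {
      free(bits);
      return NULL;
    }
  bits->max_rev = max_rev;
  return bits;
}

/* Mark REV as reported. */
void rev_bits_set(rev_bits_t *bits, uint64_t rev)
{
  assert(rev <= bits->max_rev);
  bits->data[rev / 8] |= (unsigned char)(1u << (rev % 8));
}

/* Return 1 if REV was reported, 0 otherwise (also when out of range). */
int rev_bits_get(const rev_bits_t *bits, uint64_t rev)
{
  if (rev > bits->max_rev)
    return 0;
  return (bits->data[rev / 8] >> (rev % 8)) & 1;
}

void rev_bits_destroy(rev_bits_t *bits)
{
  free(bits->data);
  free(bits);
}
```

The whole structure costs one bit per revision in the repository,
paid up front, instead of 48 bytes per revision actually reported.
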
> It may be worth implementing the same trick as we did with
> membuffer_cache: use an array of bit arrays, one for every 100k
> revisions for example, and initialize them lazily. I mean:
> [0...99999] -- bit array 0
> [100000...199999] -- bit array 1
> ...
>
> It should be easy to implement.
>

I gave it a try and it turned out not too horribly complex.
See r1590982.
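
For readers without the commit at hand, the general shape of the
lazily initialized variant can be sketched as below. This is an
illustration only, not the code from r1590982: all names are
hypothetical, the 100k block size is simply taken from the
suggestion above, and malloc/calloc stand in for APR pools.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_REVS  100000u                /* revisions per block */
#define BLOCK_BYTES ((BLOCK_REVS + 7) / 8) /* bytes per block */

typedef struct lazy_bits_t
{
  unsigned char **blocks;  /* NULL until a bit in that block is set */
  size_t block_count;
} lazy_bits_t;

/* Create the block table for revisions 0..MAX_REV without
 * allocating any block. Returns NULL on allocation failure. */
lazy_bits_t *lazy_bits_create(uint64_t max_rev)
{
  size_t i;
  lazy_bits_t *bits = malloc(sizeof(*bits));
  if (bits == NULL)
    return NULL;

  bits->block_count = (size_t)(max_rev / BLOCK_REVS + 1);
  bits->blocks = malloc(bits->block_count * sizeof(*bits->blocks));
  if (bits->blocks == NULL)
    {
      free(bits);
      return NULL;
    }
  for (i = 0; i < bits->block_count; ++i)
    bits->blocks[i] = NULL;
  return bits;
}

/* Mark REV as reported, allocating its block on first use.
 * Returns 0 on success, -1 on allocation failure. */
int lazy_bits_set(lazy_bits_t *bits, uint64_t rev)
{
  size_t block = (size_t)(rev / BLOCK_REVS);
  size_t bit = (size_t)(rev % BLOCK_REVS);

  assert(block < bits->block_count);
  if (bits->blocks[block] == NULL)
    {
      bits->blocks[block] = calloc(BLOCK_BYTES, 1);
      if (bits->blocks[block] == NULL)
        return -1;
    }
  bits->blocks[block][bit / 8] |= (unsigned char)(1u << (bit % 8));
  return 0;
}

/* Return 1 if REV was reported; untouched blocks read as all zero. */
int lazy_bits_get(const lazy_bits_t *bits, uint64_t rev)
{
  size_t block = (size_t)(rev / BLOCK_REVS);
  size_t bit = (size_t)(rev % BLOCK_REVS);

  if (block >= bits->block_count || bits->blocks[block] == NULL)
    return 0;
  return (bits->blocks[block][bit / 8] >> (bit % 8)) & 1;
}

/* Number of blocks that have actually been allocated so far. */
size_t lazy_bits_allocated_blocks(const lazy_bits_t *bits)
{
  size_t i, count = 0;
  for (i = 0; i < bits->block_count; ++i)
    if (bits->blocks[i] != NULL)
      ++count;
  return count;
}

void lazy_bits_destroy(lazy_bits_t *bits)
{
  size_t i;
  for (i = 0; i < bits->block_count; ++i)
    free(bits->blocks[i]);
  free(bits->blocks);
  free(bits);
}
```

In this sketch, the up-front cost is one pointer per 100k revisions.
A log that only touches recent revisions allocates only the blocks
covering them.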

> This also improves cases for repositories like ASF: there are many
> revisions, but usually only the most recent revisions are accessed.
>

Thanks for the feedback!

-- Stefan^2.
Received on 2014-04-29 15:55:34 CEST
