Re: svn commit: r1590405 - in /subversion/trunk: build.conf subversion/include/private/svn_subr_private.h subversion/libsvn_repos/log.c subversion/libsvn_subr/bit_array.c

From: Ivan Zhakov <ivan_at_visualsvn.com>
Date: Mon, 28 Apr 2014 10:11:35 +0400

On 27 April 2014 19:27, <stefan2_at_apache.org> wrote:
> Author: stefan2
> Date: Sun Apr 27 15:27:46 2014
> New Revision: 1590405
>
> URL: http://svn.apache.org/r1590405
> Log:
> More 'svn log -g' memory usage reduction. We use a hash to keep track
> of all revisions reported so far, i.e. easily a million.
>
Hi Stefan,

Interesting findings, some comments below.

> That is 48 bytes / rev, allocated in small chunks. The first results
> in 10s of MB dynamic memory usage while the other results in many 8k
> blocks being mmap()ed risking reaching the pre-process limit on some
> systems.
I don't understand this argument: why do small allocations result in
10s of MB of memory usage? Doesn't the pool allocator aggregate small
allocations into 8k blocks?

>
> We introduce a simple packed bit array data structure to replace
> the hash. For repos < 100M revs, the initialization overhead is less
> than 1ms and will amortize as soon as more than 1% of all revs are
> reported.
>

It may be worth implementing the same trick we used in
membuffer_cache: use an array of bit arrays, one for every 100k
revisions for example, and initialize them lazily. I mean:
[0...99999] - bit array 0
[100000....199999] -- bit array 1
...

It should be easy to implement.
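
A minimal sketch of the idea, to make the layout concrete. It uses
plain malloc/calloc instead of APR pools, and all names (lazy_bits_t,
lazy_bits_set, etc.) are hypothetical, not the API in bit_array.c:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: an array of lazily allocated bit arrays.
 * Chunk size follows the "100k revisions" example above. */
#define REVS_PER_CHUNK 100000u
#define CHUNK_BYTES ((REVS_PER_CHUNK + 7u) / 8u)

typedef struct lazy_bits_t {
  /* chunks[i] covers revisions [i * 100000, (i + 1) * 100000).
   * An entry stays NULL until the first bit in its range is set. */
  unsigned char **chunks;
  size_t chunk_count;
} lazy_bits_t;

/* Allocate only the pointer table; no bit storage yet. */
static lazy_bits_t *lazy_bits_create(uint64_t max_rev)
{
  lazy_bits_t *b = malloc(sizeof(*b));
  if (b == NULL)
    return NULL;
  b->chunk_count = (size_t)(max_rev / REVS_PER_CHUNK) + 1;
  b->chunks = calloc(b->chunk_count, sizeof(*b->chunks));
  if (b->chunks == NULL) {
    free(b);
    return NULL;
  }
  return b;
}

/* Mark REV as reported. Returns 0 on success, -1 if REV is out of
 * range or the chunk could not be allocated. */
static int lazy_bits_set(lazy_bits_t *b, uint64_t rev)
{
  size_t idx = (size_t)(rev / REVS_PER_CHUNK);
  uint32_t bit = (uint32_t)(rev % REVS_PER_CHUNK);

  if (idx >= b->chunk_count)
    return -1;
  if (b->chunks[idx] == NULL) {
    /* First access to this range: allocate a zeroed chunk now. */
    b->chunks[idx] = calloc(CHUNK_BYTES, 1);
    if (b->chunks[idx] == NULL)
      return -1;
  }
  b->chunks[idx][bit / 8] |= (unsigned char)(1u << (bit % 8));
  return 0;
}

/* Returns 1 if REV was marked, 0 otherwise (including out of range
 * and ranges whose chunk was never allocated). */
static int lazy_bits_get(const lazy_bits_t *b, uint64_t rev)
{
  size_t idx = (size_t)(rev / REVS_PER_CHUNK);
  uint32_t bit = (uint32_t)(rev % REVS_PER_CHUNK);

  if (idx >= b->chunk_count || b->chunks[idx] == NULL)
    return 0;
  return (b->chunks[idx][bit / 8] >> (bit % 8)) & 1;
}

static void lazy_bits_destroy(lazy_bits_t *b)
{
  size_t i;
  for (i = 0; i < b->chunk_count; i++)
    free(b->chunks[i]);
  free(b->chunks);
  free(b);
}
```

With this layout the up-front cost is one pointer per 100k revisions,
and each 12.5k chunk is only allocated (and zeroed) when a revision in
its range is first reported.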

This also helps for repositories like the ASF one: there are many
revisions, but usually only the most recent revisions are accessed.

What do you think?

-- 
Ivan Zhakov
CTO | VisualSVN | http://www.visualsvn.com

Received on 2014-04-28 08:12:35 CEST
