Re: Memory consumption w/ large directories

From: Kirby C. Bohling <kbohling_at_birddog.com>
Date: 2002-02-08 03:24:44 CET

Greg,

I'm not sure how portable all of it is. It needs some specific patches
that I only know to work on x86 Linux with a relatively recent glibc
(I have hints it might work on other glibc platforms), and it uses stuff
from binutils, but I will happily contribute it.

To be honest, it is 2 patches (1 to APR and 1 to Subversion) and 3 or 4
scripts (2 Python and 1-2 shell). Let me know what I need to do. Some
of it needs a bit of refinement, as it currently exists only in my
.bash_history file :-). I also need to write a doc or README on how I
have been memory hunting, too.

Sander liked some of the APR stuff (walking the pool tree), and I believe
he is considering re-implementing it in APR. There is also some stuff that
allows me to tag specific pools I am "interested" in tracing (I didn't get
a feel for Sander's opinion after he read the code; he seemed interested
before seeing the implementation *grin*).
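Walking the pool tree amounts to a depth-first traversal of each pool's subpools, summing what each one holds. The C sketch here is a minimal illustration under stated assumptions: `demo_pool`, `demo_pool_adopt()` and `demo_pool_walk()` are hypothetical names, not APR's API (APR's `apr_pool_t` is opaque), and each pool is assumed to record its own byte count plus first-child and next-sibling links.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a pool; APR's real apr_pool_t is opaque. */
typedef struct demo_pool {
    const char *tag;            /* label printed in reports */
    size_t bytes;               /* bytes allocated directly from this pool */
    struct demo_pool *child;    /* first subpool, or NULL */
    struct demo_pool *sibling;  /* next subpool of the same parent, or NULL */
} demo_pool;

/* Attach `child` as a subpool of `parent` (prepended to the child list). */
static void demo_pool_adopt(demo_pool *parent, demo_pool *child)
{
    assert(parent != NULL && child != NULL);
    child->sibling = parent->child;
    parent->child = child;
}

/* Depth-first walk: optionally print each pool's own usage, indented by
   depth, and return the total bytes held by the pool and its subpools. */
static size_t demo_pool_walk(const demo_pool *pool, int depth, FILE *out)
{
    size_t total = pool->bytes;

    if (out != NULL)
        fprintf(out, "%*s%s: %zu bytes\n", depth * 2, "", pool->tag, pool->bytes);
    for (const demo_pool *c = pool->child; c != NULL; c = c->sibling)
        total += demo_pool_walk(c, depth + 1, out);
    return total;
}
```

Calling `demo_pool_walk(&root, 0, stdout)` prints one indented line per pool, which makes a directory pool that is far larger than its file subpools stand out without grepping for addresses.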

Minor rant, mainly to Sander:
<rant>
I don't like having to use grep, as I find the output of console-based
gdb very difficult to pipe into grep. Some things I would grep for are
based solely on memory addresses of pools, which could in theory change
from run to run, and I hate tracking them down. That is the reason for
the apr_pool_trace_alloc() stuff. IMHO it should be extended to have
debugging flags that you can set using calls that are macro'ed to no-ops
in production code, so you can say things like this:

I am interested in allocs in this pool, and set the "interested" bit on
all subpools created from this pool.

I am interested in seeing a backtrace on allocations from this pool.

I am interested in when a new pool gets created from this pool.

For bonus points: I am interested in pools whose tag matches this regex.
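The flag idea above can be sketched in C roughly as follows. Everything here is hypothetical: `trace_pool`, the `POOL_TRACE_*` flags, `POOL_SET_TRACE()` and the helper functions are illustrative names, not APR's API, and `NDEBUG` is assumed as the switch that turns the macro into a no-op for production builds. The regex matching uses POSIX `<regex.h>`, so it assumes a POSIX system.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical trace flags; these names are illustrative, not APR's. */
enum {
    POOL_TRACE_ALLOCS    = 1 << 0, /* log every allocation from the pool */
    POOL_TRACE_BACKTRACE = 1 << 1, /* capture a backtrace per allocation */
    POOL_TRACE_CHILDREN  = 1 << 2, /* log creation of subpools */
    POOL_TRACE_INHERIT   = 1 << 3  /* copy these flags to new subpools */
};

/* Minimal stand-in for a pool: just a tag and its trace flags. */
typedef struct trace_pool {
    const char *tag;
    unsigned flags;
} trace_pool;

/* Debug builds set flags; with -DNDEBUG the call compiles to nothing. */
#ifndef NDEBUG
#define POOL_SET_TRACE(p, f) ((p)->flags |= (unsigned)(f))
#else
#define POOL_SET_TRACE(p, f) ((void)0)
#endif

/* Initialise a subpool, inheriting flags when the parent requests it. */
static void trace_pool_init_child(const trace_pool *parent, trace_pool *child,
                                  const char *tag)
{
    assert(parent != NULL && child != NULL);
    child->tag = tag;
    child->flags = (parent->flags & POOL_TRACE_INHERIT) ? parent->flags : 0u;
    if (parent->flags & POOL_TRACE_CHILDREN)
        fprintf(stderr, "pool '%s' created subpool '%s'\n", parent->tag, tag);
}

/* Called by a (hypothetical) allocator on every allocation. */
static void trace_pool_on_alloc(const trace_pool *pool, size_t size)
{
    if (pool->flags & POOL_TRACE_ALLOCS)
        fprintf(stderr, "alloc %zu bytes from '%s'\n", size, pool->tag);
}

/* Set `flags` (no-op under NDEBUG) on every pool whose tag matches a POSIX
   extended regex. Returns the number of matching pools, or -1 if the
   pattern does not compile. */
static int trace_pools_matching(trace_pool *pools, size_t n,
                                const char *pattern, unsigned flags)
{
    regex_t re;
    int matched = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (regexec(&re, pools[i].tag, 0, NULL, 0) == 0) {
            POOL_SET_TRACE(&pools[i], flags);
            matched++;
        }
    }
    regfree(&re);
    return matched;
}
```

Tagging one pool with `POOL_TRACE_INHERIT` means every subpool created through `trace_pool_init_child()` carries the same flags, so the run-to-run pool addresses never have to be looked up by hand.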

I'll provide the implementation if you're interested. Digging through the
output of this in verbose mode was impossible. I knew the pool I was
interested in, and that was the only one I wanted. The total memory
allocated and deallocated in Subversion is _*HUGE*_. It goes through
memory allocations at an incredible rate (thanks to the speedy pools).
Trust me, the aggregate per-file allocations completely dwarf the
per-directory allocations; the only reason I noticed this was that the
lifetime of the objects was wrong.
</rant>

Wow, I feel better now. Sorry if that came off too strong; I think the
APR stuff is pretty cool. I am trying to figure out how to either mimic
or use the pooling code at work. It is like some old Objective-C code I
used once in a former life: memory management was easy, a no-brainer
back then.

There is some get-a-backtrace functionality crammed into APR right where
it shouldn't be, plus a couple of other little tweaks in
subversion/libsvn_*/*.c that allow me to track the pools I am interested in.
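On glibc, capturing a backtrace at allocation time is typically done with `backtrace()` and `backtrace_symbols()` from `<execinfo.h>`. This is only an illustration of that technique, not the patch described here: `trace_alloc_site()` and `TRACE_MAX_FRAMES` are hypothetical names, and the code is glibc-specific. Function names resolve only for exported symbols when linking with `-rdynamic`; otherwise the raw addresses can be mapped to source lines with binutils' `addr2line`.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define TRACE_MAX_FRAMES 32  /* arbitrary cap on captured stack depth */

/* Hypothetical hook an allocator could call for a pool marked "interesting".
   Captures the current call stack with glibc's backtrace(); if `out` is not
   NULL, prints the pool tag, the size and the symbolised frames.
   Returns the number of frames captured. */
static int trace_alloc_site(const char *pool_tag, size_t size, FILE *out)
{
    void *frames[TRACE_MAX_FRAMES];
    int depth;

    assert(pool_tag != NULL);
    depth = backtrace(frames, TRACE_MAX_FRAMES);
    if (out != NULL) {
        /* backtrace_symbols() returns one malloc'd block; free it once. */
        char **names = backtrace_symbols(frames, depth);

        fprintf(out, "alloc %zu bytes from '%s':\n", size, pool_tag);
        for (int i = 0; i < depth; i++)
            fprintf(out, "  %s\n", names != NULL ? names[i] : "??");
        free(names);
    }
    return depth;
}
```

Keeping the hook behind a per-pool flag (rather than calling it on every allocation) matters here, since walking the stack on each of Subversion's many allocations would be very slow.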

Kirby

Greg Stein wrote:
> On Thu, Feb 07, 2002 at 07:19:14PM -0600, Kirby C. Bohling wrote:
>
>>Okay,
>>
>> After a bit of tool/patch building, I have a set of tools that let me
>>track down the worst offenders of memory consumption in Subversion.
>> I don't know enough about object lifetimes to fix it to be honest.
>>
>
> That's what we're here for :-)
>
>
>>...
>>This backtrace allocated 2.3MB during a check-in of 208 files in a
>>single directory. It is the pool that gets big fast, and I am pretty
>>sure it is all related to a little detail about 5 calls deep. This is
>>just the worst backtrace. My little set of scripts will give you a
>>
>
> I see the problem.
>
> libsvn_wc/get_editor.c::add_or_open_file(). Each time that bugger is called,
> it calls svn_io_get_dirents() using the directory baton. Then it calls
> svn_wc_entries_read() to read the entries file (also using the dir baton).
> It calls svn_wc_check_wc(), but that probably won't consume much memory.
>
> In any case, those three items are all done in the directory pool. After
> that, it will create the file baton, which has its own subpool.
>
> The logic should be changed to create the subpool *first*, do the operations
> in that subpool, and then pass that to make_file_baton().
>
>
>>...
>> Which means that the directory pool is used a *lot* while adding
>>several hundred files. And the entry file gets bigger every time a file
>>is added, which explains the non-linear behavior. I believe this to be
>>the root cause of all of the memory issues I have been seeing. I know
>>
>
> I would think so, yah.
>
>
>>there are several pieces slated for rewrites/reworks. This is an issue
>>that should be considered in all of that.
>>
>
> This one wasn't on deck :-)
>
> After I get through some weird borkenness, and fix some config problem Karl
> is seeing, then I'll fix that function.
>
>
>>By the way, this is the worst of 14 stack traces that each use over
>>0.5MB of memory on the files; 7 of those use over 1MB and 2 of those use
>>over 2MB, which means a total of something like 14-15MB of memory
>>just to do the check-in. There are 186 unique backtraces that
>>allocate memory from a dir_baton created in libsvn_wc.
>>
>
> Neat statistics!
>
>
>>So, now I have a couple of scripts and various other things lying about
>>as patches. I will happily give them out; what is the best way to do
>>that? I don't have any file space on a public server, and I figure
>>dumping a tar.gz file as an attachment into everybody's inbox won't
>>endear me to too many people. I can attach it to issue 622, and I
>>should probably post all this information there too. Hmm, that is what I
>>will do unless somebody tells me that is a bad idea.
>>
>
> How about we create /trunk/tools/dev, and you commit them into there?
>
> [ theoretically, some of this stuff is also helpful for APR, but we can sort
> that out later ]
>
> Cheers,
> -g
>
>
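Greg's suggested fix (create the file's subpool first, do the reads there, then hand it to make_file_baton()) can be illustrated with a toy model. This is a sketch under stated assumptions, not Subversion code: `toy_pool`, `dir_pool_bytes_naive()` and `peak_bytes_subpool_first()` are hypothetical, only the entries-file read is modelled, and each added file is assumed to grow the entries file by a fixed `entry_size` bytes. Under that assumption, re-reading a k-entry file into the directory pool for k = 1..n leaves entry_size * n(n+1)/2 bytes held until the directory closes, while the subpool-first version never holds more than entry_size * n at once. That is one way to see the non-linear behavior described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy pool: tracks only the bytes still outstanding. Not APR. */
typedef struct toy_pool {
    size_t used;
} toy_pool;

static void toy_alloc(toy_pool *p, size_t n) { assert(p != NULL); p->used += n; }
static void toy_clear(toy_pool *p) { p->used = 0; }

/* Current pattern: while adding file k, the k-entry entries file is read
   into the directory pool, which lives until the directory is closed. */
static size_t dir_pool_bytes_naive(size_t n_files, size_t entry_size)
{
    toy_pool dir = {0};

    for (size_t k = 1; k <= n_files; k++)
        toy_alloc(&dir, k * entry_size);
    return dir.used;
}

/* Subpool-first pattern: each file's reads go into its own subpool, which
   is cleared once the file is done. Returns the peak bytes outstanding. */
static size_t peak_bytes_subpool_first(size_t n_files, size_t entry_size)
{
    toy_pool dir = {0};
    toy_pool file = {0};
    size_t peak = 0;

    for (size_t k = 1; k <= n_files; k++) {
        toy_alloc(&file, k * entry_size);
        if (dir.used + file.used > peak)
            peak = dir.used + file.used;
        toy_clear(&file);   /* file baton closed: scratch memory released */
    }
    return peak;
}
```

For example, with 208 files and a made-up 100-byte entry, `dir_pool_bytes_naive(208, 100)` returns 2,173,600 while `peak_bytes_subpool_first(208, 100)` returns 20,800; the entry size is not fitted to the measurements quoted above, only the shape of the growth matters.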

Received on Sat Oct 21 14:37:05 2006