[svn.haxx.se] · SVN Dev · SVN Users · SVN Org · TSVN Dev · TSVN Users · Subclipse Dev · Subclipse Users · this month's index

Re: 1.9 JavaHL memory leak in ISVNRemote#status

From: Branko Čibej <branko.cibej_at_wandisco.com>
Date: Tue, 28 Apr 2015 21:30:31 +0200

On 28.04.2015 20:06, Marc Strapetz wrote:
> On 28.04.2015 18:12, Branko Čibej wrote:
>> On 28.04.2015 18:03, Marc Strapetz wrote:
>>> Hi Brane,
>>>
>>> On 28.04.2015 07:36, Branko Čibej wrote:
>>>> On 24.04.2015 14:11, Branko Čibej wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> Just a quick note: your last msg jogged my memory and I think I know
>>>>> the root cause of the leak: improper JNI frame management within a
>>>>> loop. If I'm right, I can both fix the leak and remove the
>>>>> close-stream requirement I just added.
>>>>>
>>>>> On 24 Apr 2015 11:00 am, "Marc Strapetz" <marc.strapetz_at_syntevo.com
>>>>> <mailto:marc.strapetz_at_syntevo.com>> wrote:
>>>>>
>>>>> On 24.04.2015 06 <tel:24.04.2015%2006>:34, Branko Čibej wrote:
>>>>>
>>>>> On 22.03.2015 05 <tel:22.03.2015%2005>:06, Branko Čibej
>>>>> wrote:
>>>>>
>>>>> On 21.03.2015 16 <tel:21.03.2015%2016>:23, Branko Čibej
>>>>> wrote:
>>>>>
>>>>> On 19.03.2015 11:43, Marc Strapetz wrote:
>>>>>
>>>>> Attached example performs an endless series of
>>>>> remote status against
>>>>> the Subversion repository. When invoked with
>>>>> -Xmx24M, the VM will run
>>>>> out of memory soon. Monitoring with jvisualvm
>>>>> shows that the used heap
>>>>> size constantly grows. Monitoring with the Task
>>>>> Manager shows that the
>>>>> allocated memory grows even more
>>>>> (significantly).
>>>>> Looks like a memory
>>>>> leak, for which a large amount of native
>>>>> memory is
>>>>> involved, too.
>>>>>
>>>>> Tested on Windows 8.1 with almost latest
>>>>> Subversion 1.9 JavaHL builds.
>>>>>
>>>>> I can confirm that this happens on the Mac, too, and
>>>>> it's not a garbage
>>>>> collector artefact. I'm trying to trace where the
>>>>> leak
>>>>> is happening ...
>>>>> valgrind with APR pool debugging doesn't tell me
>>>>> much
>>>>> (no surprise there).
>>>>>
>>>>> Just to make sure we weren't doing something bad in our
>>>>> libraries, I
>>>>> wrote a small C program that does the same as your Java
>>>>> example (Ev2
>>>>> shims included), and memory usage is completely
>>>>> steady. So
>>>>> it is
>>>>> something in JavaHL, but I have no clue yet what the
>>>>> problem is.
>>>>>
>>>>>
>>>>> I have to say this was one of the more "interesting"
>>>>> bug-hunts
>>>>> in my not
>>>>> entirely boring career, and that's not really obvious from
>>>>> the fix
>>>>> itself. :)
>>>>>
>>>>> http://svn.apache.org/r1675771
>>>>>
>>>>> Marc: this will not be in RC1, but please give the patch a
>>>>> spin and let
>>>>> me know if it fixes your problem. I tested this with the
>>>>> Java
>>>>> program
>>>>> you attached to your original report, and heap size no
>>>>> longer
>>>>> grows
>>>>> without bounds.
>>>>>
>>>>>
>>>>> Great hunt, Brane! The native leak seems to be fixed. I've
>>>>> run my
>>>>> remote status loop with -Xmx24M and still get an OOME after ~170
>>>>> loop iterations. The memory leak is significantly smaller and
>>>>> this
>>>>> time it seems to be in the Java part. According to the profiler,
>>>>> most memory is allocated by HashMap and friends, referenced from
>>>>> JNI code. Only two org.apache.subversion classes show up, but I
>>>>> guess they indicate the source of the leak:
>>>>>
>>>>> org.apache.subversion.javahl.types.Checksum (~10K instances)
>>>>> org.apache.subversion.javahl.types.NativeInputStream (~10K
>>>>> instances)
>>>>>
>>>>> Let me know, if you more profiler statistics will be helpful.
>>>>>
>>>>
>>>> So I've been looking at this in depth. At first I thought that one of
>>>> the problems was that we didn't release JNI local references; I added
>>>> code to make sure this happens in the status callbacks (not committed
>>>> yet) and I verified that all the native wrapped objects do get
>>>> finalized. However, the Java objects still hang around.
>>>>
>>>> One of the problems is that all the callbacks happen within the
>>>> scope of
>>>> the ISVNReporter.finishReport call, which means that the whole edit
>>>> drive is considered a single JNI call (despite the callbacks to Java)
>>>> and the garbage collector can't reclaim space for the objects created
>>>> within JNI during that time. But even a forced GC after the report is
>>>> done and the remote session disposed won't release all the native
>>>> references. I'm a bit stumped here ... JVM's built-in memory profiler
>>>> shows the live references and where they're allocated, but doesn't
>>>> show
>>>> why they're not released even when I explicitly create and destroy JNI
>>>> frames.
>>>
>>> Can you please commit your current state somewhere or send me a patch?
>>> I can give this one a try in JProfiler and see whether I can gather
>>> some more useful information.
>>
>> Here's the complete patch against 1.9.x. I do see some NativeInputStream
>> objects (but not all) being garbage-collected, but there are a number of
>> other objects, even those allocated in Java code in the callback, that
>> are just hanging around, even if I force a GC. Any help in tracking this
>> down will be greatly appreciated.
>
> Thanks, Brane. I'm not sure whether I can confirm this: I took a
> couple of heap dumps while running the remote status 1000 times in a
> loop and my impression was that the amount of almost all objects was
> basically constant, except for Checksum which was constantly rising.
> The only reference to those Checksum objects are "JNI local
> references". The final heap (after 1000 iterations) looks like this:
>
> http://imgur.com/gLdnv84
>
> As you can see, there are no "contents" InputStreams nor "properties"
> (Hash)Maps referenced. AFAIU, I should see almost the same amount of
> objects in all three cases?

Yes, we create the same number of Checksums and NativeInputStreams and
it seems that I did manage to clear the references to the streams, which
is why they're now getting garbage-collected. I've not been able to
figure out why the same is not happening to Checksums (and all the
objects allocated for the remote status events).

I'll keep digging. This is probably some really silly omission
somewhere, it's just a matter of finding it.

-- Brane
Received on 2015-04-28 21:31:48 CEST

This is an archived mail posted to the Subversion Dev mailing list.

This site is subject to the Apache Privacy Policy and the Apache Public Forum Archive Policy.