Possible memory leak in Envers due to Narayana Transaction Reaper

Description

Hi,

As requested in the forum (post below) I have created this issue.

We are working on a project that processes large amounts of data in a multi-threaded JTA environment, and we ran into a problem where our software appears to be leaking memory. We analyzed the heap dumps and determined that it was Envers's AuditProcessManager that was keeping data from being garbage collected.

Perhaps this issue should be posted in the JBoss/Narayana forums instead. After debugging and testing, I determined that the cause is a transaction timeout: the Transaction Reaper aborts the transaction at a point where Hibernate does not get a chance to clean up.

The issue seems to be related to, or to have originated from, the following issue/fix:

I have tried to write it down as simply as possible. The following happens (note that the lock I am talking about is the _syncLock in TwoPhaseCoordinator):
Thread: work
Thread: flush. Envers registers beforeCompletion and afterCompletion on Session’s actionQueue (afterCompletion will do the cleanup)
Thread: doCommit
Thread: beforeCompletion -> lock -> Envers does its work
Reaper: abort transaction
Reaper: afterCompletion -> wait for lock
Thread: finds out transaction is rolled back and runs delayedAfterCompletion
Thread: ends beforeCompletion -> unlock
Reaper: gets lock, calls afterCompletion and finds out it is not the original Thread, thus sets delayedAfterCompletion.
Thread: afterCompletion -> wait for lock
Reaper: finishes afterCompletion -> unlock
Thread: gets lock and finds out afterCompletion has already been called (_afterCalled in TwoPhaseCoordinator) -> nothing to do and unlock

Now Envers still has a reference to the audit work in the map in AuditProcessManager that it is supposed to clean up during afterCompletion, but afterCompletion will never be called.
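To make the consequence concrete, here is a minimal sketch (with assumed names, not the actual Envers source) of the per-transaction bookkeeping that is left dangling when afterCompletion never runs:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Simplified stand-in for Envers' AuditProcessManager: one entry of audit
    // work per transaction, removed again only by the afterCompletion callback.
    public class AuditBookkeepingSketch {

        static class TxKey { }        // hypothetical stand-in for the transaction key
        static class AuditWork { }    // hypothetical stand-in for the queued audit work

        private final Map<TxKey, AuditWork> auditProcesses = new ConcurrentHashMap<>();

        void onFlush(TxKey tx) {
            // registered during flush: one entry per audited transaction
            auditProcesses.put(tx, new AuditWork());
        }

        void beforeCompletion(TxKey tx) {
            // Envers writes its audit rows here, while the TwoPhaseCoordinator
            // holds the _syncLock described above
        }

        void afterCompletion(TxKey tx) {
            // the cleanup that is skipped in the reaper scenario; without it the
            // entry stays in the map and keeps the audit work strongly reachable
            auditProcesses.remove(tx);
        }
    }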

This issue was originally found using older versions of Hibernate and Narayana, but I have also tested it in a small example application with the latest version of Hibernate (5.2.15.Final) and the latest version of Narayana (5.8.0.Final), and I can still reproduce it.

I have created a github-repository where I have pushed my test application: https://github.com/ronaldmik/memoryleak

Any help would be appreciated. If this should (also) be posted in the JBoss/Narayana forums, let me know.

Thanks in advance!

Forum post: https://discourse.hibernate.org/t/possible-memory-leak-in-envers-due-to-narayana-transaction-reaper/463

Attachments: 1

Activity

Ronald Mik, April 18, 2018 at 6:31 AM

The change to #close() will definitely fix the issue. Thank you for fixing the problem!

Chris Cranford, April 17, 2018 at 2:01 PM

We completely understand, and in fact I'll be making a change so that when #close() is called, those delayed AfterTransactionCompletion callbacks fire prior to closing the Session and/or EntityManager.
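As an illustrative sketch only (the class and method names here are hypothetical, not the actual Hibernate internals), the described change makes #close() behave like #isOpen() with respect to the delayed callbacks:

    // hypothetical sketch of the intended behaviour change, not Hibernate code
    class SessionCloseSketch {

        private final Runnable delayedAfterCompletion; // work the reaper thread had to defer
        private boolean delayedAlreadyRun;

        SessionCloseSketch(Runnable delayedAfterCompletion) {
            this.delayedAfterCompletion = delayedAfterCompletion;
        }

        void close() {
            // fire any delayed AfterTransactionCompletion work on the calling
            // thread first, so Envers can clear its AuditProcess entry before
            // the Session/EntityManager is torn down
            if (!delayedAlreadyRun) {
                delayedAlreadyRun = true;
                delayedAfterCompletion.run();
            }
            // ... existing tear-down logic ...
        }
    }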

Ronald Mik, April 17, 2018 at 10:40 AM (edited)

Thank you for the thorough investigation into this problem!

The problem is that we are not, in fact, using the EntityManager in this way. I merely tried to reproduce the problem with as little code as possible, using an example from JBoss/Narayana. We are actually using Spring in combination with Narayana and Hibernate, in an environment that is, for the most part, message driven through Spring's DefaultMessageListenerContainer. So it is this Spring code that makes the calls to the JtaTransactionManager, and our EntityManager is (of course) injected through the @PersistenceContext annotation. The code that closes the EntityManager is therefore not in our own codebase at all.

But I cannot post the sources of our application publicly, and it is also far too large (thousands of classes) to serve as an example. I will start working on an example that recreates the memory leak in a way that more closely resembles our environment.
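For reference, a minimal sketch of the kind of setup described above (class and method names are illustrative, not taken from the actual application); the point is that the EntityManager is container-managed, so there is no place in application code where isOpen() or close() is ever called:

    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;
    import org.springframework.stereotype.Component;

    // Invoked by a DefaultMessageListenerContainer configured with the
    // JtaTransactionManager, so the JTA transaction is begun and completed by
    // Spring rather than by this code.
    @Component
    public class AuditedMessageListener implements MessageListener {

        // container-managed EntityManager; application code never calls
        // isOpen() or close() on it
        @PersistenceContext
        private EntityManager entityManager;

        @Override
        public void onMessage(Message message) {
            // convert the message and persist audited entities here, e.g.
            // entityManager.persist(toEntity(message));
        }
    }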

Chris Cranford, April 16, 2018 at 7:28 PM

After reviewing your test case, the first thing that stood out to me was how you handle closing the EntityManager. Your code does this:
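    // reconstructed from context; the essential point is that the finally block
    // closes the EntityManager unconditionally, without an isOpen() check
    EntityManager entityManager = null;
    try {
        entityManager = entityManagerFactory.createEntityManager();
        // ... transactional work ...
    } finally {
        entityManager.close();
    }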

One could argue that potentially the EntityManager was never constructed or opened in the try-block for some reason.
With that in mind, I would say a better finally-block would be this:
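    // reconstructed from context: guard the close() with a null check and isOpen()
    EntityManager entityManager = null;
    try {
        entityManager = entityManagerFactory.createEntityManager();
        // ... transactional work ...
    } finally {
        if (entityManager != null && entityManager.isOpen()) {
            entityManager.close();
        }
    }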

This has a positive impact on your situation because it guarantees that the AfterTransactionCompletion callbacks fire, thus avoiding the memory leak you describe and illustrate in your test case. The reason is that the #isOpen() method performs some transaction management checks that notice that the callbacks were delayed by the reaper thread and must execute on the original calling thread. It is this call that effectively forces them to fire, thus clearing the AuditProcess queue for you.

With that said, this will throw a HibernateException, because the underlying SynchronizationCallbackCoordinatorTrackingImpl#processAnyDelayedAfterCompletion will be called and, after executing the AfterTransactionCompletion callbacks, currently raises an exception on the original calling thread.

What you could then do is handle that exception yourself if you anticipate it so it doesn't present an issue as follows:
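    // reconstructed from context: anticipate the exception raised when the
    // delayed callbacks execute on this thread
    EntityManager entityManager = null;
    try {
        entityManager = entityManagerFactory.createEntityManager();
        // ... transactional work ...
    } finally {
        try {
            if (entityManager != null && entityManager.isOpen()) {
                entityManager.close();
            }
        }
        catch (PersistenceException e) {
            // expected when the reaper already rolled the transaction back; the
            // delayed AfterTransactionCompletion callbacks have run by this point,
            // so for this use case the exception can be logged and ignored
        }
    }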

I could discuss this with the team to see if there are any concerns with adding the same type of transaction management checks from #isOpen() to #close(). If there is no issue, that would at least allow those after-completion callbacks to fire and avoid the memory leak you described. But it will still imply a change in your finally block to handle the PersistenceException for your use case.

Ronald Mik, March 30, 2018 at 8:38 AM

I have also added the test case as an attachment. You can run it by simply running main() in QuickStartApplication. I added some logging and use reflection to retrieve the size of the auditProcesses map at the end; you can see it returns 1 whereas it should be 0.
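A minimal sketch of what that reflection check could look like (how the AuditProcessManager instance is obtained depends on the test's bootstrap code and is not shown here):

    import java.lang.reflect.Field;
    import java.util.Map;

    final class AuditProcessLeakCheck {

        // reads the private "auditProcesses" map mentioned above
        static int auditProcessCount(Object auditProcessManager) throws Exception {
            Field field = auditProcessManager.getClass().getDeclaredField("auditProcesses");
            field.setAccessible(true);
            Map<?, ?> map = (Map<?, ?>) field.get(auditProcessManager);
            return map.size(); // expected 0 after completion; 1 indicates the leak
        }
    }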

Details

Resolution: Fixed
Created March 30, 2018 at 8:31 AM
Updated April 27, 2018 at 1:37 PM
Resolved April 17, 2018 at 6:03 PM