Substantial native memory leak when bootstrapping EntityManagerFactory

Description

== SUMMARY ==
When an EntityManagerFactory is bootstrapped with persistenceUnitRootUrl set, Hibernate scans the classes inside the fat jar. While doing so, it leaks an excessive amount of memory outside the JVM heap (native memory).

== HOW TO REPRODUCE ==
Reproduced on the following system (a similar effect occurs on Mac OS X and other platforms):
java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
Ubuntu 16.04 LTS

I attached a Maven project that shows how Hibernate is typically bootstrapped with Spring in a containerless environment.
Make sure a database server (in my case MySQL) is running and reachable; see the configuration in class "SpringConfiguration".

Run "run.sh" from the attached test case. It builds a fat jar with the Maven Shade Plugin, starts the jar with a 64 MB heap (Xms = Xmx = 64m), and then monitors the resident set size (RSS) of the process.
Expected behavior: an RSS of 64 MB plus some JVM overhead, i.e. roughly 130 MB. Actual behavior: an RSS of about 400 MB. The output in log.txt illustrates the issue further.
On my system, it reports "Total: reserved=1450677KB, committed=158661KB", so the total memory committed by the JVM is about 150 MB, and the remaining ~250 MB are 'lost' outside the memory the JVM accounts for.
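The numbers above can be cross-checked with a small sketch. The quoted "Total: reserved=..., committed=..." line has the format of JDK Native Memory Tracking output (presumably from `jcmd <pid> VM.native_memory summary` with the JVM started with `-XX:NativeMemoryTracking=summary`; the report does not say how log.txt was produced), and on Linux the RSS can be read from the `VmRSS` line of `/proc/<pid>/status`. The class and method names below are hypothetical; the difference between the two values approximates memory the JVM does not account for.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/**
 * Hypothetical helper: compares the RSS of a process (Linux "VmRSS" in
 * /proc/<pid>/status) with the "committed" total reported by Native Memory
 * Tracking. The difference approximates memory outside the JVM's accounting.
 */
public final class RssVsNmt {

    private RssVsNmt() {
    }

    /** Returns the VmRSS value in kB from the lines of /proc/<pid>/status, or -1 if absent. */
    public static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Line format: "VmRSS:\t  409600 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** Returns the committed total in kB from a line like "Total: reserved=1450677KB, committed=158661KB". */
    public static long parseCommittedKb(String nmtTotalLine) {
        int start = nmtTotalLine.indexOf("committed=");
        if (start < 0) {
            throw new IllegalArgumentException("no committed= field: " + nmtTotalLine);
        }
        start += "committed=".length();
        int end = nmtTotalLine.indexOf("KB", start);
        if (end < 0) {
            throw new IllegalArgumentException("no KB unit after committed=: " + nmtTotalLine);
        }
        return Long.parseLong(nmtTotalLine.substring(start, end));
    }

    /** Memory resident in the process but not committed according to NMT, in kB. */
    public static long untrackedKb(long rssKb, long committedKb) {
        return rssKb - committedKb;
    }

    /** Usage (Linux only): java RssVsNmt <pid> "Total: reserved=...KB, committed=...KB" */
    public static void main(String[] args) throws IOException {
        List<String> status = Files.readAllLines(
                Paths.get("/proc", args[0], "status"), StandardCharsets.UTF_8);
        long rssKb = parseVmRssKb(status);
        long committedKb = parseCommittedKb(args[1]);
        System.out.printf("RSS=%d kB, NMT committed=%d kB, untracked=%d kB%n",
                rssKb, committedKb, untrackedKb(rssKb, committedKb));
    }
}
```

Plugging in the report's approximate figures (an RSS of about 400 MB, taken here as 409600 kB, and committed=158661KB) gives 250939 kB, roughly 245 MB, which is consistent with the ~250 MB described above.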

== PROBLEM EXTENT ==
The attached toy example already leaks ~250 MB (depending on the system). In a production application we maintain, which has a larger jar and uses several EMFs at the same time, we observed a leak of 2.5 GB!

== ANALYSIS ==
The method org.hibernate.jpa.boot.archive.internal.JarFileBasedArchiveDescriptor#visitArchive(ArchiveContext context) scans the jars.
It repeatedly invokes org.hibernate.jpa.boot.archive.internal.ArchiveHelper#getBytesFromInputStreamSafely(InputStream inputStream), which reads the bytes from the InputStream into a byte array.
However, the underlying decompressing stream (an Inflater-based stream such as GZIPInputStream) is never closed explicitly. As far as I understand, this causes the leak: the decompression is done in native code via JNI, so memory outside the JVM is allocated and never released.
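A minimal sketch of the pattern the analysis suggests, assuming the fix is simply to close every entry stream after reading it. This is not Hibernate's code: `ClosingJarScanner`, `readAllBytesAndClose` and `scanClassEntries` are hypothetical names. The streams returned by `JarFile#getInputStream` for compressed entries are backed by a native `java.util.zip.Inflater`, so closing them promptly with try-with-resources lets that native memory be released right away. The read loop avoids `InputStream#readAllBytes`, which only exists from Java 9, to stay compatible with the Java 8 environment above.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical sketch (not Hibernate's implementation): scans a jar the way an
 * archive scanner might, but closes every entry stream so that the native
 * Inflater memory behind it is released immediately.
 */
public final class ClosingJarScanner {

    private ClosingJarScanner() {
    }

    /** Reads {@code in} to the end and always closes it, even if reading fails. */
    public static byte[] readAllBytesAndClose(InputStream in) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = stream.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    /** Reads every ".class" entry of the jar and returns the total number of bytes read. */
    public static long scanClassEntries(File jar) throws IOException {
        long total = 0;
        // The JarFile itself is closed as well once scanning is done.
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().endsWith(".class")) {
                    continue;
                }
                // The entry stream is closed inside readAllBytesAndClose.
                total += readAllBytesAndClose(jarFile.getInputStream(entry)).length;
            }
        }
        return total;
    }
}
```

Closing deterministically, rather than leaving unreferenced streams to be cleaned up by the garbage collector, matters here because native allocations do not count against the 64 MB heap and therefore put no pressure on the GC to reclaim them.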

Environment

java version "1.8.0_91"
Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
Apache Maven 3.3.9
Ubuntu 16.04 LTS

Assignee

Vlad Mihalcea

Reporter

Fabian Sudau

Fix versions

backPortable

Backport?

Suitable for new contributors

None

Requires Release Note

None

Pull Request

None

backportDecision

None

Priority

Critical