Inefficient Infinispan cache invalidation for bulk operations
Description
Activity
Vlad Mihalcea October 30, 2017 at 2:02 PM
Applied PR upstream.

Radim Vansa October 25, 2017 at 7:56 AM
Thanks for the confirmation, I'll file appropriate PRs.

Emond Papegaaij October 23, 2017 at 8:02 AM
We've been running a patched WildFly with the changes from https://github.com/rvansa/hibernate-orm/tree/HHH-12036 and non-transactional caches, and I can confirm that this fixes our issues with cache invalidation. We haven't seen any timeouts since last Thursday.

Radim Vansa October 18, 2017 at 11:24 AM
Actually, BulkOperationCleanupAction calls removeAll, which could be implemented more efficiently using cache.clear().

Radim Vansa October 18, 2017 at 7:49 AM
I'd say that it's a "won't fix" one. Switching the configuration to non-transactional solves this (waiting for confirmation). There are a few possible improvements on the logging side, but we'll handle those separately.
Hibernate-infinispan uses a very inefficient way to perform cache invalidation for bulk operations (JPA CriteriaUpdate/CriteriaDelete). Rather than broadcasting a clear to all nodes in the cluster, the cache is cleared entry by entry. As entity caches are invalidating caches by default, this requires a query to all remote nodes first to collect all keys. These keys are then bundled in a very large message and sent out to all nodes again. During this entire procedure, it seems the cache region is locked on all nodes, causing the entire cluster to stall (I presume this is needed to prevent inserts into the cache between the query and the invalidation phase).
We are seeing this behavior on WildFly 10.1.0, 11.0.0.CR1 and 11 master. The corresponding code in Hibernate is:
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/java/org/hibernate/cache/infinispan/access/InvalidationCacheAccessDelegate.java#L144 and
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/java/org/hibernate/cache/infinispan/util/Caches.java#L280
The current implementation makes it impossible to perform batch operations on large cache regions with tens of thousands of entries spanning multiple nodes without blocking the entire cluster for many seconds, even up to a minute. In some places we can change the code to update the entries one by one. However, in other places this will result in thousands of queries to the database instead of one, making it far from ideal.
It seems Infinispan lacks a cluster-wide clear command. Therefore, I'll be filing a bug report with Infinispan as well. Note that the documentation of Cache.entrySet contains the following sentence: "Use involving execution of this method on a production system is not recommended as they can be quite expensive operations".
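
To illustrate the difference between the two invalidation strategies described above, here is a minimal, self-contained sketch. It is a simplified model and not the actual hibernate-infinispan or Infinispan code: the Node class, the method names, and the use of plain ConcurrentHashMap instances as per-node cache regions are all assumptions made for illustration. It only shows that the entry-by-entry approach has to gather and ship every key in the region, whereas a broadcast clear carries no keys at all.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model only: each Node stands in for one cluster member's
// local part of a cache region. All names here are hypothetical.
class InvalidationCostSketch {

    static final class Node {
        final Map<String, String> entries = new ConcurrentHashMap<>();
    }

    // Build a cluster in which every node holds its own distinct keys.
    static List<Node> buildCluster(int nodes, int entriesPerNode) {
        List<Node> cluster = new ArrayList<>();
        for (int n = 0; n < nodes; n++) {
            Node node = new Node();
            for (int k = 0; k < entriesPerNode; k++) {
                node.entries.put("node" + n + "-key" + k, "value");
            }
            cluster.add(node);
        }
        return cluster;
    }

    // Entry-by-entry strategy: first query every node for its keys, then
    // send the combined key set back to every node for removal.
    // Returns the number of keys carried in the invalidation payload.
    static int invalidateEntryByEntry(List<Node> cluster) {
        Set<String> allKeys = new HashSet<>();
        for (Node node : cluster) {
            allKeys.addAll(node.entries.keySet()); // phase 1: collect keys
        }
        for (Node node : cluster) {
            node.entries.keySet().removeAll(allKeys); // phase 2: invalidate
        }
        return allKeys.size();
    }

    // Broadcast-clear strategy: every node drops its local entries.
    // No keys need to be collected or transmitted, so the payload is 0.
    static int invalidateWithClear(List<Node> cluster) {
        for (Node node : cluster) {
            node.entries.clear();
        }
        return 0;
    }

    public static void main(String[] args) {
        int byEntry = invalidateEntryByEntry(buildCluster(3, 10_000));
        int byClear = invalidateWithClear(buildCluster(3, 10_000));
        System.out.println("keys shipped, entry by entry: " + byEntry);
        System.out.println("keys shipped, broadcast clear: " + byClear);
    }
}
```

In this model the payload of the entry-by-entry strategy grows with the total number of cached entries, which matches the observation that large regions are the ones that stall the cluster. The model ignores locking and network latency entirely.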