Jakarta Batch Mass Indexing job fails with @IndexedEmbedded entity and MySQL
Description
Activity

Yoann Rodière June 20, 2022 at 2:44 PM
I had a closer look at potential solutions. So far I can see two:
Load IDs in one session, and entities in another.
OR execute one query per “batch” of entities (defining a “batch” as <entityFetchSize> entities, i.e. a partition contains multiple chunks and each chunk contains multiple batches). Something like
select e from MyEntity e where e.id > :idOfLastEntityOfPreviousChunk and e.id < :partitionUpperBoundId order by e.id limit 50
Obviously, there would be an (unknown) performance impact for either solution, and that would impact all users, not just MySQL users. So, that’s annoying.
If we ignore performance, both solutions should work fine… except when we’re told to execute a custom HQL query (see customQueryHQL
/ .restrictedBy(String)
here). In that case we can’t implement the first solution because we can’t (easily) change the query to return IDs instead of entities, and we can’t implement the second solution because we can’t (easily) change the query to add restrictions.
Customizing HQL might become easier with Hibernate ORM 6.1, since it [supports subqueries in the from
clause]( in HQL and thus would allow us to embed HQL in our own HQL, e.g.: select id from (<custom HQL>) e order by e.id
(first solution) or select e from (<custom HQL>) e where e.id > :idOfLastEntityOfPreviousChunk and e.id < :partitionUpperBoundId order by e.id limit 50
(second solution; not sure that would work, though). We could even imagine bringing the “restrictions” feature of the JSR-352 indexer closer to that of the massindexer, allowing only HQL that we insert into our where
clause: users could use subqueries there.
So I would be tempted to postpone the fix this until we switch to Hibernate ORM 6.1 ( ). Then we would be able to implement the fix reliably and consistently regardless of whether custom HQL is provided or not.
Alternatively, if it’s really urgent, we could go with deprecating customQueryHQL
/ .restrictedBy(String)
(which would definitely still not work with MySQL) and introducing another type of loading closer to that of the massindexer, then implementing solution 1 or 2, whatever is simpler. This should work fine with subqueries in custom restrictions (let’s test that, though), but I think (intuitively) solution 1 may perform better with subqueries.

Yoann Rodière April 4, 2022 at 1:48 PM
To the person trying to fix this (probably future me): see the conversation here for an analysis and potential solutions:
Details
Assignee
Yoann RodièreYoann RodièreReporter
Ivan KotevIvan KotevComponents
Sprint
NoneFix versions
Affects versions
Priority
Major
Details
Details
Assignee

Reporter

Hello,
I’m trying to use the JSR-352 Mass Indexing job with MySQL and due to the specifics of the MySQL Connector/J driver, I have specified
entityFetchSize(Integer.MIN_VALUE)
. The indexed entity also has an association which is annotated with@IndexedEmbedded
.Mass Indexing fails with an exception:
Also providing a reproducer: