Improve MassIndexer performance through eager fetching

Description

The MassIndexer fetches all instances of a given entity from the database. It then indexes them, traversing the tree of @IndexedEmbedded objects. In doing so, a lot of additional queries get executed. These queries are not really necessary, since we know beforehand that the data is needed for the indexing operation. So it would make sense to eagerly fetch all associations that are marked as @IndexedEmbedded. This would significantly speed up the MassIndexer.
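To make the cost concrete, here is a toy model in plain Java. It deliberately uses no Hibernate classes: FakeDatabase, indexLazily and indexEagerly are hypothetical names that only count how many "queries" each strategy issues when every root entity has one @IndexedEmbedded-style association. Lazy loading costs one query for the roots plus one per root (the n+1 pattern), while a single joined query ("join fetch") returns everything the indexer needs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Hibernate: counts queries issued while "indexing" entities
// whose embedded association is loaded lazily versus eagerly.
public class NPlusOneDemo {

    // Simulated database that counts every query it answers.
    static final class FakeDatabase {
        private final Map<Long, String> foreignByOwnerId = new HashMap<>();
        private int queryCount = 0;

        FakeDatabase(int rows) {
            for (long id = 1; id <= rows; id++) {
                foreignByOwnerId.put(id, "foreign-" + id);
            }
        }

        // One query loading the root entities only (association left lazy).
        List<Long> selectRoots() {
            queryCount++;
            return new ArrayList<>(foreignByOwnerId.keySet());
        }

        // One extra query per root to initialize its lazy association.
        String selectForeign(long ownerId) {
            queryCount++;
            return foreignByOwnerId.get(ownerId);
        }

        // One query loading roots together with their association ("join fetch").
        Map<Long, String> selectRootsJoinFetchForeign() {
            queryCount++;
            return new HashMap<>(foreignByOwnerId);
        }

        int queryCount() {
            return queryCount;
        }
    }

    // Lazy: the indexer triggers a query each time it walks into the association.
    static int indexLazily(int rows) {
        FakeDatabase db = new FakeDatabase(rows);
        List<String> documents = new ArrayList<>();
        for (long id : db.selectRoots()) {
            documents.add("doc:" + db.selectForeign(id));
        }
        return db.queryCount();
    }

    // Eager: documents are built from data already in memory.
    static int indexEagerly(int rows) {
        FakeDatabase db = new FakeDatabase(rows);
        List<String> documents = new ArrayList<>();
        for (String foreign : db.selectRootsJoinFetchForeign().values()) {
            documents.add("doc:" + foreign);
        }
        return db.queryCount();
    }

    public static void main(String[] args) {
        System.out.println("lazy:  " + indexLazily(1000) + " queries");  // lazy:  1001 queries
        System.out.println("eager: " + indexEagerly(1000) + " queries"); // eager: 1 queries
    }
}
```

Real costs depend on batch sizes, second-level caching and result-set width, so this only illustrates the query count, not actual timings.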

Environment

None

Activity

Marc Schipperheyn
May 2, 2010, 11:42 PM

I haven't looked at fetch profiles yet. I'll check that out.

Marc Schipperheyn
May 3, 2010, 7:26 PM
Edited

After reviewing fetch profiles I was initially very enthusiastic. They seem to be the answer to some of the infuriating n+1 query issues.
However, on closer inspection it doesn't work, or at least not as I would expect. Running MassIndexer on an entity

a method

I see a separate query being executed for getForeign for each MyClass. This does not happen when I map getForeign as a FetchType.EAGER association.

This doesn't seem to happen only with Search. I also see this behaviour with other types of Hibernate Core queries. I also don't really understand why there is no FetchMode for inner joins.

Anyway, this doesn't really concern Hibernate Search, other than that FetchProfile doesn't seem to work with it.

Adrian Meredith
November 4, 2014, 3:36 PM

To use a fetch profile, it has to be activated, right? The MassIndexer doesn't know it exists, so it won't use it. Ideally we would need a new API call in the MassIndexer builder, e.g.
.usingFetchProfile("search")
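A rough sketch of how the proposed option could be wired, assuming it simply records a profile name that each entity-loading thread activates on its own session before loading. MassIndexerOptionsBuilder, ProfileAwareSession and prepareLoadingSession are hypothetical names and are not part of Hibernate Search; the only real API referenced is Hibernate ORM's Session#enableFetchProfile(String), which the stand-in interface mirrors so the example runs without Hibernate on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these types belong to Hibernate Search.
public class FetchProfileOptionDemo {

    // Stand-in for the one session capability needed; in Hibernate ORM the
    // corresponding method is org.hibernate.Session#enableFetchProfile(String).
    interface ProfileAwareSession {
        void enableFetchProfile(String name);
    }

    // Hypothetical builder carrying the proposed usingFetchProfile(...) option.
    static final class MassIndexerOptionsBuilder {
        private String fetchProfile; // null means "no profile requested"

        MassIndexerOptionsBuilder usingFetchProfile(String name) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("fetch profile name must not be empty");
            }
            this.fetchProfile = name;
            return this;
        }

        Optional<String> fetchProfile() {
            return Optional.ofNullable(fetchProfile);
        }
    }

    // Each loading thread would call this on its own session before loading,
    // because fetch profiles are enabled per session.
    static void prepareLoadingSession(MassIndexerOptionsBuilder options, ProfileAwareSession session) {
        options.fetchProfile().ifPresent(session::enableFetchProfile);
    }

    public static void main(String[] args) {
        List<String> enabled = new ArrayList<>();
        MassIndexerOptionsBuilder options = new MassIndexerOptionsBuilder().usingFetchProfile("search");
        prepareLoadingSession(options, enabled::add);
        System.out.println("enabled profiles: " + enabled); // enabled profiles: [search]
    }
}
```

The profile named "search" would still have to be declared on the entities (e.g. with @FetchProfile), which is the extra mapping work raised in the next comment.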

Marc Schipperheyn
November 4, 2014, 3:52 PM

I'm not sure I'd be happy with that level of fine-tuning. It would add a lot of tuning and annotations in many areas.

Sanne Grinovero
November 4, 2014, 4:24 PM

Correct, the MassIndexer would need some new configuration options.
A fetch profile would be useful for the second phase, but an easy improvement I had in mind is to have the first phase (which currently just loads the stream of IDs) use a custom named query, so people could use a "join fetch" in their query if they want to.

That would only require you to declare the named query and to specify its name on the MassIndexer, with the small API complication that the MassIndexer can run for multiple types, so you need to specify which type each named query applies to.
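The per-type part of that API could look roughly like the following sketch, assuming the configuration is just a map from entity type to named query name. IdLoadingQueryConfig, useNamedQuery, namedQueryFor and the query name "MyClass.withForeign" are hypothetical; the named query itself would be declared separately, for instance with JPA's @NamedQuery and a JPQL body along the lines of "select m from MyClass m join fetch m.foreign" (the "foreign" property is assumed from getForeign above).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Hibernate Search API.
// Models choosing a named query per entity type for the loading phase.
public class IdLoadingQueryConfigDemo {

    static final class IdLoadingQueryConfig {
        private final Map<Class<?>, String> namedQueryByType = new HashMap<>();

        // Register the named query to use when indexing the given entity type.
        IdLoadingQueryConfig useNamedQuery(Class<?> entityType, String queryName) {
            namedQueryByType.put(entityType, queryName);
            return this;
        }

        // Walk up the class hierarchy so a query registered for a superclass
        // also covers its subclasses; null means "use the default strategy".
        String namedQueryFor(Class<?> entityType) {
            for (Class<?> c = entityType; c != null; c = c.getSuperclass()) {
                String name = namedQueryByType.get(c);
                if (name != null) {
                    return name;
                }
            }
            return null;
        }
    }

    // Plain classes standing in for mapped entities.
    static class MyClass {}
    static class MySubClass extends MyClass {}
    static class Other {}

    public static void main(String[] args) {
        IdLoadingQueryConfig config = new IdLoadingQueryConfig()
                .useNamedQuery(MyClass.class, "MyClass.withForeign");
        System.out.println(config.namedQueryFor(MyClass.class));    // MyClass.withForeign
        System.out.println(config.namedQueryFor(MySubClass.class)); // MyClass.withForeign
        System.out.println(config.namedQueryFor(Other.class));      // null
    }
}
```

Falling back along the superclass chain is one possible design choice for entity hierarchies; a stricter variant could require an exact type match and reject ambiguous registrations instead.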

Assignee

Unassigned

Reporter

Marc Schipperheyn

Suitable for new contributors

None

Pull Request

None

Feedback Requested

None

Components

Fix versions

Affects versions

Priority

Minor