Customizable entity loading queries in mass indexer

Description

We have a very large dataset with complex objects. We would like the ability to control how the query is created for mass reindexing, something like select b.lastname from Entity b. This would allow us to load only the fields relevant to reindexing and dramatically speed up the process. It would be great to be able to plug our own strategy into the HibernateOrmMassEntityLoader.

Activity

masrawi, February 1, 2022 at 5:06 PM
Edited

YAML configuration like

 

and programmatic

 

 

ID loading might be interesting too, but it is not relevant in our case.

Yoann Rodière, February 1, 2022 at 3:58 PM

We could possibly provide an API similar to org.hibernate.search.mapper.javabean.loading.MassLoadingStrategy (not currently published, only present in the git repo), but adapted to Hibernate ORM. The main questions are:

  • how much customization to allow (customizing the IDs to load might be interesting too, I suppose?)

  • how much customization to mandate (do we want to allow customizing ID loading without customizing entity loading, and vice-versa?)

  • where to provide the strategy (annotations? programmatic mapping? mass indexer options? something else?)
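To make the first two questions concrete, here is a minimal sketch of what such a strategy could look like if ID loading and entity loading were separate, independently replaceable pieces. Everything in it is hypothetical: MassLoadingSketch, IdentifierLoader, EntityLoader, Strategy and reindex are illustration-only names, not part of Hibernate Search or Hibernate ORM, and an in-memory map stands in for the database so that the example runs with plain Java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: none of these types exist in Hibernate Search or Hibernate ORM.
class MassLoadingSketch {

    /** Produces the identifiers to reindex, pushing them to the sink in batches. */
    interface IdentifierLoader<I> {
        void loadIdentifiers(int batchSize, Consumer<List<I>> sink);
    }

    /** Loads the entities (or a trimmed-down view of them) for one batch of identifiers. */
    interface EntityLoader<I, E> {
        List<E> loadEntities(List<I> identifiers);
    }

    /** A strategy is a pair of loaders; either one can be replaced without touching the other. */
    static final class Strategy<I, E> {
        final IdentifierLoader<I> identifierLoader;
        final EntityLoader<I, E> entityLoader;

        Strategy(IdentifierLoader<I> identifierLoader, EntityLoader<I, E> entityLoader) {
            this.identifierLoader = identifierLoader;
            this.entityLoader = entityLoader;
        }

        Strategy<I, E> withIdentifierLoader(IdentifierLoader<I> custom) {
            return new Strategy<>(custom, entityLoader);
        }

        Strategy<I, E> withEntityLoader(EntityLoader<I, E> custom) {
            return new Strategy<>(identifierLoader, custom);
        }
    }

    /** Stand-in for the mass indexer: pulls ID batches, loads each one, collects what would be indexed. */
    static <I, E> List<E> reindex(Strategy<I, E> strategy, int batchSize) {
        List<E> indexed = new ArrayList<>();
        strategy.identifierLoader.loadIdentifiers(batchSize,
                batch -> indexed.addAll(strategy.entityLoader.loadEntities(batch)));
        return indexed;
    }

    /** Default loaders over an in-memory "table" (id -> lastname) that stands in for the database. */
    static Strategy<Integer, String> defaultStrategy(Map<Integer, String> table) {
        IdentifierLoader<Integer> ids = (batchSize, sink) -> {
            List<Integer> batch = new ArrayList<>();
            for (Integer id : table.keySet()) {
                batch.add(id);
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<>();
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // flush the last, partially filled batch
            }
        };
        EntityLoader<Integer, String> entities = batch -> {
            List<String> loaded = new ArrayList<>();
            for (Integer id : batch) {
                loaded.add(table.get(id));
            }
            return loaded;
        };
        return new Strategy<>(ids, entities);
    }
}
```

In this shape, the reporter's use case would only need withEntityLoader(...) with a loader that runs a narrower query, while the default ID loading stays in place. Whether a real API should allow that kind of partial customization is exactly the open question in the second bullet.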

Yoann Rodière, February 1, 2022 at 3:47 PM

It seems this does not duplicate after all, since an entity graph would not get rid of formulas (though maybe it should; probably something to report to the ORM project?).

Reopening.

See also the discussion here:

masrawi:

Please also consider letting us customize the query used for reindexing. In earlier releases we had to override IdentifierConsumerDocumentProducer in order to reduce the time from several hours to several minutes. It would be nice to have the mass indexer expose a parameter for just this.

yrodiere:

That’s interesting. I may be ignorant of some of the inefficiencies we’re trying to avoid here, but I thought a fetch graph would be enough.

Do you have an example of a use case where a fetch graph (inferred automatically by Hibernate Search) would not be enough to improve loading time and avoid unnecessary loading, while a custom query would?

masrawi:

We make very heavy use of the @Formula annotation, for example to add subqueries to an entity, and we don't need those for reindexing.

Yoann Rodière, February 1, 2022 at 11:45 AM

Thanks for reporting this, but I’m going to close this ticket as duplicate of , since it’s very similar.

Let’s continue the discussion there.

Details

Created February 1, 2022 at 11:21 AM
Updated October 7, 2024 at 12:42 PM