Fetch only fields needed for indexing during index build

Description

In cases when you need to fetch collections eagerly within application code but none of these collections are indexed, it would save a lot of indexing time if only columns and collections needed for indexing were fetched.

In my case, indexing 530 000 entities consisting of 40 columns and 7 collections, it takes around 1 hour to index when collections are eagerly fetched, but only 3 minutes if collections are not fetched.

If it were possible to fetch only the needed columns and collections, index build performance would improve a lot in these specific cases.

Activity

masrawi, April 20, 2023 at 7:48 AM

Is there any update on this issue?

Yoann Rodière, February 1, 2022 at 3:50 PM (edited)

"That being said, with bytecode enhancement enabled, I believe it’s possible to mark even basic fields (e.g. a string) as lazy, in which case the fetch graph may allow us to get rid of basic fields that we don’t need in the loading query."

I just checked: that does not work. Fetch graphs do not affect the loading of basic attributes or formulas, except if they are lazy and part of a non-default LazyGroup (in which case they wouldn’t be loaded in the current implementation either, so it doesn’t matter).

So, if we were to automatically apply entity graphs, it would only affect associations.

Yoann Rodière, February 1, 2022 at 3:48 PM

"We rely very heavily on the formula annotation to add subqueries to the entity, for example, and we don’t need them for reindexing."

Right. That definitely won’t be addressed by fetch graphs (though that’s unfortunate). I’ll reopen your ticket HSEARCH-4471; we’ll need a different solution for you.

masrawi, February 1, 2022 at 1:56 PM (edited)

We rely very heavily on the formula annotation to add subqueries to the entity, for example, and we don’t need them for reindexing.

Yoann Rodière, February 1, 2022 at 1:44 PM

That’s interesting. I may be ignorant of some of the inefficiencies we’re trying to avoid here, but I thought a fetch graph would be enough.

Do you have an example of a use case where a fetch graph (inferred automatically by Hibernate Search) would not be enough to improve loading time and avoid unnecessary loading, while a custom query would?

masrawi, February 1, 2022 at 1:29 PM

Please also consider giving us the ability to customize the query used for reindexing. In earlier releases we had to override IdentifierConsumerDocumentProducer in order to reduce the time from several hours to several minutes. It would be nice if the mass indexer exposed a parameter for just this.

Yoann Rodière, February 1, 2022 at 11:57 AM

I think in Hibernate Search 6.x it should be possible to infer a fetch graph from Hibernate Search’s knowledge of the fields accessed during indexing (the dependency tree). Then we could use that fetch graph when mass indexing.

I think by default this would only provide improvements for collections (only those we need would be loaded). That’s probably fine, since I believe collections (more rows) generally carry a higher cost than singular attributes (more columns).

That being said, with bytecode enhancement enabled, I believe it’s possible to mark even basic fields (e.g. a string) as lazy, in which case the fetch graph may allow us to get rid of basic fields that we don’t need in the loading query.
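
The idea above of inferring a fetch graph from the dependency tree can be sketched in plain Java. This is only an illustration under stated assumptions, not how Hibernate Search implements it: the class `FetchGraphSketch`, the method `fromIndexedPaths` and the example paths are all hypothetical, and the sketch assumes that the last segment of every indexed path is a basic attribute, so only the segments before it are associations that need fetching.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names belong to Hibernate Search or Hibernate ORM.
public class FetchGraphSketch {

    /** One association to fetch, with the nested associations to fetch below it. */
    public static final class Node {
        private final Map<String, Node> children = new TreeMap<>();

        public Map<String, Node> children() {
            return children;
        }

        @Override
        public String toString() {
            return children.toString();
        }
    }

    /**
     * Builds a tree of associations from the dotted property paths read during indexing.
     * Assumption: the last segment of each path is a basic attribute, so it is not
     * added to the tree; every earlier segment is an association to traverse.
     */
    public static Node fromIndexedPaths(List<String> indexedPaths) {
        Node root = new Node();
        for (String path : indexedPaths) {
            String[] segments = path.split("\\.");
            Node current = root;
            for (int i = 0; i < segments.length - 1; i++) {
                current = current.children.computeIfAbsent(segments[i], key -> new Node());
            }
        }
        return root;
    }

    public static void main(String[] args) {
        // Paths a hypothetical Book entity would need for indexing.
        // A "comments" collection that is mapped but never indexed does not appear.
        Node graph = fromIndexedPaths(
                List.of("title", "authors.name", "authors.address.city"));
        System.out.println(graph); // prints {authors={address={}}}
    }
}
```

In this sketch, an association that no indexed path goes through never enters the tree, which matches the expectation that only the collections needed for indexing would be loaded. Basic attributes are deliberately left out, since the improvement is expected for associations only.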

Magnus Hovén, February 18, 2015 at 10:28 AM

What I envision is that it could work in a similar way to projections: it only fetches what is needed for indexing. Whether that's possible for you to implement, I do not know. Yes, I'm talking about the mass indexer, and I want it to fetch only the needed columns and collections when building the index, to avoid unnecessary database calls.

I want to load the associations eagerly for historical reasons: it is the way it has been done previously, and it is hard to predict and test what consequences changing the behaviour would have. So the current reason is to minimize risks.

Fetch plans for association indexing sound interesting. If they stopped the associations from being loaded, that would work for me.

If I have missed any of your questions (there were quite a few, and I think some of them touch on the same subject), just tell me what more you want to know.

I think it is good to know that this ticket arose from the following question on Stack Overflow: http://stackoverflow.com/questions/28500459/can-hibernate-search-index-build-performance-be-improved-using-projection

I did not want to say in this ticket that projections should be used since there may be some other way to solve it.

Sanne Grinovero, February 18, 2015 at 10:17 AM

I agree with this, it has been on my personal wishlist for a long time. Thanks for raising the subject again.

Consider as well, though, that we really think people should use "all lazy" relations in almost all cases. It's very easy to turn a relation "eager" dynamically when a specific code section benefits from it, and it's very rare for eager loading to be beneficial across all your usage.
So this raises the question: are you sure you're not just using eager relations as a workaround for other problems?
For example, I often see people using EAGER aggressively because they want to detach (or close the Session) early on, for questionable reasons.

As Hardy suggests, it would be very useful for us to get some more details of which possible solutions would work best in your use case.

I think that ideally we should be able to fully disable eager loading from the indexing plan. Still, the user often knows better what is more effective to load eagerly or not, so a possible evolution would be to have such a fetch plan defined for indexing only.

The big limitation in all of this is that Hibernate ORM doesn't currently allow "demoting" a fetch plan to lazy when it's mapped as eager, and that would be a very complex, intimidating patch. So we'd better get the requirements right before we start hacking ;)

Avoiding fetching specific columns is a different subject, with its very own complexities; for one, that's also not possible currently. The main issue, though, is that we don't really know which fields the user will need to create the indexed model: for example, when using custom bridges or dynamic Analyzers, the current API exposes the whole entity to the extension points provided by the user.
So: yes we could do that, but not automatically unless we change the API to be more restrictive.
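
To illustrate what "more restrictive" could mean here, the following plain-Java sketch assumes a hypothetical bridge contract in which every bridge declares the properties it reads, so the set of columns to load can be computed up front. The names `DeclaringBridge`, `FullNameBridge` and `columnsToLoad`, and the `Person`-like properties, are invented for this example and are not part of the Hibernate Search API.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these types exist in Hibernate Search.
public class RestrictiveBridgeSketch {

    /** A bridge that must declare which properties it reads, instead of receiving the whole entity. */
    public interface DeclaringBridge {
        Set<String> requiredProperties();

        String toIndexedValue(Map<String, Object> loadedProperties);
    }

    /** Example bridge that needs only two columns of a hypothetical Person entity. */
    public static final class FullNameBridge implements DeclaringBridge {
        @Override
        public Set<String> requiredProperties() {
            return Set.of("firstName", "lastName");
        }

        @Override
        public String toIndexedValue(Map<String, Object> loadedProperties) {
            return loadedProperties.get("firstName") + " " + loadedProperties.get("lastName");
        }
    }

    /** Union of all declared properties: the only columns a loading query would need to select. */
    public static Set<String> columnsToLoad(List<DeclaringBridge> bridges) {
        Set<String> columns = new TreeSet<>();
        for (DeclaringBridge bridge : bridges) {
            columns.addAll(bridge.requiredProperties());
        }
        return columns;
    }

    public static void main(String[] args) {
        List<DeclaringBridge> bridges = List.of(new FullNameBridge());
        System.out.println(columnsToLoad(bridges)); // prints [firstName, lastName]

        // The bridge sees only the values it declared, not the whole entity.
        Map<String, Object> row = Map.of("firstName", "Ada", "lastName", "Lovelace");
        System.out.println(bridges.get(0).toIndexedValue(row)); // prints Ada Lovelace
    }
}
```

The trade-off is the one described above: a bridge written against such a contract can no longer reach arbitrary parts of the entity, which is what makes the required columns knowable in advance.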

Hardy Ferentschik, February 18, 2015 at 9:19 AM

What is your domain model and how is it configured? If I understand correctly you are mapping some association for eager fetching, since this is "needed". On the other hand you want lazy fetching for indexing. How do you envision this should work? You want to change the association fetching strategy just for indexing? Are you talking about the mass indexer here, btw?

Out of interest, why do you need to load these associations eagerly in your application? Lazy loading often works quite well. I am just trying to understand your use case better.

Last but not least, we have been talking about utilizing fetch plans for association indexing. In this case you would define a fetch plan for indexing purposes and somehow configure this fetch plan with Search. Is this something which would work for you?

Details

Created February 18, 2015 at 8:38 AM
Updated September 25, 2023 at 3:53 PM
