In cases where you need to fetch collections eagerly within application code but none of these collections are indexed, it would save a lot of indexing time if only the columns and collections needed for indexing were fetched.
In my case, indexing 530 000 entities, each with 40 columns and 7 collections, takes around 1 hour when the collections are eagerly fetched, but only 3 minutes when they are not fetched.
If it were possible to fetch only the needed columns and collections, index build performance would improve a lot in these specific cases.
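To put the reported timings in perspective, here is a rough throughput calculation. It assumes exactly 60 minutes and exactly 3 minutes, which only approximate the timings given above; the class and method names are made up for illustration.

```java
// Back-of-the-envelope throughput from the figures reported in this ticket.
class IndexingThroughput {

    // Entities indexed per second for a run of the given duration.
    static double entitiesPerSecond(long entities, long seconds) {
        return (double) entities / seconds;
    }

    public static void main(String[] args) {
        long entities = 530_000;
        long eagerSeconds = 60 * 60; // assumption: "around 1 hour" taken as 3600 s
        long lazySeconds = 3 * 60;   // assumption: "3 minutes" taken as 180 s

        double eager = entitiesPerSecond(entities, eagerSeconds);
        double lazy = entitiesPerSecond(entities, lazySeconds);

        System.out.printf("eager collections:   %.0f entities/s%n", eager);
        System.out.printf("collections skipped: %.0f entities/s%n", lazy);
        System.out.printf("speed-up:            %.0fx%n", lazy / eager);
    }
}
```

Under those assumptions this comes to roughly 147 entities per second with eager collections against roughly 2944 without them, a difference of about a factor of 20.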
What is your domain model and how is it configured? If I understand correctly you are mapping some association for eager fetching, since this is "needed". On the other hand you want lazy fetching for indexing. How do you envision this should work? You want to change the association fetching strategy just for indexing? Are you talking about the mass indexer here, btw?
Out of interest, why do you need to load these associations eagerly in your application? Lazy loading often works quite well. I am just trying to understand your use case better.
Last but not least, we have been talking about utilizing fetch plans for association indexing. In this case you would define a fetch plan for indexing purposes and somehow configure this fetch plan with Search. Is this something which would work for you?
I agree with this, it has been on my personal wishlist for a long time. Thanks for raising the subject again.
Consider as well, though, that we really think people should use "all lazy" relations in almost all cases. It's very easy to turn a relation "eager" dynamically when a specific code section benefits from it, and it's very rare for eager fetching to be beneficial across all of your usage.
So this raises the question: are you sure you're not just having some eager relations as a workaround for other problems?
For example, I often see people using EAGER aggressively because they want to detach (or close the Session) early on, for questionable reasons.
As Hardy suggests, it would be very useful for us to get some more details on which of the possible solutions would work best in your use case.
I think that ideally we should be able to fully disable eager loading in the indexing plan. Still, the user will often know better what is more effective to load eagerly, so a possible evolution would be to have a fetch plan defined for indexing only.
The big limitation in all of this is that Hibernate ORM doesn't currently allow you to "demote" a fetch plan to lazy when it's mapped as eager, and changing that would be a very complex, intimidating patch. So we'd better get the requirements right before we start hacking.
Avoiding fetching specific columns is a different subject, with its very own complexities; for one, that's also not possible currently. The main issue, though, is that we don't really know which fields the user will need to create the indexed model. For example, when using custom bridges or dynamic Analyzers, the current API exposes the whole entity to the extension points provided by the user.
So: yes we could do that, but not automatically unless we change the API to be more restrictive.
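As a rough illustration of what "more restrictive" could mean, here is a plain-Java model in which the indexer is handed an explicit list of properties and touches nothing else. It is only a sketch: it does not use any Hibernate ORM or Hibernate Search API, and `FetchPlanSketch`, `Entity`, `property` and `index` are all hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Plain-Java model only: no Hibernate ORM or Hibernate Search API is used here.
class FetchPlanSketch {

    // Hypothetical entity: each property is read through a supplier that
    // stands in for a column read or a collection query.
    static final class Entity {
        private final Map<String, Supplier<Object>> loaders = new LinkedHashMap<>();
        private final List<String> loaded = new ArrayList<>();

        void property(String name, Supplier<Object> loader) {
            loaders.put(name, loader);
        }

        Object get(String name) {
            loaded.add(name); // record every simulated database access
            return loaders.get(name).get();
        }

        List<String> loadedProperties() {
            return loaded;
        }
    }

    // Builds the "document" by touching only the properties named in the plan.
    static Map<String, Object> index(Entity entity, Set<String> fetchPlan) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (String name : fetchPlan) {
            document.put(name, entity.get(name));
        }
        return document;
    }

    public static void main(String[] args) {
        Entity book = new Entity();
        book.property("title", () -> "Some title");
        book.property("reviews", () -> List.of("review-1")); // mapped, but not indexed

        Map<String, Object> document = index(book, Set.of("title"));
        System.out.println("indexed fields:    " + document.keySet());
        System.out.println("properties loaded: " + book.loadedProperties());
    }
}
```

The catch is the one described above: this only works because the indexing code declares up front what it reads. A custom bridge that receives the whole entity can call any getter, so a plan like this could not be derived automatically for it.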
What I envision is something that works in a similar way to projections: it fetches only what is needed for indexing. Whether that's possible for you to implement, I do not know. Yes, I'm talking about the mass indexer, and I want it to fetch only the needed columns and collections when building the index, to avoid unnecessary database calls.
I want to load the associations eagerly for historical reasons: it is the way it has been done previously, and it is hard to predict and test what consequences changing the behaviour would have. So the current reason is to minimize risks.
Fetch plans for association indexing sound interesting. If that would make indexing stop loading the associations, it would work for me.
If I have missed any of your questions (there were quite a few, and I think some of them touch on the same subject), just tell me what more you want to know.
It may be good to know that this ticket arose from the following question on Stack Overflow: http://stackoverflow.com/questions/28500459/can-hibernate-search-index-build-performance-be-improved-using-projection
I did not want to say in this ticket that projections should be used since there may be some other way to solve it.