Ability to disable automatic indexing programmatically at the session level

Description

Automatic indexing is fine for many purposes but in some cases it should be possible to temporarily disable it, e.g. when importing data.

Consider our case:

We have a somewhat complex index which uses @IndexedEmbedded quite a lot, mostly for some text entities as well as fairly static entities that need to contribute to the index.

Example: articles have a list of texts as well as a status, which itself has a list of texts (descriptions).

During imports, we only load the articles and texts, while the status is only referenced by its id. Even if we loaded it, we would not need its texts for the import.

Thus, at the end of the import, the session would contain all articles and article texts that are needed. When the articles are indexed, the indexer will also need the status and status texts and load them.

So far, so good, but here's the problem:

In some cases, we need to run the import in one transaction. For performance reasons we flush and clear the session from time to time, since once imported, the articles and the texts are not needed anymore.

At the end of the transaction, however, the indexer tries to load the status and status texts that are referenced by the articles, which due to the flush and clear operations results in lazy initialization exceptions (the articles that are to be indexed are not attached to the session anymore).

In this case it would be better to disable automatic indexing during the import and manually rebuild the index afterwards.

I know that there was a similar request for Hibernate Search 3.x (HSEARCH-387) which was rejected due to lack of clean ways to accomplish this.

However, with Hibernate 4, there might be some way.

A few thoughts on how that could work:

  • Add some property to the session that could be set via EntityManager.setProperty(...)
    a. Use that property to disable the entity listener or indexer, or
    b. pass those properties to the EntityIndexListener and let it decide whether to add/update the entity or skip indexing (this might be more flexible but perform worse)
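
Variant (b) could look roughly like the sketch below. It is plain Java with no Hibernate dependency: `SessionStub`, `IndexingListener`, and the property key `search.indexing.enabled` are hypothetical names made up for illustration, not existing ORM or Hibernate Search API. It only shows the idea of a listener reading a per-session property and skipping indexing when that property is set to false.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a session that carries its own properties.
final class SessionStub {
    private final Map<String, Object> properties = new HashMap<>();

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    Object getProperty(String key) {
        return properties.get(key);
    }
}

// Hypothetical listener that decides per event whether to index.
final class IndexingListener {
    // Assumed property key, not an existing configuration option.
    static final String INDEXING_ENABLED = "search.indexing.enabled";

    private final List<Object> pendingWork = new ArrayList<>();

    void onEntityChanged(SessionStub session, Object entity) {
        // Indexing stays enabled unless the session explicitly opts out.
        if (Boolean.FALSE.equals(session.getProperty(INDEXING_ENABLED))) {
            return;
        }
        pendingWork.add(entity);
    }

    List<Object> pendingWork() {
        return pendingWork;
    }
}
```

An import would set the property to `Boolean.FALSE` on its own session before loading data and rebuild the index manually afterwards; other sessions would keep indexing as usual.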

Currently, there might be a few workarounds:

  • Preload all entities needed for the indexing operation. This might be tedious depending on the complexity and might require much more memory.

  • Use a transient property and examine it in the EntityIndexListener.

  • Break up imports into several transactions, but that might not always be an option.

Activity

Yoann Rodière March 13, 2023 at 2:17 PM

Note the problem described in the issue description:

"At the end of the transaction, however, the indexer tries to load the status and status texts that are referenced by the articles, which due to the flush and clear operations results in lazy initialization exceptions (the articles that are to be indexed are not attached to the session anymore)."

… is no longer relevant in Hibernate Search 6:

With Hibernate Search 6 (contrary to Hibernate Search 5 and earlier), this pattern will work as expected:

* with coordination disabled (the default), documents will be built on flushes, and sent to the index upon transaction commit.

* with outbox-polling coordination, entity change events will be persisted on flushes, and committed along with the rest of the changes upon transaction commit.

The use case remains, though: when someone uses a session for bulk imports, they will probably want to disable automatic indexing in that session and perform mass indexing later.

Emmanuel Bernard August 6, 2013 at 7:31 AM

An idea, not properly checked.

Then the underlying disabled session could be extracted from the FullText* objects and stored by FullTextIndexEventListener. The EventSource happens to be that session object and we can do an == comparison.
Now the difficulty is how to keep this list of sessions and have it properly garbage collected. Since there is no Session close hook event, I suppose we could reuse the same approach used for FullTextIndexEventListener.flushSync.
I am not quite sure why this one is thread safe, but I suppose it's because the same thread which sets the value also reads it. That is our case here too.
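
To make the idea concrete, here is a minimal sketch of such a list of disabled sessions. It is not how FullTextIndexEventListener.flushSync is implemented; `DisabledSessionRegistry` and its methods are hypothetical, and the sketch uses only the JDK: weak references so that a session can be garbage collected without a close hook, and an == comparison to match the session. The methods are synchronized as a conservative assumption, since the thread-safety question above is open.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical registry of sessions for which indexing is disabled.
final class DisabledSessionRegistry {
    // Weak references let a session be collected even if it is never re-enabled.
    private final List<WeakReference<Object>> disabled = new ArrayList<>();

    synchronized void disable(Object session) {
        if (!isDisabled(session)) {
            disabled.add(new WeakReference<>(session));
        }
    }

    synchronized void enable(Object session) {
        disabled.removeIf(ref -> ref.get() == session);
    }

    synchronized boolean isDisabled(Object session) {
        boolean found = false;
        Iterator<WeakReference<Object>> it = disabled.iterator();
        while (it.hasNext()) {
            Object candidate = it.next().get();
            if (candidate == null) {
                it.remove(); // referent was collected, drop the stale entry
            } else if (candidate == session) {
                found = true; // identity comparison, not equals()
            }
        }
        return found;
    }
}
```

The lookup is a linear scan, which should be acceptable as long as only a handful of sessions have indexing disabled at any time; stale entries are cleaned up lazily on each lookup.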

Sanne Grinovero August 5, 2013 at 4:32 PM

Thanks! Don't hesitate to ask if you need pointers. A good starting point is to join the development chat on IRC. See http://hibernate.org/community/irc

Thomas Göttlich August 5, 2013 at 4:26 PM

Sanne, I'll have a look into that. And you're right: this might need some changes in the ORM.
With them, however, the issues you raised might be solved:

First, Session would need its own properties, which would solve the issue for plain ORM users as well as the listener only getting a session reference.
Next, EntityManager itself is an interface, which is implemented by Hibernate ORM's EntityManagerImpl.
This would allow setProperty(String s, Object o) to be implemented so that it passes the property to the session, thus solving the first issue.

I'll think a bit more about how this could be solved without requiring changes to ORM, and I'll also look into the patching process.

Sanne Grinovero August 5, 2013 at 3:54 PM

Hi Thomas, it's much appreciated that you're thinking about a solution.
Time-wise, unless you help with a patch, I don't think we can include this in 4.4, but at least I'll track the need for 5.0.

The properties approach is interesting, but EntityManager is not the Session, and I'm afraid that the way it's structured (the EM being a decorator of Session) brings some complexities:

  • Properties are stored in the EM, not reachable by Session

  • The eventListener can only grab a reference to Session

  • Even if we could fix the two issues above, we need to find a solution for Hibernate ORM (Session) users as well (not depending on JPA)

If you want to try making a patch, I'll be glad to discuss it with you, but I fear it might need some changes in ORM too.

Fixed

Created August 5, 2013 at 1:21 PM
Updated June 2, 2023 at 1:33 PM
Resolved May 9, 2023 at 2:16 PM