Some of the options mentioned in https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#lucene-indexing-performance
See also occurrences of this JIRA ticket's key in the source code.
This is a complex issue and these configuration options are not completely independent, so I'm keeping all this in a single ticket.
IMPORTANT: let's check with Sanne before working on this.
not-shared is what Search 6 currently implements, but it probably doesn't make sense anymore. Let's replace it with shared as the default implementation and introduce an option to enable asynchronous readers (the default will remain synchronous).
In Search 5, it was possible to configure each index's "worker" as async.
This implied two things:
1. The "synchronicity" of automatic indexing: the user wouldn't wait for indexing works to be applied to the index writers upon transaction commit, since everything would happen in a background thread.
2. The commit policy: the index writer would not commit after each workset, but at a regular time interval set by index_flush_interval (by default 1000ms).
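For reference, a minimal sketch of how this looked from the user's side in Search 5, assuming the "default" index scope and the property keys documented in the Search 5 reference linked above. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.Properties;

// Hypothetical helper: builds the Search 5 settings that make the default index worker async.
final class Search5AsyncWorkerConfig {

    static Properties asyncWorkerProperties(long flushIntervalMillis) {
        Properties props = new Properties();
        // Indexing works are applied in a background thread instead of the committing thread.
        props.setProperty("hibernate.search.default.worker.execution", "async");
        // Interval between periodic commits; only meaningful when the worker is async.
        props.setProperty("hibernate.search.default.index_flush_interval",
                Long.toString(flushIntervalMillis));
        return props;
    }
}
```

Calling `asyncWorkerProperties(1000)` reproduces the default interval mentioned above.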
In Search 6, #1 is configured differently, through the automatic indexing synchronization strategy, so this part is no longer relevant.
However, #2 (the commit policy) is still missing. In Search 6, works are committed immediately after being applied if the mapper requested it (DocumentCommitStrategy.FORCE), or at the end of a batch (i.e. every ~500 worksets or as soon as the work queue is empty, whichever happens first).
We could imagine introducing configuration options regarding how often commits should be executed:
every X worksets ("maximum batch size")
every X milliseconds ("refresh interval")
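To make the two triggers concrete, here is a rough sketch of a policy that decides when a commit is due. Everything in it (the class name `CommitPolicy`, its parameters, the convention that 0 disables a trigger) is an assumption for discussion, not existing Search 6 code.

```java
// Hypothetical sketch: decides when the index writer should commit.
final class CommitPolicy {
    private final int maxBatchSize;          // commit every X worksets; 0 disables this trigger
    private final long commitIntervalMillis; // commit every X milliseconds; 0 disables this trigger
    private int worksetsSinceCommit = 0;
    private long lastCommitMillis;

    CommitPolicy(int maxBatchSize, long commitIntervalMillis, long nowMillis) {
        if (maxBatchSize < 0 || commitIntervalMillis < 0) {
            throw new IllegalArgumentException("Triggers must be >= 0");
        }
        if (maxBatchSize == 0 && commitIntervalMillis == 0) {
            throw new IllegalArgumentException("At least one trigger must be enabled");
        }
        this.maxBatchSize = maxBatchSize;
        this.commitIntervalMillis = commitIntervalMillis;
        this.lastCommitMillis = nowMillis;
    }

    // Records one applied workset and reports whether a commit is now due.
    boolean onWorksetApplied(long nowMillis) {
        worksetsSinceCommit++;
        return isCommitDue(nowMillis);
    }

    // A commit is due when there are uncommitted worksets and either trigger fired.
    boolean isCommitDue(long nowMillis) {
        if (worksetsSinceCommit == 0) {
            return false; // nothing to commit
        }
        boolean batchFull = maxBatchSize > 0 && worksetsSinceCommit >= maxBatchSize;
        boolean intervalElapsed = commitIntervalMillis > 0
                && nowMillis - lastCommitMillis >= commitIntervalMillis;
        return batchFull || intervalElapsed;
    }

    // Must be called after the writer actually committed.
    void onCommit(long nowMillis) {
        worksetsSinceCommit = 0;
        lastCommitMillis = nowMillis;
    }
}
```

The clock is passed in explicitly so the policy can be tested without sleeping. A real implementation would also need a timer to commit when the interval elapses and no new workset arrives.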
I'm not sure how we should present this to the user, however.
Near-real-time is something we need for Infinispan. However, I think using near-real-time for writes affects:
how readers behave (since they have access to the data from the not-yet-committed writer)
the commit policy (apparently it's forced to periodic commits?)
We should be careful to structure the configuration in a way that does not even offer incompatible options.
One question is: shouldn't near-real-time be enabled as soon as the user asks for commits to be performed periodically instead of immediately?
I don't think there's any clear use case for this. Let's drop it and introduce proper SPIs later if we discover a clear use case.
This essentially means we commit, close the index writer and release the locks regularly (~after each workset).
Does this even make sense anymore? Are there legitimate use cases for non-exclusive index use?
Make NRT writer and async reader the default
Expose only three settings:
readwrite.strategy : nrt/debug
Debug is the old “default” writer strategy combined with the “not shared” reader strategy. It is not thoroughly tested and is meant for debugging only: if it doesn’t work, you’re on your own.
readwrite.refresh_interval : 0, 1, 10, ... (in millis)
Default to 0 or 1?
readwrite.max_flush_interval : 0, 1, 10, ... (in millis)
We force a write to disk after the interval
IMPORTANT: a commit might not be enough in the case of the NRT writer. Might need an extra call to something else?
Default to 0
Names to be determined
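A possible shape for parsing these three settings, as a sketch only. The property keys are the tentative names listed above; the class `ReadWriteSettings` is hypothetical, and the default of 0 for the refresh interval is an assumption, since the choice between 0 and 1 is still open.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: reads the three proposed settings and applies defaults.
final class ReadWriteSettings {
    enum Strategy { NRT, DEBUG }

    final Strategy strategy;
    final long refreshIntervalMillis;
    final long maxFlushIntervalMillis;

    private ReadWriteSettings(Strategy strategy, long refreshIntervalMillis,
            long maxFlushIntervalMillis) {
        this.strategy = strategy;
        this.refreshIntervalMillis = refreshIntervalMillis;
        this.maxFlushIntervalMillis = maxFlushIntervalMillis;
    }

    static ReadWriteSettings from(Map<String, String> props) {
        // NRT is the proposed default; an unknown value fails with IllegalArgumentException.
        String rawStrategy = props.getOrDefault("readwrite.strategy", "nrt").trim();
        Strategy strategy = Strategy.valueOf(rawStrategy.toUpperCase(Locale.ROOT));
        // Assumption: both intervals default to 0.
        long refresh = parseInterval(props, "readwrite.refresh_interval");
        long flush = parseInterval(props, "readwrite.max_flush_interval");
        return new ReadWriteSettings(strategy, refresh, flush);
    }

    private static long parseInterval(Map<String, String> props, String key) {
        long value = Long.parseLong(props.getOrDefault(key, "0").trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }
}
```

Exposing a single strategy enum instead of separate reader and writer options means incompatible combinations simply cannot be expressed.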
IMPORTANT: some utilities related to shared readers ended up in Lucene; let’s try to use these instead of Search 5 code
This was not discussed. Let's drop support for this.
This was not discussed. Let's create a separate ticket: