Restore advanced Reader/Writer handling in the Lucene backend

Description

See:

See also occurrences of this JIRA ticket's key in the source code.

This is a complex issue and these configuration options are not completely independent, so I'm keeping all this in a single ticket.

IMPORTANT: let's check with Sanne before working on this.

Reader strategy

The not-shared reader strategy is what Search 6 currently implements, but it probably doesn't make sense anymore. Let's make shared the default implementation and introduce an option to enable asynchronous readers (the default will remain synchronous).
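
To make the difference between synchronous and asynchronous shared readers concrete, here is a minimal Java sketch. SharedReaderHolder, acquire() and refresh() are hypothetical names, not Hibernate Search or Lucene APIs, and the type parameter R merely stands in for an index reader. A real implementation would also need to check whether the index actually changed and to close or reference-count stale readers.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a reader shared between queries.
 * R stands in for an index reader; nothing here is an existing API.
 */
final class SharedReaderHolder<R> {
    private final Supplier<R> opener;          // opens a reader on the latest index state
    private final boolean async;               // true: refresh only happens out-of-band
    private final AtomicReference<R> current;  // the reader currently shared by all queries

    SharedReaderHolder(Supplier<R> opener, boolean async) {
        this.opener = opener;
        this.async = async;
        this.current = new AtomicReference<>(opener.get());
    }

    /** Called by queries. In synchronous mode the reader is refreshed first. */
    R acquire() {
        if (!async) {
            refresh();
        }
        return current.get();
    }

    /** In asynchronous mode, meant to be called periodically from a background thread. */
    void refresh() {
        current.set(opener.get());
    }
}
```

In synchronous mode every query pays the refresh cost but never sees stale data; in asynchronous mode queries are cheap but may lag behind the index until the next background refresh.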

Async worker and commit policy

In Search 5, it was possible to configure each index's "worker" as async.
This implied two things:

  1. The "synchronicity" of automatic indexing: the user wouldn't wait for indexing works to be applied to the index writers upon transaction commit, since everything would happen in a background thread.

  2. The commit policy: the index writer would not commit after each workset, but at a regular time interval set by index_flush_interval (by default 1000ms).

In Search 6, #1 is configured differently, through the automatic indexing synchronization strategy, so this part is no longer relevant here.
However, #2 (the commit policy) is still missing. In Search 6, works are committed immediately after being applied if the mapper requested it (DocumentCommitStrategy.FORCE), or otherwise at the end of a batch (i.e. every ~500 worksets or as soon as the work queue is empty, whichever happens first).

We could imagine introducing configuration options regarding how often commits should be executed:

  • every X worksets ("maximum batch size")

  • every X milliseconds ("refresh interval")

I'm not sure how we should present this to the user, however.
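
One way to reason about these two options is a small policy object that is consulted after each workset. The Java sketch below is only an illustration under assumptions: CommitPolicy, onWorksetApplied and the parameter names are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```java
/**
 * Hypothetical sketch: decides when the index writer should commit.
 * None of these names exist in Hibernate Search.
 */
final class CommitPolicy {
    private final int maxBatchSize;       // commit every X worksets; 0 disables this threshold
    private final long commitIntervalMs;  // commit every X milliseconds; 0 commits after every workset
    private int uncommittedWorksets = 0;
    private long lastCommitMs;

    CommitPolicy(int maxBatchSize, long commitIntervalMs, long nowMs) {
        if (maxBatchSize < 0 || commitIntervalMs < 0) {
            throw new IllegalArgumentException("thresholds must be >= 0");
        }
        this.maxBatchSize = maxBatchSize;
        this.commitIntervalMs = commitIntervalMs;
        this.lastCommitMs = nowMs;
    }

    /**
     * Records one applied workset and returns true if a commit is due.
     * forceCommit models a mapper request such as DocumentCommitStrategy.FORCE.
     */
    boolean onWorksetApplied(long nowMs, boolean forceCommit) {
        uncommittedWorksets++;
        boolean batchFull = maxBatchSize > 0 && uncommittedWorksets >= maxBatchSize;
        boolean intervalElapsed = nowMs - lastCommitMs >= commitIntervalMs;
        if (forceCommit || batchFull || intervalElapsed) {
            uncommittedWorksets = 0;
            lastCommitMs = nowMs;
            return true;
        }
        return false;
    }
}
```

Note that this sketch only commits when a workset arrives; a time-based policy would additionally need a timer so that the last worksets before an idle period still get committed.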

Writers and near-real-time

Near-real-time is something we need for Infinispan. However, I think using near-real-time for writes affects:

  • how readers behave (since they have access to the data from the not-yet-committed writer)

  • the commit policy (apparently it's forced to periodic commits?)

We should be careful to structure the configuration so that incompatible options cannot even be expressed.
One question is: shouldn't near-real-time be enabled as soon as the user asks for commits to be performed periodically instead of immediately?
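
As an illustration of one possible answer to that question (not a decision), the Java sketch below derives near-real-time from the commit policy instead of exposing it as a separate switch, so the incompatible combinations simply cannot be configured. WriterConfig and its methods are hypothetical names introduced for this example only.

```java
/**
 * Hypothetical sketch: near-real-time is implied by periodic commits,
 * so "NRT with immediate commits" cannot be expressed at all.
 */
final class WriterConfig {
    private final long commitIntervalMs; // 0 means "commit immediately after each workset"

    private WriterConfig(long commitIntervalMs) {
        this.commitIntervalMs = commitIntervalMs;
    }

    /** Commits happen right after each workset; readers only need committed data. */
    static WriterConfig immediateCommits() {
        return new WriterConfig(0L);
    }

    /** Commits happen at a fixed interval; readers must see not-yet-committed data. */
    static WriterConfig periodicCommits(long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be > 0, got " + intervalMs);
        }
        return new WriterConfig(intervalMs);
    }

    /** Derived, never set directly: periodic commits imply near-real-time readers. */
    boolean nearRealTime() {
        return commitIntervalMs > 0;
    }

    long commitIntervalMs() {
        return commitIntervalMs;
    }
}
```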

Custom index manager

I don't think there's any clear use case for this. Let's drop it and introduce proper SPIs later if we discover a clear use case.

Exclusive index use

hibernate.search.[default|<indexname>].exclusive_index_use

Setting this to false essentially means we commit, close the index writer and release the locks regularly (~after each workset).

Does this even make sense anymore? Are there legitimate use cases for non-exclusive index use?

Decisions (from last discussion)

Readers/writers

  • Make NRT writer and async reader the default

  • Expose only three settings:

    • readwrite.strategy : nrt/debug

      • Debug is the old “default” writer strategy combined with the “not shared” reader strategy. It is not tested very thoroughly and is meant for debugging only: if it doesn’t work, you’re on your own.

    • readwrite.refresh_interval : 0, 1, 10, … (in millis)

      • Default to 0 or 1?

    • readwrite.max_flush_interval : 0, 1, 10, … (in millis)

      • We force a write to disk after the interval

      • IMPORTANT: a commit might not be enough in the case of the NRT writer. Might need an extra call to something else?

      • Default to 0

    • Names to be determined

  • IMPORTANT: some utilities related to shared readers ended up in Lucene; let’s try to use these instead of Search 5 code
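
The three settings listed above could be read and validated roughly as follows. This is a sketch under assumptions: the property names are the provisional ones from this ticket ("names to be determined"), ReadWriteSettings is a hypothetical class, and the default for refresh_interval is still undecided (0 or 1), so 0 is assumed here.

```java
import java.util.Locale;
import java.util.Properties;

/**
 * Hypothetical sketch of parsing the three provisional settings.
 * Property names are taken from this ticket and may change.
 */
final class ReadWriteSettings {
    enum Strategy { NRT, DEBUG }

    final Strategy strategy;
    final long refreshIntervalMs;
    final long maxFlushIntervalMs;

    private ReadWriteSettings(Strategy strategy, long refreshIntervalMs, long maxFlushIntervalMs) {
        this.strategy = strategy;
        this.refreshIntervalMs = refreshIntervalMs;
        this.maxFlushIntervalMs = maxFlushIntervalMs;
    }

    static ReadWriteSettings from(Properties props) {
        // NRT is the default strategy according to the decisions above.
        String rawStrategy = props.getProperty("readwrite.strategy", "nrt");
        Strategy strategy = Strategy.valueOf(rawStrategy.trim().toUpperCase(Locale.ROOT));
        // Assumption: refresh_interval defaults to 0 (the ticket leaves "0 or 1" open).
        long refresh = millis(props, "readwrite.refresh_interval", 0L);
        long maxFlush = millis(props, "readwrite.max_flush_interval", 0L);
        return new ReadWriteSettings(strategy, refresh, maxFlush);
    }

    /** Parses a non-negative interval in milliseconds, falling back to the default if unset. */
    private static long millis(Properties props, String key, long defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return defaultValue;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }
}
```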

Automatic indexing synchronization strategy:

Moved to

Custom index manager

This was not discussed. Let's drop support for this.

Exclusive index use

This was not discussed. Let's create a separate ticket:

Environment

None

Assignee

Yoann Rodière

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Priority

Major