Read-only indexes

Description

In some use cases, indexes are populated by one instance of Hibernate Search and accessed in read-only mode by another. We’ve had reports of such setups being difficult to implement, because Hibernate Search assumes that indexes are never written to by anyone else: https://stackoverflow.com/questions/78154275/how-do-i-reload-the-index-before-searching-in-hibernate-lucene/78154572#78154572 . While that assumption is reasonable, it is stronger than necessary: all we need to assume is that at most one Hibernate Search instance writes to the backend. In theory, multiple Hibernate Search instances could read from the same index, as long as only one writes to it. Related: https://hibernate.atlassian.net/browse/HSEARCH-4338

Also, some people build indexes at build time and use Hibernate Search at runtime exclusively to read from them: https://stackoverflow.com/questions/79096813/hibernate-search-6-x-how-can-i-make-the-lucene-index-directory-to-readonly . This could be especially useful for applications that use the Standalone POJO Mapper and the Lucene backend and rely on an infrequently updated data source: they could just bundle the indexes with the app, and rebuild the app/indexes regularly, or on every update of the data source…

To address these use cases, we might want to introduce some read/write modes in backends, so that they come pre-configured with the right defaults:

  • read-write: the current defaults.

  • read-only:

    • Hibernate Search’s ORM integration does not auto-register listeners for the corresponding entities.

    • Attempts to write to indexes explicitly (add, update, delete) lead to an exception.

    • The schema management strategy defaults to “validate”, and setting anything other than “validate” and “none” leads to an exception on startup.

    • Write-related resources (threads, …) are not started at all for the relevant indexes, in particular those used for outbox polling.

    • Lucene indexes use the “none” locking strategy, and selecting any other strategy will fail.

    • Lucene’s IO strategy performs a refresh before a search, even if no write happened recently (we assume someone else might have written to the index).

    • … more?
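To make the proposed read-only rules a bit more concrete, here is a minimal Java sketch of how the defaults and validations listed above might fit together. Every name in it (IndexAccessMode, AccessModeRules and its methods) is hypothetical and only for illustration; none of this is an existing Hibernate Search API. The strategy names ("validate", "none") are the ones mentioned in the list.

```java
import java.util.Set;

// Hypothetical names throughout: this is a sketch, not an existing Hibernate Search API.
enum IndexAccessMode { READ_WRITE, READ_ONLY }

final class AccessModeRules {
    // Schema management strategies that would be accepted in read-only mode.
    private static final Set<String> READ_ONLY_SCHEMA_STRATEGIES = Set.of("validate", "none");

    private AccessModeRules() {
    }

    // Resolves the schema management strategy; a null argument means "not configured".
    static String schemaStrategy(IndexAccessMode mode, String configured) {
        if (mode == IndexAccessMode.READ_WRITE) {
            return configured; // read-write: current behavior, nothing changes
        }
        if (configured == null) {
            return "validate"; // read-only default
        }
        if (!READ_ONLY_SCHEMA_STRATEGIES.contains(configured)) {
            throw new IllegalArgumentException(
                    "Schema management strategy '" + configured + "' is not allowed for read-only indexes");
        }
        return configured;
    }

    // Resolves the Lucene locking strategy; a null argument means "not configured".
    static String lockingStrategy(IndexAccessMode mode, String configured) {
        if (mode == IndexAccessMode.READ_WRITE) {
            return configured; // read-write: current behavior, nothing changes
        }
        if (configured != null && !"none".equals(configured)) {
            throw new IllegalArgumentException(
                    "Locking strategy '" + configured + "' is not allowed for read-only indexes");
        }
        return "none";
    }

    // Guard that explicit write operations (add, update, delete) would call first.
    static void checkWriteAllowed(IndexAccessMode mode, String operation) {
        if (mode == IndexAccessMode.READ_ONLY) {
            throw new IllegalStateException("Cannot " + operation + ": the index is read-only");
        }
    }

    // Read-only indexes refresh before every search, since another instance may have written.
    static boolean alwaysRefreshBeforeSearch(IndexAccessMode mode) {
        return mode == IndexAccessMode.READ_ONLY;
    }

    // Listeners, indexing threads and outbox polling would only start for writable indexes.
    static boolean startsWriteResources(IndexAccessMode mode) {
        return mode == IndexAccessMode.READ_WRITE;
    }
}
```

With these rules, an unconfigured read-only index would resolve to the “validate” schema management strategy and the “none” locking strategy, while an explicit “create” strategy would fail fast at startup.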

I suspect this could be useful as an index-level setting (with the possibility to set the defaults for all indexes at the backend level), so that some indexes are read-only and others are read-write. But that may be complex to implement, so a global setting could work, as a start.
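As a rough illustration of the index-level setting with backend-level defaults, the lookup could follow the same fallback that Hibernate Search already applies to other index properties (hibernate.search.backend.indexes.<index-name>.… overriding hibernate.search.backend.…). The property suffix access_mode and the class name below are assumptions made up for this sketch; no such setting exists today.

```java
import java.util.Map;

// Hypothetical sketch: the "access_mode" property does not exist in Hibernate Search.
final class AccessModeConfig {
    static final String BACKEND_PREFIX = "hibernate.search.backend.";
    static final String KEY = "access_mode"; // assumed property suffix
    static final String DEFAULT_MODE = "read-write"; // the current behavior

    private AccessModeConfig() {
    }

    // Index-level value wins, then the backend-level default, then read-write.
    static String resolve(Map<String, String> properties, String indexName) {
        String indexValue = properties.get(BACKEND_PREFIX + "indexes." + indexName + "." + KEY);
        if (indexValue != null) {
            return indexValue;
        }
        String backendValue = properties.get(BACKEND_PREFIX + KEY);
        if (backendValue != null) {
            return backendValue;
        }
        return DEFAULT_MODE;
    }
}
```

A global-only first version would simply skip the index-level lookup and read the backend-level key.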

Notes:

  1. One could imagine an even stronger read/write mode, “immutable”, where everything is read-only and we don’t expect indexes to change, even through another app instance, so we never perform refreshes automatically before search. But that can arguably be achieved with the read-only mode plus some performance tuning (IO strategy), so it’s probably not worth a dedicated mode.

  2. Ideally the mode should be configurable at runtime in Quarkus. This means we need to delay reading the configuration as much as possible. From a quick look through the impact of the read-only mode, this seems reasonable.

Created October 21, 2024 at 12:32 PM
Updated October 21, 2024 at 12:42 PM
