Primary/replica setup for the Lucene backend with outbox-polling coordination

Description

In Hibernate Search 5, we used to provide an index replication feature that allowed the Lucene backend to work even in a multi-node application. See for example

In Hibernate Search 6, this replication feature is gone, and that was no big deal in 6.0, since there was no way to reliably send indexing works to the primary node in such a setup, because the JMS/JGroups backends are gone as well.

But in 6.1, we’ve introduced the “outbox-polling” coordination strategy, whose purpose is to redirect indexing works to another node. A specific configuration could allow this strategy to redirect all indexing works to a single “primary” node, and ignore them on “replica” nodes. Then, it would make sense to reintroduce the replication feature, because it would allow the Lucene backend to work correctly in a multi-node application.
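As a rough sketch of what such a "specific configuration" might look like, the class below builds one set of properties for the primary node and one for replica nodes. The property names are the outbox-polling settings as I understand them from the 6.1 reference documentation and should be checked against the version in use. The class `CoordinationSettings` and its methods are hypothetical helpers for illustration, not part of Hibernate Search.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds per-node settings for a static primary/replica setup.
public class CoordinationSettings {

    private static final String PREFIX = "hibernate.search.coordination.";

    // Settings shared by all nodes: indexing events go through the outbox table.
    private static Map<String, String> common() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(PREFIX + "strategy", "outbox-polling");
        return settings;
    }

    // Primary node: processes events, with static sharding and a single shard
    // assigned to this node (assumption: one shard is enough for one primary).
    public static Map<String, String> primary() {
        Map<String, String> settings = common();
        settings.put(PREFIX + "event_processor.enabled", "true");
        settings.put(PREFIX + "event_processor.shards.total_count", "1");
        settings.put(PREFIX + "event_processor.shards.assigned", "0");
        return settings;
    }

    // Replica node: still writes events to the outbox, but never processes them.
    public static Map<String, String> replica() {
        Map<String, String> settings = common();
        settings.put(PREFIX + "event_processor.enabled", "false");
        return settings;
    }
}
```

The resulting maps could be passed as additional persistence-unit properties on each node. Nothing here replicates the index itself, which is the missing piece this ticket is about.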

Notes

  • We’re only going to support setups where there’s a single primary node that manages all indexes, and all other nodes are replicas. Non-goal: handling scenarios where each node is the primary node for a single index, but a replica node for other indexes; i.e. where primary indexes are spread over all nodes. This means in particular that sharding in the coordination strategy will not be related in any way to sharding in the Lucene backend.

  • We’re only going to support static setups. Non-goal: handling scenarios where the primary node goes down and another node is dynamically elected. This means in particular that we will not support dynamic sharding in the coordination strategy, only static sharding.

  • Maybe the replication feature should not be a directory provider, but an option of the local-filesystem directory provider? I.e. add properties that make Hibernate Search automatically copy the index to a given target location, or synchronize it from a given source location.

  • Ideally, a “primary” index should perform some checks on startup to make sure the current node is the only node that performs reindexing. For example it could check that all shards are assigned to the current node.

  • Ideally, a “replica” index should explicitly disallow writes (throw exceptions when attempting to write), so as to detect configuration mistakes early (it’s happened before…). See also .
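The two checks suggested in the last two notes could look roughly like the sketch below. Everything in it is hypothetical: `PrimaryReplicaChecks`, `IndexRole` and both methods are names invented for illustration, and no such API exists in Hibernate Search today. It only shows the intended fail-fast behavior, assuming shards are numbered from 0 to the total count minus one.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical fail-fast checks for a primary/replica Lucene setup.
public class PrimaryReplicaChecks {

    public enum IndexRole { PRIMARY, REPLICA }

    // Startup check for a "primary" index: every shard of the coordination
    // strategy must be assigned to the current node, otherwise another node
    // could also be processing indexing events.
    public static void checkOwnsAllShards(int totalShardCount, Set<Integer> assignedShards) {
        Set<Integer> missing = new TreeSet<>();
        for (int shard = 0; shard < totalShardCount; shard++) {
            if (!assignedShards.contains(shard)) {
                missing.add(shard);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                    "Primary node must be assigned all shards; missing: " + missing);
        }
    }

    // Write guard for a "replica" index: reject any write attempt so that
    // configuration mistakes surface immediately instead of corrupting the copy.
    public static void checkWriteAllowed(IndexRole role, String indexName) {
        if (role == IndexRole.REPLICA) {
            throw new UnsupportedOperationException(
                    "Index '" + indexName + "' is a replica and does not accept writes");
        }
    }
}
```

A primary would call `checkOwnsAllShards` once at startup, while `checkWriteAllowed` would sit in front of every write path of the index.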

Why

This is important to WildFly in particular, which includes the JMS backend. Some WildFly users currently use JMS + Lucene + replication, and may not be able to migrate to Elasticsearch easily. By reintroducing replication along with the outbox-polling coordination strategy, we’re providing an easier migration path: from “JMS + replication” to “database queues + replication”.

Note that ideally, users who need distributed applications really should migrate to an Elasticsearch backend, which would be vastly safer and more robust. Implementing replication for the Lucene backend is more of a temporary bandaid, really.

Activity

Yoann Rodière, March 19, 2024 at 8:45 AM

Regarding the necessity of a primary/replica setup, see this interesting conversation with insights from Sanne: https://hibernate.zulipchat.com/#narrow/stream/132092-hibernate-search-dev/topic/Lucene.20primary.2Freplica.20setup.20vs.20network.20share

Yoann Rodière, March 8, 2022 at 11:28 AM

Postponing until we get more feedback, as I haven’t heard of anyone migrating from HSearch 5 + JMS to HSearch 6 and not being able to rely on Elasticsearch.

Details

Created October 6, 2021 at 8:20 AM
Updated October 21, 2024 at 12:34 PM