ElasticsearchIndexManager must allow or automatically recreate index

Description

When the Hibernate Search session is initialized, org.hibernate.search.elasticsearch.impl.ElasticsearchIndexManager automatically constructs its index in Elasticsearch according to the annotations present on the entities in the index. It then sets its internal indexInitialized value to true. Any subsequent operations (like searches or indexing of new data) are therefore performed as if the index in Elasticsearch exists and is correct.

However, anyone can drop the actual index manually at any time by calling DELETE /<index_name> on the Elasticsearch instance. This results in either errors or incorrect behaviour, depending on the Elasticsearch auto_create_index setting (see https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-creation).

If automatic index creation is on in the Elasticsearch server (the default), then upon receiving a request to add data to an index which does not exist, Elasticsearch will simply create the index based on the data received. This means the index will not comply with any custom annotation details specified in the Hibernate objects (for example, a custom analyzer selected for a particular field). The index will also omit any fields which are present in the Hibernate entities but null or not present in the data submitted for indexing. The resulting behaviour is insidious: everything seems to be working since the index name is correct (and it may actually work for simple cases), but the actual index is not as specified in the entity annotations.

If automatic index creation is off, then adding new data simply fails with an HTTP 404 error (index not found) and Hibernate Search never attempts to recreate the index.
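To make the two outcomes concrete, here is a minimal sketch using a toy in-memory stand-in for the server. ToyCluster and all of its methods are hypothetical names invented for this illustration, not an Elasticsearch or Hibernate Search API. The assumption that dynamically mapped fields get the "standard" analyzer is a simplification of real dynamic mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy in-memory stand-in for an Elasticsearch server (hypothetical, for illustration only). */
class ToyCluster {
    private final boolean autoCreateIndex;
    // index name -> (field name -> analyzer name)
    private final Map<String, Map<String, String>> mappings = new LinkedHashMap<>();

    ToyCluster(boolean autoCreateIndex) {
        this.autoCreateIndex = autoCreateIndex;
    }

    /** Explicit creation, as done once at startup from the entity annotations. */
    void createIndex(String index, Map<String, String> fieldAnalyzers) {
        mappings.put(index, new LinkedHashMap<>(fieldAnalyzers));
    }

    /** Models a manual DELETE /<index_name>. */
    void deleteIndex(String index) {
        mappings.remove(index);
    }

    /** Returns an HTTP-like status: 201 on success, 404 if the index is missing. */
    int indexDocument(String index, Map<String, String> document) {
        Map<String, String> mapping = mappings.get(index);
        if (mapping == null) {
            if (!autoCreateIndex) {
                return 404; // auto-creation off: the write simply fails
            }
            mapping = new LinkedHashMap<>(); // auto-creation on: empty index appears
            mappings.put(index, mapping);
        }
        // Simplified dynamic mapping: only non-null fields are added,
        // and they get the default analyzer rather than the declared one.
        for (Map.Entry<String, String> field : document.entrySet()) {
            if (field.getValue() != null) {
                mapping.putIfAbsent(field.getKey(), "standard");
            }
        }
        return 201;
    }

    /** Returns the field -> analyzer mapping, or null if the index does not exist. */
    Map<String, String> mapping(String index) {
        return mappings.get(index);
    }
}
```

With auto-creation on, indexing a document after a manual delete succeeds, but the recreated mapping has lost both the declared analyzers and any field that was null in that document. With auto-creation off, the same write returns 404 and no index exists afterwards.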

ElasticsearchIndexManager should handle a 404 response to an indexing or search request by recreating the index as necessary. Alternatively, or in addition, user applications should be given a mechanism to detect this scenario and handle it by explicitly requesting recreation of an Elasticsearch index.
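The requested behaviour could look roughly like the sketch below: run the operation, and if it fails because the index is missing, recreate the index once and retry. IndexMissingException and RecreateOnMissing are hypothetical names made up for this sketch. They are not part of Hibernate Search, and this is not how ElasticsearchIndexManager is implemented.

```java
import java.util.function.Supplier;

/** Hypothetical exception standing in for an HTTP 404 "index not found" response. */
class IndexMissingException extends RuntimeException {
    IndexMissingException(String index) {
        super("no such index: " + index);
    }
}

/** Hypothetical helper: runs an operation, recreating the index once if it is missing. */
final class RecreateOnMissing {
    private RecreateOnMissing() {
    }

    static <T> T call(Supplier<T> operation, Runnable recreateIndex) {
        try {
            return operation.get();
        } catch (IndexMissingException e) {
            recreateIndex.run();    // rebuild the index from the declared mapping
            return operation.get(); // single retry; a second failure propagates
        }
    }
}
```

A wrapper like this only helps when automatic index creation is off, because with it on the server never returns the 404 that triggers the recreation.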

Activity

Yoann Rodière March 5, 2020 at 11:38 AM

I will close this ticket since automatic recreation of indexes on errors is not something we can consider at the moment.

However, the following two fixes coming in Hibernate Search 6.0.0.Beta6 should address your use case:

  • As of HSEARCH-3759, we will expose schema management APIs, so that you can create/drop/validate the Elasticsearch indexes and schema manually.

  • As of HSEARCH-3751, the mass indexer exposes the dropAndCreateSchema() option to drop and create indexes and their schema before reindexing.
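The ordering implied by the second fix can be sketched as follows. SchemaOps and ReindexJob are hypothetical stand-ins written for this illustration, not the Hibernate Search types. Only the option name dropAndCreateSchema is taken from the comment above.

```java
/** Hypothetical stand-in for a schema management API (not the Hibernate Search types). */
interface SchemaOps {
    void dropIfExisting();

    void create();
}

/** Hypothetical stand-in for a mass indexer that can reset the schema before reindexing. */
final class ReindexJob {
    private final SchemaOps schema;
    private final Runnable reindexAll;
    private boolean dropAndCreateSchema;

    ReindexJob(SchemaOps schema, Runnable reindexAll) {
        this.schema = schema;
        this.reindexAll = reindexAll;
    }

    /** Enables or disables the schema reset that precedes reindexing. */
    ReindexJob dropAndCreateSchema(boolean enabled) {
        this.dropAndCreateSchema = enabled;
        return this;
    }

    void start() {
        if (dropAndCreateSchema) {
            schema.dropIfExisting(); // reads and writes fail from here...
            schema.create();         // ...until the schema exists again
        }
        reindexAll.run();            // repopulate from the source of truth
    }
}
```

Because the application requests the reset explicitly, it can run it at a time of its choosing and from a node on the latest version, which is the point made in the discussion below.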

Yoann Rodière October 31, 2019 at 2:42 PM

I don’t see dropAndCreateIndexes on MassIndexer presently – you were talking about that as a possible feature to add, correct?

Correct. I just created HSEARCH-3751.

Until that's implemented (in Hibernate Search 6), I think the following procedure may work for you:

  1. Drop indexes (indexing/searching will start failing, but from what I understood it's ok for your use case).

  2. Set the lifecycle strategy to create (that's the default).

  3. Upgrade one node to the latest version.

  4. When it's up, you know the indexes were created: upgrade all other nodes to latest version.

  5. Initiate a reindexing from one of the nodes.

Tim Gokcen October 31, 2019 at 1:22 PM

Something like a drop-and-recreate method would be sufficient for our purposes; with Elasticsearch auto-index-creation disabled, the upgrade procedure for a cluster of applications using Hibernate Search on Elasticsearch as a cache/fast-lookup index could be something like:

  1. Upgrade all nodes to latest version

  2. Initiate a reindexing with a drop-and-recreate schema option

I don’t see dropAndCreateIndexes on MassIndexer presently – you were talking about that as a possible feature to add, correct? As far as I can tell the only way to have Hibernate Search recreate the indexes is for them to not be there when the session factory is initialized.

Yoann Rodière October 31, 2019 at 8:03 AM

I guess what makes this situation particular for us is that we are using Hibernate Search on Elasticsearch specifically just as a fast linguistic search database for values which are actually stored elsewhere (in a regular relational database)

Yep, that makes a lot of sense.

since we have a clustered application and dynamically add things to the index, restarting to solve the issue is tricky to do with auto-index creation enabled, since by the time one app starts up and initializes Hibernate Search, another still-running app instance may have already auto-created the “bad” index

Unless I'm mistaken, this means the solution of re-creating the index automatically on write would not solve your problem: old-gen applications could re-create the index automatically with the "bad" schema. In fact you would be in an even worse situation: right now you're only in trouble when an old-gen application restarts; with the change, you're in trouble as soon as an old-gen application writes anything to Elasticsearch...

Wouldn't it be more appropriate to allow you to request Hibernate Search to re-create the index explicitly? That way you could do it at the time that makes most sense (only you know when that is) and in the latest version of the application (same, only you know about application versions, not Hibernate Search).
If you're using the MassIndexer, that feature could be built into it: call .dropAndCreateIndexes() instead of .purge() before starting the mass indexer, and it will drop the indexes.

However, whatever the solution, there will always be a short time during which indexes will not exist and thus reads/writes will all fail. An alternative would be HSEARCH-3499, but it will be significantly more complex to implement (and test).

Tim Gokcen October 30, 2019 at 4:07 PM

What happened in our particular use case is that we deployed an application with one index configuration, then realized we needed to change that configuration (by specifying non-default analyzers for certain fields). We mistakenly believed at the time that simply dropping the index and then requesting a re-indexing of the data would recreate the index anew. Failing that, we believed that restarting the application would solve the problem – and restarting does work, but since we have a clustered application and dynamically add things to the index, restarting to solve the issue is tricky to do with auto-index creation enabled, since by the time one app starts up and initializes Hibernate Search, another still-running app instance may have already auto-created the “bad” index.

I guess what makes this situation particular for us is that we are using Hibernate Search on Elasticsearch specifically just as a fast linguistic search database for values which are actually stored elsewhere (in a regular relational database). So for us, dropping the index or the values in the index is basically harmless – the optimized search system just doesn’t work until it’s rebuilt, but no actual data has been lost. I can certainly understand that an application using Elasticsearch as its actual real data store would not expect this situation at all; for them it would absolutely be a fatal error condition from which no automatic recovery is reasonable.

Won't Fix

Details

Created October 30, 2019 at 3:51 PM
Updated March 5, 2020 at 11:38 AM
Resolved March 5, 2020 at 11:38 AM