Zero-downtime/hot schema updates for the Elasticsearch backend

Description

An Elasticsearch alias allows me to re-index without disconnecting the clients.

For example, I have an e-commerce application that connects directly to Elasticsearch.

I use hibernate-search in my ERP to index my records from the database (100000 records).

But when reindexing, hibernate-search deletes the records and inserts them again.

It would be cool to have a single option to recreate the data in another index and, once it has finished, re-point the alias, allowing zero downtime.

Example:

On create:
1- my_index (alias) -> my_index_v1
2- add new records (100000 records) in my_index_v1

On update (reindex):
1- my_index (alias) -> my_index_v1
2- add new records (100000 records) in my_index_v2
3- change my_index (alias) -> my_index_v2
4- remove and delete my_index_v1
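
On the Elasticsearch side, step 3 of the update is a single atomic call to the _aliases API. A minimal sketch with the plain Java 11 HTTP client (the localhost URL is an assumption; the index names match the example above):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class AliasSwitch {
        public static void main(String[] args) throws Exception {
            // Atomically re-point the "my_index" alias from my_index_v1 to my_index_v2,
            // so clients querying the alias never see a missing index.
            String body = "{ \"actions\": ["
                    + "{ \"remove\": { \"index\": \"my_index_v1\", \"alias\": \"my_index\" } },"
                    + "{ \"add\": { \"index\": \"my_index_v2\", \"alias\": \"my_index\" } }"
                    + "] }";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/_aliases"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }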

The documentation says this about the value "update":

The index, its mappings and analyzer definitions will be created, existing mappings will be updated if there are no conflicts. Caution: if analyzer definitions have to be updated, the index will be closed automatically during the update.
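
For context, this is roughly where that "update" value is set today; a minimal sketch, assuming the Hibernate Search 5 property name from the Elasticsearch integration docs and a hypothetical persistence unit called "my-pu":

    import java.util.HashMap;
    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    public class Bootstrap {
        public static void main(String[] args) {
            Map<String, Object> settings = new HashMap<>();
            // Current behaviour quoted above: create/update the schema in place, on the live index.
            settings.put("hibernate.search.default.elasticsearch.index_schema_management_strategy",
                    "update");

            // "my-pu" is a hypothetical persistence unit name.
            EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-pu", settings);
            emf.close();
        }
    }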

Maybe a property like this:


Index Aliases (Guide):
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

Index Aliases and Zero Downtime (Reference):
https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html

Environment

None

Activity

Yoann Rodière
August 28, 2017, 11:37 AM

My two cents... you may want to react to this.

I'm not sure it's a good idea. I'm not saying the "alias" feature in Elasticsearch is useless, but in the case of Hibernate Search, I don't think aliases are enough.

Why it's not a good idea

Main problem: you need to coordinate upgrades

The main reason is, you have to upgrade more than just the index. There's a database, your frontend, your backend, etc., and if you really want zero downtime, you will have to coordinate the upgrades in just the right way. Each component depends on another, and you cannot always make all components work with both the older and newer version of their dependencies.

For instance, let's imagine you do your reindexing, and you want zero downtime. You will probably have application nodes working on the previous version of the index while doing the reindexing. Those nodes will expect the index to be structured in a certain way. If your reindexing process is fully hidden inside Hibernate Search, and if we use aliases, the older index will get replaced by the newer one at some point (you don't know when exactly). When this happens, your "old" application nodes will start failing: the structure of the index changed suddenly, and they received a string result where they expected an integer!

You could say "Hibernate Search should handle multiple mappings simultaneously, we'll just upgrade all our nodes first and then do the reindexing", but that's quite complex, including for Hibernate Search users (see "Working with two mappings simultaneously" below).

So you need, somehow, to be able to take down the non-upgraded application nodes before the aliases are changed.

And then you start wondering what sort of benefit hiding aliases inside mass indexing would bring to you, since you need to execute operations between the actual reindexing and the alias change...

Secondary problem: document updates during indexing should not be lost

There is another reason aliases are not enough: during reindexing, you don't want document updates to be lost. You want Hibernate Search to continue sending updates to the index according to your database updates, just in case an entity is updated between when it has been reindexed and when we switch from the old index to the new one. You could simply keep sending those updates to the old version of the index, but then those updates will ultimately be lost when switching to the new version of the index.

So you will expect Hibernate Search to somehow send the updates to the new index. But then you will have to provide the name of this new index beforehand: Hibernate Search may not be aware of it, since the reindexing may be happening on another application node.

And then you start wondering what sort of benefit hiding aliases would bring to you, since you need to provide the "low-level" name of the targeted index to Hibernate Search anyway...

Solution?

The only "simple" way I know to avoid the issues is to orchestrate the index upgrade on a per-node basis:

  • keep at least one non-upgraded application node online to handle read operations

  • spawn upgraded application nodes for write operations and reindexing, not exposed to the outside world, targeting a new, unaliased index (e.g. my_index_v2 instead of the alias my_index).

  • switch the non-upgraded nodes to read-only mode and redirect all write operations to the upgraded nodes

  • start reindexing

  • once reindexing is complete, take the upgraded nodes fully online and take the non-upgraded application nodes offline (to upgrade and restart them)

  • and only then switch back the upgraded nodes from targeting the low-level my_index_v2 name to simply targeting the alias, my_index.

But then, using aliases is just cosmetic: you can see that the last step is not strictly necessary; everything would still work without it. Also, aliases introduce new fancy ways to shoot yourself in the foot, like switching back to aliased names for only half of the indexes.

Maybe we could simply provide a configuration option to add a suffix to every index, making it easy for you to change it from _v1 to _v2 when you do zero-downtime upgrades. But that's about the only thing users would need, in my opinion.
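
For what it's worth, targeting a versioned physical name instead of the alias is already possible today by overriding the index name per entity; a minimal sketch, assuming a hypothetical Product entity mapped with Hibernate Search 5 annotations:

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;

    @Entity
    // Point this node at the versioned physical index instead of the "my_index" alias.
    @Indexed(index = "my_index_v2")
    public class Product {

        @Id
        private Long id;

        @Field
        private String name;

        // getters and setters omitted
    }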

Implicit features

Your ticket implicitly requests the following features:

  1. Reindexing on startup

  2. Working with two mappings simultaneously (the new one, and the one currently in the index)

Below are some details as to why they would be problematic.

Reindexing on startup

Reindexing may take hours, and in most cases it will take a few minutes at least. It's just way too long, especially considering that most applications will wait for Hibernate ORM (and thus Search) to be initialized before starting other components on the server, such as the REST services or the web application framework. Such a long startup would lead to timeouts, or to web applications that are unavailable until reindexing is done, which defeats the whole purpose. As explained above, it's probably better if Hibernate Search users can orchestrate the upgrade on their own, after Hibernate Search started.
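
To illustrate the last point: reindexing can be triggered explicitly with the existing MassIndexer API once the application is up, rather than at boot. A minimal sketch, assuming a hypothetical Product entity and an EntityManager provided by the caller:

    import javax.persistence.EntityManager;
    import org.hibernate.search.jpa.FullTextEntityManager;
    import org.hibernate.search.jpa.Search;

    public class ManualReindexer {

        // Called from wherever the upgrade is orchestrated (admin endpoint, scheduler, ...),
        // not during application startup.
        public void reindex(EntityManager entityManager) throws InterruptedException {
            FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
            ftem.createIndexer(Product.class) // Product is a hypothetical @Indexed entity
                .batchSizeToLoadObjects(25)
                .threadsToLoadObjects(4)
                .startAndWait();
        }
    }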

Working with two mappings simultaneously

Your scenario implicitly requires Hibernate Search to deal with both the old mapping (before the upgrade) and the new mapping (after the upgrade). This is much more complex than it seems.

First, we would need metadata about the old mapping. Elasticsearch metadata is not always enough to allow Hibernate Search to perform queries; we need the mapping metadata too (in particular to know which bridges are used). The old mapping metadata is obviously not available once you have changed your mapping (removed @Field annotations, changed others, ...), so the Hibernate Search user would have to provide it explicitly, somehow.

Second, you may have changed your entity model (added/changed/removed properties from your entities), so we wouldn't need the old mapping exactly, but rather a mapping from your new entity model to your old index model. That would require quite some work from you, as a Hibernate Search user.

Leandro Kersting de Freitas
August 28, 2017, 2:16 PM

Hi,

Thanks for the feedback.

I thought of this feature so we could eliminate the external manual process and let hibernate-search manage this.
But you're right: this external process is needed anyway, and making hibernate-search control it would not be good.

A real example: in my current development project with WildFly, I am evaluating how to do a manual blue-green deployment.

step 1 - In the first version in production, my index will be called, for example, my_index_v1;

step 2 - I will index through a manual trigger;

step 3 - I will manually create the alias my_index (see the sketch after these steps);

step 4 - When a new version is released with changed mappings, filters and parsers, I will manually change the name of my index to my_index_v2;
All of this on a single node, of course.

step 5 - I will index through a manual trigger;

step 6 - I do a blue-green switch of the index: I manually change the alias "my_index" to point to the index "my_index_v2";

step 7 - After everything is OK, I do a blue-green switch with my load balancer, pointing to the new node.

And so on in the next versions.
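
For reference, step 3 boils down to a single Elasticsearch call that binds the alias to the first physical index, and step 6 is the same _aliases switch sketched in the description above. A minimal sketch of step 3 with the plain Java 11 HTTP client (localhost URL assumed):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CreateAlias {
        public static void main(String[] args) throws Exception {
            // Make the alias "my_index" point to the physical index my_index_v1.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/my_index_v1/_alias/my_index"))
                    .PUT(HttpRequest.BodyPublishers.noBody())
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }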

I thought of doing this inside hibernate-search, but as you said, that responsibility should not belong to hibernate-search.

Maybe we could provide a configuration option to add a suffix to every index, but I do not know if we really need it; it would just be the cherry on the cake.

Regards.
Leandro K. de Freitas

Yoann Rodière
August 30, 2017, 7:26 AM

Moving to 6.x, waiting for Sanne's reaction. It won't be implemented in 5.8 in any case, since we're in the CR phase and it's not a good time for new features.

Sanne Grinovero
August 30, 2017, 3:07 PM

Hi all,

it's an excellent suggestion but I agree that it's not a simple problem and that it requires some help from the external deployment process, so we'll need to learn more about this before committing to a strategy.

Let's keep the issue open, optimistically scheduled for 6.x as suggested. We'll have to revisit this, more ideas and feedback welcome.

In version 6 we'll try to better differentiate between an "index name" as a logical name for Hibernate Search usage and the index names being used within Elasticsearch. Adding a configuration property to control the prefix (as mentioned above), for example, could be useful, but I believe it would be too confusing without a clear separation of the various types of "index name" we currently have.

For people not relying on events for indexing (e.g. you want to rebuild the index every night and disable the event listeners), the concerns about losing some events might not apply. If others are interested in such a feature and could explain their re-indexing strategy, we can revisit this if there's any need. I believe such users can simply invoke some ES management code after their indexing node is done with the MassIndexer job, so unless I'm wrong there's no need for changes on our part?
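
To make that concrete, a rough sketch of what "invoke some ES management code after the MassIndexer job" could look like; the Product entity and the switchAlias helper are hypothetical (the helper would issue the POST /_aliases call sketched in the description):

    import javax.persistence.EntityManager;
    import org.hibernate.search.jpa.Search;

    public class NightlyRebuild {

        public void rebuild(EntityManager entityManager) throws Exception {
            // 1. Rebuild the index (event listeners disabled in this scenario).
            Search.getFullTextEntityManager(entityManager)
                  .createIndexer(Product.class) // Product is a hypothetical @Indexed entity
                  .startAndWait();

            // 2. Then run whatever Elasticsearch management code is needed,
            //    e.g. atomically re-pointing the alias to the freshly built index.
            switchAlias("my_index", "my_index_v1", "my_index_v2");
        }

        private void switchAlias(String alias, String oldIndex, String newIndex) {
            // Hypothetical helper: issue the POST /_aliases request shown earlier in this ticket.
        }
    }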

Thanks!

Yoann Rodière
March 8, 2019, 9:02 AM

An update on this (complex) problem.

I moved the part about implementing near-zero downtime reindexing using aliases (what you suggested initially) to HSEARCH-3499: it will not address your use case (application updates) entirely, but might still be useful for other use cases, such as periodic reindexing.

We discussed a bit some solutions regarding the overall problem of updating an application to a new mapping with zero downtime. I will dump my notes here, just for future reference. It's very raw but it will help remind me what we discussed, at least; I might write up a more understandable wall of text on this later.

Need a way for messages sent from slaves to master to be detected as “obsolete” while we are hot-updating the schema.

Also, need a way to actually create a new index and populate it in such case.

Some solutions where the user assigns a global "version number" to their application (and optionally we compute hashes for mappings/indexes) could help do both (detecting obsolete indexing requests and obsolete schemas). See below.

https://imgur.com/xLshBIV

https://imgur.com/MMLj7Ye

Assignee

Unassigned

Reporter

Leandro Kersting de Freitas

Labels

None

Suitable for new contributors

None

Pull Request

None

Feedback Requested

None


Priority

Major