Add warnings in the documentation about concurrent updates to the same entity in a clustered application

Description

Even with Elasticsearch, concurrent updates to the same entity from different instances of a clustered application may lead to an out-of-sync index.

For example, let's consider two instances I1 and I2 of the same application working on entities A, B and C.
A embeds both B and C in its indexed document (using @IndexedEmbedded).

Let's assume A is reindexed from both I1 and I2, with operations interleaved in this order:

  • I1 starts a transaction

  • I1 loads A

  • I1 loads B

  • In I1, B is modified

  • I1 loads C

  • I2 loads A

  • I2 loads C

  • In I2, C is modified

  • I2 loads B

  • I1 commits the transaction, triggering the indexing of a version of A with an update to some fields of B. Since C was loaded before I2's changes were committed, A's document will keep old values for the fields embedded from C.

  • I2 commits the transaction, triggering the indexing of a version of A with an update to some fields of C. Since B was loaded before I1's changes were committed, A's document will keep old values for the fields embedded from B.

Regardless of which version of A "wins" by being indexed last, something will be out of sync:

  • If I1's version of A is indexed last, then the fields extracted from C will be out of sync.

  • If I2's version of A is indexed last, then the fields extracted from B will be out of sync.
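
The interleaving above can be replayed with a small, self-contained Java sketch. It does not use Hibernate Search or Elasticsearch at all: the database and the index documents are modeled as plain maps, and every name in it (ConcurrentEmbeddedUpdate, documentForA, the b.field/c.field keys, the b-old/b-new values) is hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of the scenario (no Hibernate Search involved):
// each entity has one field, and A's index document copies the values of B and C
// that one instance had loaded.
final class ConcurrentEmbeddedUpdate {

    // Builds A's document from one instance's (possibly stale) snapshots of B and C.
    static Map<String, String> documentForA(String bValue, String cValue) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("b.field", bValue);
        doc.put("c.field", cValue);
        return doc;
    }

    // Replays the interleaving; returns the expected document and both indexed versions.
    static Map<String, Map<String, String>> simulate() {
        Map<String, String> db = new LinkedHashMap<>();
        db.put("B", "b-old");
        db.put("C", "c-old");

        String i1B = db.get("B"); // I1 loads B
        i1B = "b-new";            // in I1, B is modified
        String i1C = db.get("C"); // I1 loads C (still "c-old")
        String i2C = db.get("C"); // I2 loads C
        i2C = "c-new";            // in I2, C is modified
        String i2B = db.get("B"); // I2 loads B (still "b-old")

        // I1 commits: B is written, then A is reindexed from I1's snapshots.
        db.put("B", i1B);
        Map<String, String> fromI1 = documentForA(i1B, i1C);
        // I2 commits: C is written, then A is reindexed from I2's snapshots.
        db.put("C", i2C);
        Map<String, String> fromI2 = documentForA(i2B, i2C);

        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        // The document that would match the final database state.
        result.put("expected", documentForA(db.get("B"), db.get("C")));
        result.put("fromI1", fromI1);
        result.put("fromI2", fromI2);
        return result;
    }

    public static void main(String[] args) {
        simulate().forEach((name, doc) -> System.out.println(name + ": " + doc));
    }
}
```

Running it prints the expected document next to the two indexed versions: I1's version still holds the old value for C and I2's version still holds the old value for B, so neither matches the final database state.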

Elasticsearch's "optimistic concurrency control" would not be enough in our case, so we need actual support for clustering (see ).
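
One way to see why a version check on the index is not enough: it only decides which write survives, it cannot merge two partial updates. The sketch below is a minimal model under stated assumptions: VersionCheckedIndex is a hypothetical in-memory class (not the Elasticsearch API) that accepts a write only if its version is strictly greater than the stored one, and the three documents are the ones from the scenario above.

```java
import java.util.Map;

// Hypothetical version-checked index holding a single document (not the Elasticsearch API).
final class VersionCheckedIndex {
    static final Map<String, String> EXPECTED = Map.of("b.field", "b-new", "c.field", "c-new");
    static final Map<String, String> FROM_I1 = Map.of("b.field", "b-new", "c.field", "c-old");
    static final Map<String, String> FROM_I2 = Map.of("b.field", "b-old", "c.field", "c-new");

    private Map<String, String> document;
    private long version = Long.MIN_VALUE;

    // Accepts the write only if its version is strictly greater; returns false on conflict.
    boolean write(Map<String, String> doc, long docVersion) {
        if (docVersion <= version) {
            return false;
        }
        document = doc;
        version = docVersion;
        return true;
    }

    // Applies I1's write, then I2's write, and returns the surviving document.
    static Map<String, String> replay(long versionFromI1, long versionFromI2) {
        VersionCheckedIndex index = new VersionCheckedIndex();
        index.write(FROM_I1, versionFromI1);
        index.write(FROM_I2, versionFromI2);
        return index.document;
    }

    public static void main(String[] args) {
        // Same version on both sides (A itself was not modified): I2's write is rejected.
        System.out.println("same version:  in sync=" + replay(1, 1).equals(EXPECTED));
        // Higher version from I2: I2's write replaces I1's.
        System.out.println("I2 newer:      in sync=" + replay(1, 2).equals(EXPECTED));
    }
}
```

With equal versions, I1's stale document stays; with increasing versions, I2's stale document replaces it. Both lines print in sync=false.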

Of course, applications can still be clustered at this point, but they need to be aware of this pitfall and, if necessary, address it, for example by reindexing every night.

Environment

None
Fixed

Assignee

Yoann Rodière

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Components

Fix versions

Priority

Major