Simplify and improve ordering and parallelism of Elasticsearch indexing

Description

One simplification we could apply in particular is to only ever execute indexing works (Index/Delete) in bulks, even when the bulk contains a single work. That shouldn't affect performance much, and it would definitely make the code simpler.
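
To make the idea concrete, here is a minimal sketch of the "always go through the Bulk API" approach. It is written against the Elasticsearch 7.x high-level REST client rather than Hibernate Search's internal work abstraction, and the class and helper names (AlwaysBulkExample, indexSingleDocument) are made up for illustration:

```java
import java.io.IOException;

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

final class AlwaysBulkExample {

    /**
     * Even a single index work goes through the Bulk API, so the
     * request-building and response-handling code paths are identical
     * whether there is 1 work or N works.
     */
    static BulkResponse indexSingleDocument(RestHighLevelClient client,
            String indexName, String id, String jsonSource) throws IOException {
        BulkRequest bulk = new BulkRequest();
        bulk.add(new IndexRequest(indexName).id(id).source(jsonSource, XContentType.JSON));
        return client.bulk(bulk, RequestOptions.DEFAULT);
    }
}
```

A one-element bulk only adds a small JSON envelope around the same index operation, which is why the performance impact should be limited compared to the cost of the HTTP round trip itself.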

When that's done, many of the improvements implemented in the Lucene backend (as part of a separate issue) could be applied to the Elasticsearch backend as well:

  • Queue individual works instead of worksets (this simplifies configuration and will make related improvements easier to implement)

  • Use a single thread pool for the whole backend, so that resources are shared across indexes (see the orchestrator sketch after this list)

  • Do not batch works that don't benefit from batching, e.g. non-bulkable works such as purge, search queries, ... In the case of Elasticsearch, that would mean sending them to the REST client as soon as they are submitted to the orchestrator (see the second sketch after this list).

  • Maybe use multiple queues per orchestrator, in order to execute multiple works targeting the same index in parallel (also illustrated in the orchestrator sketch after this list)

  • Maybe move to a common, global orchestrator for indexing

  • More?
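
The shared thread pool and multiple-queues ideas could look roughly like the sketch below. This is illustrative only: SketchOrchestrator and its methods are made-up names, not Hibernate Search or Elasticsearch APIs. The assumption is that the backend owns a single ExecutorService (e.g. created with Executors.newFixedThreadPool) and passes it to every orchestrator; within an orchestrator, each work is routed to a queue by a routing key such as the document id, so works for the same document keep their order while works for different documents can proceed in parallel, and every drained batch is executed as a bulk:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/**
 * Illustrative only: names are made up, not Hibernate Search APIs.
 * One executor is shared by the whole backend; each orchestrator owns several
 * queues so that works routed to different queues run in parallel, while works
 * with the same routing key keep their relative order.
 */
final class SketchOrchestrator<W> {

    private final List<BlockingQueue<W>> queues = new ArrayList<>();

    SketchOrchestrator(ExecutorService backendExecutor, int queueCount,
            Consumer<List<W>> bulkProcessor) {
        for (int i = 0; i < queueCount; i++) {
            BlockingQueue<W> queue = new ArrayBlockingQueue<>(1_000);
            queues.add(queue);
            // One drain loop per queue; all loops share the backend-wide executor.
            backendExecutor.submit(() -> drainLoop(queue, bulkProcessor));
        }
    }

    /** Works sharing a routing key (e.g. a document id) go to the same queue. */
    void submit(String routingKey, W work) throws InterruptedException {
        int queueIndex = Math.floorMod(routingKey.hashCode(), queues.size());
        queues.get(queueIndex).put(work);
    }

    private void drainLoop(BlockingQueue<W> queue, Consumer<List<W>> bulkProcessor) {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                List<W> batch = new ArrayList<>();
                batch.add(queue.take());      // block until at least one work arrives
                queue.drainTo(batch, 99);     // then grab whatever else is already queued
                bulkProcessor.accept(batch);  // always executed as a bulk, even if size == 1
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that blocking a shared thread on take() for every queue means the shared pool needs at least as many threads as there are queues across the whole backend; a real implementation would rather schedule a drain task only when a queue becomes non-empty.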
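
For the non-bulkable works, the submission path could simply bypass the queue. Again the names (SketchWork, SketchWorkSubmitter) are made up, and the sketch reuses the SketchOrchestrator from the previous snippet:

```java
import java.io.IOException;

import org.elasticsearch.client.RestHighLevelClient;

/** Illustrative only: the work abstraction and its methods are made-up names. */
interface SketchWork {
    boolean isBulkable();
    void execute(RestHighLevelClient client) throws IOException;
}

final class SketchWorkSubmitter {

    private final RestHighLevelClient client;
    private final SketchOrchestrator<SketchWork> orchestrator;

    SketchWorkSubmitter(RestHighLevelClient client,
            SketchOrchestrator<SketchWork> orchestrator) {
        this.client = client;
        this.orchestrator = orchestrator;
    }

    void submit(String routingKey, SketchWork work) throws IOException, InterruptedException {
        if (work.isBulkable()) {
            // Index/Delete: queued, batched and ordered by the orchestrator.
            orchestrator.submit(routingKey, work);
        } else {
            // Purge, flush, ...: sent to the REST client right away, no batching.
            work.execute(client);
        }
    }
}
```

Whether a bypassed work may overtake already-queued works for the same index is a separate ordering question that the real implementation would need to settle.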

Details

Resolution: Fixed

Created March 26, 2020 at 8:53 AM
Updated March 31, 2020 at 11:52 AM
Resolved March 30, 2020 at 11:15 AM