Add configuration options for the size and number of indexing queues, and for the maximum size of Elasticsearch bulk requests

Description

These sizes are currently hardcoded:

  1. The maximum bulk size in the Elasticsearch backend

  2. The maximum number of worksets per batch in the serial orchestrator of the Elasticsearch backend

  3. The maximum number of worksets per batch in the parallel orchestrator of the Elasticsearch backend

  4. The maximum capacity of the workset queue in the serial orchestrator of the Elasticsearch backend

  5. The maximum capacity of the workset queue in the parallel orchestrator of the Elasticsearch backend

  6. The maximum number of worksets per batch in the write orchestrator of the Lucene indexes (a similar setting was "hibernate.search.batch_size" in Search 5, though it wasn't documented)

  7. The maximum capacity of the workset queue in the write orchestrator of the Lucene indexes (was configured through "hibernate.search.[default|<indexname>].max_queue_length" in Search 5, see https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#lucene-indexing-performance)

We should address three problems:

  1. The hardcoded sizes may not be very good. For example, we allow 5000 worksets to queue up for execution in the parallel orchestrator of the Elasticsearch backend, and each workset might contain several works. A tad too much, maybe?

  2. Even if we make the default values better, they'll never fit every use case. Users should be able to change them through configuration properties.

  3. We should document how to pick a sensible size for queues. => No: let's wait until we have some experience. For now, let's simply recommend that users test the performance of their application rather than pick arbitrary values.
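As a rough sketch, exposing these limits through configuration properties might look like the following. The property names and default values here are hypothetical, purely for illustration; they are not the final API:

```properties
# Hypothetical property names -- illustrative only, not the final API.
# Maximum bulk size in the Elasticsearch backend:
hibernate.search.backend.max_bulk_size = 100
# Maximum number of worksets per batch in the orchestrators:
hibernate.search.backend.indexing.max_batch_size = 50
# Maximum capacity of the workset queues:
hibernate.search.backend.indexing.queue_size = 1000
```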

For workset queues, keep in mind that the queue capacity should be at least (estimated number of user threads) * (estimated number of worksets created by each transaction).
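For example, with made-up numbers (20 concurrent user threads, each transaction producing about 3 worksets), that lower bound works out as follows:

```java
public class QueueSizing {

    /**
     * Minimum workset queue capacity below which user threads are likely
     * to block on enqueue under normal load.
     */
    static int minQueueCapacity(int userThreads, int worksetsPerTransaction) {
        return userThreads * worksetsPerTransaction;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 20 user threads, ~3 worksets per transaction.
        int capacity = minQueueCapacity(20, 3);
        System.out.println("Minimum queue capacity: " + capacity); // prints 60
    }
}
```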

For workset queues, we might want to allow setting the capacity to "unlimited" for users who would rather risk an OutOfMemoryError than block because the queue is full. For unlimited capacity, a linked list such as the one implemented in org.hibernate.search.backend.impl.lucene.MultiWriteDrainableLinkedList might help.
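The two capacity modes can be sketched with standard JDK queues (this is not the actual Hibernate Search implementation, which could use MultiWriteDrainableLinkedList for the unbounded case; it only illustrates the bounded-vs-unbounded trade-off):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WorksetQueues {

    /**
     * A capacity <= 0 means "unlimited": a linked queue whose put() never
     * blocks producers (risking OOM under sustained overload). Otherwise,
     * a bounded array-backed queue whose put() blocks when full.
     */
    static <T> BlockingQueue<T> createWorksetQueue(int capacity) {
        return capacity <= 0
                ? new LinkedBlockingQueue<>()          // unbounded: may OOM
                : new ArrayBlockingQueue<>(capacity);  // bounded: put() blocks when full
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> bounded = createWorksetQueue(2);
        bounded.put("workset-1");
        bounded.put("workset-2");
        // A third put() would block here until a consumer drains the queue.
        System.out.println("remaining capacity = " + bounded.remainingCapacity()); // prints 0

        BlockingQueue<String> unbounded = createWorksetQueue(0);
        unbounded.put("workset-1");
        // Effectively unlimited: remaining capacity is Integer.MAX_VALUE - 1.
        System.out.println("unbounded remaining = " + unbounded.remainingCapacity());
    }
}
```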

Note that Search 5 had a configuration option for the maximum work queue length of the async executor: see org.hibernate.search.indexes.impl.PropertiesParseHelper#extractMaxQueueSize. The sync executor, however, used a queue of unlimited capacity (a linked list).

Details

Resolution: Fixed
Created May 15, 2019 at 12:38 PM
Updated March 31, 2020 at 11:52 AM
Resolved March 30, 2020 at 3:56 PM