Fixed

Details
Assignee: Yoann Rodière
Reporter: Yoann Rodière
Components:
Sprint: None
Fix versions:
Priority: Major
Created May 15, 2019 at 12:38 PM
Updated March 31, 2020 at 11:52 AM
Resolved March 30, 2020 at 3:56 PM
These sizes are currently hardcoded:
The maximum bulk size in the Elasticsearch backend (see the sketch after this list)
The maximum number of worksets per batch in the serial orchestrator of the Elasticsearch backend
The maximum number of worksets per batch in the parallel orchestrator of the Elasticsearch backend
The maximum capacity of the workset queue in the serial orchestrator of the Elasticsearch backend
The maximum capacity of the workset queue in the parallel orchestrator of the Elasticsearch backend
The maximum number of worksets per batch in the write orchestrator of the Lucene indexes (a similar setting was "hibernate.search.batch_size" in Search 5, though it wasn't documented)
The maximum capacity of the workset queue in the write orchestrator of the Lucene indexes (was configured through "hibernate.search.[default|<indexname>].max_queue_length" in Search 5, see https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#lucene-indexing-performance)
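As an illustration of the first item, here is a minimal sketch (a hypothetical helper, not Hibernate Search's actual implementation) of how a maximum bulk size caps each bulk request sent to Elasticsearch:

```java
import java.util.ArrayList;
import java.util.List;

public final class BulkChunker {

    // Hypothetical helper, not Hibernate Search code: split pending works into
    // bulks of at most maxBulkSize items each, which is what a hardcoded
    // maximum bulk size effectively enforces.
    static <T> List<List<T>> toBulks(List<T> pendingWorks, int maxBulkSize) {
        List<List<T>> bulks = new ArrayList<>();
        for (int i = 0; i < pendingWorks.size(); i += maxBulkSize) {
            bulks.add(pendingWorks.subList(i, Math.min(i + maxBulkSize, pendingWorks.size())));
        }
        return bulks;
    }
}
```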
We should address three problems:
The hardcoded sizes may not be very good. For example, we allow 5000 worksets to queue up for execution in the parallel orchestrator of the Elasticsearch backend, and each workset might contain several works. A tad too much, maybe?
Even if we make the default values better, they'll never fit every use case. Users should be able to change them through configuration properties (see the sketch after this list).
We should document how to pick a sensible size for queues. => NO, let's wait until we have some experience. For now let's simply recommend that users test the performance of their application rather than picking arbitrary values.
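As a sketch of the second point, assuming a hypothetical property key (hibernate.search.backend.indexing.queue_size is an assumption here, not a decided name), reading a queue capacity from configuration instead of hardcoding it could look like:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public final class QueueCapacityConfig {

    // Hypothetical property key and default; the actual names and values
    // would be decided when implementing this ticket.
    private static final String QUEUE_SIZE_PROPERTY = "hibernate.search.backend.indexing.queue_size";
    private static final int QUEUE_SIZE_DEFAULT = 1000;

    // Read the capacity from user configuration, falling back to the
    // default instead of a hardcoded size.
    static BlockingQueue<Runnable> createWorkQueue(Map<String, String> configuration) {
        String raw = configuration.get(QUEUE_SIZE_PROPERTY);
        int capacity = (raw == null) ? QUEUE_SIZE_DEFAULT : Integer.parseInt(raw);
        return new ArrayBlockingQueue<>(capacity);
    }
}
```

A user would then set that property to whatever fits their workload, rather than being stuck with the hardcoded value.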
For workset queues, keep in mind the capacity should be at least (estimated number of user threads) * (estimated number of worksets created by each transaction). For example, 40 user threads each creating 5 worksets per transaction would call for a capacity of at least 200.
For workset queues, we might want to allow setting the capacity to "unlimited" for people who'd rather get an OOM error than block because the queue is full. For infinite capacity, a linked list as implemented in org.hibernate.search.backend.impl.lucene.MultiWriteDrainableLinkedList might help. Note there was a configuration option for the maximum work queue length of the async executor in Search 5: see org.hibernate.search.indexes.impl.PropertiesParseHelper#extractMaxQueueSize. The sync executor, however, had a queue of unlimited capacity (a linked list).
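For reference, a minimal sketch of that bounded-vs-unlimited trade-off using standard JDK queues (Hibernate Search's MultiWriteDrainableLinkedList itself is not reproduced here):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public final class WorksetQueues {

    // Bounded queue: producers block when it is full, providing backpressure.
    // This matches the behavior of the current hardcoded capacities.
    static <T> BlockingQueue<T> bounded(int capacity) {
        return new ArrayBlockingQueue<>(capacity);
    }

    // "Unlimited" queue: a linked structure that never blocks producers.
    // Under sustained overload this trades blocking for an eventual
    // OutOfMemoryError, which is exactly the trade-off described above.
    static <T> BlockingQueue<T> unlimited() {
        return new LinkedBlockingQueue<>();
    }
}
```

A LinkedBlockingQueue created without a capacity argument is effectively unbounded, matching the Search 5 sync executor's linked-list behavior described above.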