Support for parallel searchers
Description
Activity
Gustavo Fernandes, September 14, 2016 at 7:23 AM
This seems actually quite simple to implement |
My thoughts initially, except that, as I wrote earlier, even if you construct the IndexSearcher with an ExecutorService, it is not used in all cases, so we need to check the feasibility.
Or would you expect to have multiple ExecutorService instances, like a dedicated one for each sharded index? |
One per SearchIntegrator IMO
In the case of Infinispan I suspect you'd want multiple caches to share the same ExecutorService, which implies having multiple SearchIntegrator instances share one |
This could be a possible evolution
Sanne Grinovero, September 13, 2016 at 9:24 PM (edited)
Are the segments queried in parallel or sequentially when querying multiple indexes?
Sequential.
This seems actually quite simple to implement. I expect we'd have a single ExecutorService started during boot of the SearchIntegrator, and pass this to each org.apache.lucene.search.IndexSearcher constructor.
Or would you expect to have multiple ExecutorService instances, like a dedicated one for each sharded index? I think not, as ultimately you have the same CPU to limit/share.
A nice side-effect is that people would be able to limit how many CPU cores at maximum they would dedicate to query execution; this would require some configuration properties to set the initial constraints of the ExecutorService, but I guess people would likely want to be able to manage these parameters over some MBean as well.
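As a rough illustration of the configuration idea above, here is a minimal sketch using only java.util.concurrent. The class name QueryExecutorConfig and the property key example.query.max_threads are hypothetical and are not existing Hibernate Search settings; the resize method stands in for what a management bean could expose.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of Hibernate Search or Lucene. */
final class QueryExecutorConfig {

    /** Hypothetical property key, chosen for this sketch only. */
    static final String MAX_THREADS_KEY = "example.query.max_threads";

    /** Builds a fixed-size pool; defaults to one thread per available core. */
    static ThreadPoolExecutor create(Properties props) {
        int cpus = Runtime.getRuntime().availableProcessors();
        int threads = Integer.parseInt(props.getProperty(MAX_THREADS_KEY, String.valueOf(cpus)));
        if (threads < 1) {
            throw new IllegalArgumentException("thread count must be >= 1");
        }
        return new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
    }

    /** Changes the pool size at runtime, e.g. from a management operation. */
    static void resize(ThreadPoolExecutor executor, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("thread count must be >= 1");
        }
        // Order matters: the core size must never exceed the maximum size.
        if (threads > executor.getMaximumPoolSize()) {
            executor.setMaximumPoolSize(threads);
            executor.setCorePoolSize(threads);
        } else {
            executor.setCorePoolSize(threads);
            executor.setMaximumPoolSize(threads);
        }
    }
}
```

Since the queue is unbounded, the pool never grows past its core size, so the configured value is an upper bound on the threads used for query execution.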
In the case of Infinispan I suspect you'd want multiple caches to share the same ExecutorService, which implies having multiple SearchIntegrator instances share one, so you'd want this executor to be provided by an injectable org.hibernate.search.engine.service.spi.Service.
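To make the sharing idea concrete, here is a minimal sketch of a reference-counted executor provider. SharedExecutorProvider is a hypothetical class written for illustration; it does not implement the actual Service SPI, and the acquire/release names are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical provider shared by several integrators; not an existing SPI class. */
final class SharedExecutorProvider {

    private final int threads;
    private ExecutorService executor;
    private int users;

    SharedExecutorProvider(int threads) {
        this.threads = threads;
    }

    /** Returns the shared executor, creating it on first use. */
    synchronized ExecutorService acquire() {
        if (users == 0) {
            executor = Executors.newFixedThreadPool(threads);
        }
        users++;
        return executor;
    }

    /** Releases one reference; the executor is shut down when the last user leaves. */
    synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("release() called without a matching acquire()");
        }
        users--;
        if (users == 0) {
            executor.shutdown();
            executor = null;
        }
    }
}
```

Each SearchIntegrator would call acquire() at boot and release() at shutdown, so all of them submit work to the same pool and the threads are only stopped once nobody uses them.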
Gustavo Fernandes, September 13, 2016 at 7:51 PM
In some cases we do query multiple indexes even w/o sharding |
Are the segments queried in parallel or sequentially when querying multiple indexes?
but I'm skeptical on this being effectively useful in practice |
There's a discussion on https://issues.apache.org/jira/browse/LUCENE-5299 with some numbers.
Are you needing this to implement any specific feature in Infinispan? |
Not necessarily. I was experimenting with the AffinityIndexManager, which is heavily sharded, and was wondering whether parallelisation would help improve query performance, since it is severely impacted by the number of shards.
Sanne Grinovero, September 13, 2016 at 7:07 PM
In some cases we do query multiple indexes even w/o sharding, but I'm skeptical on this being effectively useful in practice in the case of Hibernate Search as one usually expects a server side application running several queries in parallel.
Are you needing this to implement any specific feature in Infinispan?
Lucene has support to pass in an ExecutorService to the IndexSearcher in order to do searches in parallel across segments.
Currently, when sharding is used, query performance is O(n), where n is the number of IndexManagers involved.
The caveat is that the ExecutorService is only used by certain methods of IndexSearcher, and it is not suitable for all cases: it lowers the latency of a single query at the expense of overall throughput, so ideally this should be configurable.
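The difference between the sequential and parallel strategies can be sketched with plain java.util.concurrent, independent of Lucene. ShardFanOut and its method names are hypothetical; the query function stands in for whatever a single IndexManager search would do, and the boolean flag stands in for the configuration option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical fan-out helper; not part of Hibernate Search or Lucene. */
final class ShardFanOut {

    /** Queries each shard in turn: total time grows linearly with the shard count. */
    static <S, R> List<R> sequential(List<S> shards, Function<S, R> query) {
        List<R> results = new ArrayList<>();
        for (S shard : shards) {
            results.add(query.apply(shard));
        }
        return results;
    }

    /** Submits one task per shard, then collects the results in shard order. */
    static <S, R> List<R> parallel(List<S> shards, Function<S, R> query, ExecutorService executor) {
        List<Future<R>> futures = new ArrayList<>();
        for (S shard : shards) {
            Callable<R> task = () -> query.apply(shard);
            futures.add(executor.submit(task));
        }
        List<R> results = new ArrayList<>();
        try {
            for (Future<R> future : futures) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return results;
    }

    /** Chooses the strategy from a configuration flag. */
    static <S, R> List<R> search(List<S> shards, Function<S, R> query,
                                 ExecutorService executor, boolean parallelEnabled) {
        return parallelEnabled ? parallel(shards, query, executor) : sequential(shards, query);
    }
}
```

With the flag off, a server already running many queries concurrently keeps one thread per query; with it on, a single query occupies several threads for a shorter time, which is the latency and throughput trade-off described above.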