Optimize the computation of total hit count

Description

We currently always compute the total hit count when searching, but we could avoid that in some cases, for example when the user asks only for the hits.

Elasticsearch 7 provides the track_total_hits parameter to control that. In Lucene, we could drop the TotalHitCountCollector and get the total hit count from the org.apache.lucene.search.TopDocsCollector instead.

As a next step, we could change the API so that even fetch() is allowed to return a partial total hit count. For that, users need a way to say whether or not they want to enable the optimization.

We should also look into how to take advantage of the related optimization in Lucene, which lets the search stop as soon as it is certain that it cannot find any more hits worth returning. See this video for more info on how it works: https://archive.fosdem.org/2019/schedule/event/super_speedy_scoring_lucene/
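
To make the idea concrete, here is a rough, self-contained sketch in plain Java. It does not use Lucene or our API. The names TotalHitCountSketch, search and totalHitsThreshold are made up for illustration, and the Relation enum is only loosely modelled on Lucene's TotalHits.Relation. It assumes matching scores arrive in descending order, as a stand-in for an index that can skip hits that cannot make it into the top results. Once the top hits are collected and the counted hits reach the threshold the user asked for, the search stops and reports the count as a lower bound.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not Lucene code and not the actual implementation.
final class TotalHitCountSketch {

    /** Whether the reported count is exact or only a lower bound. */
    enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

    static final class Result {
        final List<Float> topScores; // scores of the returned hits, best first
        final long totalHits;        // exact count, or a lower bound
        final Relation relation;

        Result(List<Float> topScores, long totalHits, Relation relation) {
            this.topScores = topScores;
            this.totalHits = totalHits;
            this.relation = relation;
        }
    }

    /**
     * @param scoresDescending   scores of all matching documents, assumed
     *                           to be sorted from best to worst
     * @param topK               number of hits the user wants back
     * @param totalHitsThreshold count accuracy the user asked for:
     *                           0 means "hits only", Integer.MAX_VALUE
     *                           means "always count exactly"
     */
    static Result search(float[] scoresDescending, int topK, int totalHitsThreshold) {
        List<Float> top = new ArrayList<>();
        long counted = 0;
        for (float score : scoresDescending) {
            if (top.size() >= topK && counted >= totalHitsThreshold) {
                // The remaining hits cannot enter the top K and the caller
                // does not need a more precise count: stop early. At least
                // one more hit exists, so the count is a lower bound.
                return new Result(top, counted, Relation.GREATER_THAN_OR_EQUAL_TO);
            }
            counted++;
            if (top.size() < topK) {
                top.add(score);
            }
        }
        // Every matching document was visited, so the count is exact.
        return new Result(top, counted, Relation.EQUAL_TO);
    }

    public static void main(String[] args) {
        float[] scores = {9f, 7f, 5f, 3f, 1f};
        Result partial = search(scores, 2, 3);
        Result exact = search(scores, 2, Integer.MAX_VALUE);
        System.out.println(partial.relation + " " + partial.totalHits);
        System.out.println(exact.relation + " " + exact.totalHits);
    }
}
```

The point of returning a relation next to the count is that the caller can tell an exact count from a lower bound, which is what would make the optimization safe to expose through fetch().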

Fixed

Created March 13, 2019 at 5:10 PM
Updated September 4, 2020 at 1:07 PM
Resolved September 1, 2020 at 12:37 PM