Skip non-competitive hits when running a MatchAllDocsQuery with descending score sort with Lucene

Description

Currently, when we run a MatchAllDocsQuery with Lucene (i.e. the user requested a matchAll predicate and there are no nested documents), we scan the whole index.

One of the reasons we scan the whole index is to count all hits. But there's a much more efficient method of counting hits when executing a MatchAllDocsQuery: IndexReader.numDocs(). If we were to use that method, we could disable the counting of hits during the search (by setting the total hit count threshold to 0), removing one reason to scan the whole index.

Then, when users are in this situation and request a sort by descending score, Lucene optimizations would theoretically be able to kick in to make sure that we only scan competitive hits. What's interesting is that (in theory) for a MatchAllDocsQuery, absolutely no hit is competitive after we collected the N first hits requested by the user. That's because the MatchAllDocsQuery assigns the same score to all hits.

In best-case scenarios, we could end up scanning a dozen elements instead of millions, yielding a massive performance improvement.

That's interesting in particular because one of the queries included in Infinispan's performance tests does just that: a MatchAllDocsQuery with a descending score, on an index with many documents.

The only unknown here is whether Lucene enables these optimizations with MatchAllDocsQuery, or generally with constant-score queries. Theoretically that's possible, but whether it's been implemented remains to be seen.

Environment

None
Fixed

Assignee

Yoann Rodière

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Components

Fix versions

Priority

Major
Configure