Optimize the retrieval of topDocs in very long Lucene scrolls

Description

Currently, every time we re-execute the query in a Lucene scroll, we retrieve the topDocs from the start: org.hibernate.search.backend.lucene.search.extraction.impl.LuceneCollectors#extractTopDocs is always passed an offset of 0.

This creates unnecessarily large topDocs arrays. Since we double the fetch size on each query execution, we know in advance that the first half of each array contains useless data: documents that the previous execution already retrieved.

In very long scrolls, this could affect performance. The impact is probably small, since we need to collect all topDocs during query execution anyway; we just don't need to copy all of them into the returned array, only the second half.

Let's make sure that we use an offset for query executions in scrolls, as appropriate. Essentially the offset will be the index of the next document we are interested in.
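
To illustrate the idea, here is a minimal, self-contained sketch in plain Java. It does not use the Lucene or Hibernate Search APIs: the class ScrollOffsetSketch and both of its methods are hypothetical stand-ins, and the scroll loop assumes (as a simplification) that each re-execution is triggered only once all previously retrieved documents have been consumed.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the top-docs extraction step; not the actual Hibernate Search code. */
final class ScrollOffsetSketch {

    /**
     * Copies only the hits in [offset, limit) out of the collected hits.
     * "collected" stands for everything the collector gathered during one query execution.
     */
    static int[] extractTopDocs(int[] collected, int offset, int limit) {
        int to = Math.min(limit, collected.length);
        int from = Math.min(offset, to);
        return Arrays.copyOfRange(collected, from, to);
    }

    /**
     * Simulates a scroll that doubles its limit on each re-execution and
     * returns the total number of array slots copied over the whole scroll.
     */
    static long copiedSlots(int totalHits, int initialLimit, boolean useOffset) {
        if (initialLimit <= 0) {
            throw new IllegalArgumentException("initialLimit must be positive");
        }
        int[] allHits = new int[totalHits];
        for (int i = 0; i < totalHits; i++) {
            allHits[i] = i;
        }
        long copied = 0;
        int nextDoc = 0; // index of the next document the scroll is interested in
        int limit = initialLimit;
        while (nextDoc < totalHits) {
            // Each re-execution still collects the first "limit" hits.
            int[] collected = Arrays.copyOf(allHits, Math.min(limit, totalHits));
            // With the optimization, only the not-yet-seen tail is copied out.
            int offset = useOffset ? nextDoc : 0;
            copied += extractTopDocs(collected, offset, limit).length;
            nextDoc = collected.length;
            limit *= 2; // the fetch size doubles on each query execution
        }
        return copied;
    }
}
```

With 8 hits and an initial limit of 2, copiedSlots(8, 2, false) copies 2 + 4 + 8 = 14 slots, while copiedSlots(8, 2, true) copies 2 + 2 + 4 = 8. Collection cost is the same in both cases; only the copy into the returned array shrinks.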

Environment

None

Assignee

Fabio Massimo Ercoli

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Components

Fix versions

Priority

Major