Search 6 groundwork - Restore support for scrolling

Description

Goal

Restore the scroll feature exposed in Search 5 through org.hibernate.search.query.hibernate.impl.FullTextQueryImpl#scroll().

API

All located in the org.hibernate.search.engine.search.query package.

To-do list

In order:

  1. Add APIs, with stub implementations (throw new UnsupportedOperationException( "Not yet implemented" );).

    1. Ignore getTotalHitCount/getAggregation/getTook/isTimeout for now.

  2. Copy-paste org.hibernate.search.integrationtest.backend.tck.search.query.SearchQueryFetchIT to SearchQueryScrollIT and adapt it to test scrolling.

    1. Don't forget to test edge cases: not fetching any result (should work fine), fetching some results but not all of them (should work fine), trying to fetch more than the total hit count (should throw an exception).

    2. Don't forget to check that hasMoreHits() returns the correct information.

  3. Add tests for timeouts (failAfter/truncateAfter) when scrolling.

  4. Implement scrolling for the stub backend.

  5. Add tests to the ORM mapper. Will probably need to copy/paste org.hibernate.search.integrationtest.mapper.orm.search.loading.SearchQueryEntityLoadingBaseIT and adapt it to test loading when calling scroll() instead of just loading when calling fetch().

  6. Implement scrolling for Elasticsearch.

    1. This should be easy enough: the first call to fetch*() will execute a search work with the scroll parameter set, and the next calls will execute a scroll work (already implemented, see org.hibernate.search.elasticsearch.work.impl.factory.ElasticsearchWorkFactory#scroll).

    2. On close, we will execute a clearScroll work (already implemented, see org.hibernate.search.elasticsearch.work.impl.factory.ElasticsearchWorkFactory#clearScroll).

  7. Implement scrolling for Lucene.

    1. Search 5 code will not be very useful in that regard, as it addresses a lot of problems that are no longer relevant in Search 6.

    2. In the SearchScroll implementation we will need to keep around some of the context that we currently store as local variables in LuceneSearcherImpl#search: the IndexSearcher and the LuceneCollectors instance in particular.

    3. When calling next():

      1. First we will need to update the topDocs if they do not already include the next page.

        1. See org.hibernate.search.query.engine.impl.QueryHits#scoreDoc for how to decide how many topDocs to retrieve

        2. See phase 1 in org.hibernate.search.backend.lucene.search.extraction.impl.LuceneCollectors#collect, but only phase 1

      2. Then we will need to collect information for the next page; see the call to extractTopDocs and phase 2 in org.hibernate.search.backend.lucene.search.extraction.impl.LuceneCollectors#collect.

    4. This may prove difficult; maybe we should organize a pair-programming session for it?

  8. Add Lucene-specific extensions to Scrolling

    1. This is mainly necessary for Infinispan

    2. Expose a way to force Lucene to extract TopDocs up to a specific index and retrieve them: LuceneSearchScroll#preloadTopDocsUpTo(), returns TopDocs

    3. Expose a way to load a specific document specified by its index: LuceneSearchScroll#loadHitByIndex(), returns H

    4. Maybe we can improve on that later; ideally Infinispan should load multiple hits in one call (LuceneSearchScroll#loadHitsByIndex(int ...), returns List<H>), otherwise the cost of creating collectors for each retrieved hit will be too high.

  9. Implement scroll() and scroll(ScrollMode) in HibernateOrmSearchQueryAdapter, relying on SearchQuery#scroll(int) under the hood.

    1. Only ScrollMode.FORWARD_ONLY will be supported.

    2. We will need to decide on a page size. Let's use the same size as the loading fetch size, which should be accessible from org.hibernate.search.mapper.orm.search.loading.impl.MutableEntityLoadingOptions#getFetchSize.

    3. Some internal windowing will probably be necessary. Just copy/paste the org.hibernate.search.elasticsearch.util.impl.Window class from Search 5 and adapt it. Do not forget to also copy the unit test, org.hibernate.search.elasticsearch.test.WindowTest.

    4. See org.hibernate.search.query.hibernate.impl.ScrollableResultsImpl for an example of how it was done in Search 5 (may or may not be helpful).

  10. Add tests for scroll() and scroll(ScrollMode) in org.hibernate.search.integrationtest.mapper.orm.hibernateormapis.ToHibernateOrmIT:

    1. Nominal case (create scroll, fetch some hits until all hits have been consumed, close).

    2. Edge cases: not fetching any result (should work fine), fetching some results but not all of them (should work fine), trying to fetch more than the total hit count (should throw an exception).

    3. Error cases: trying to scroll back, trying to call the get*(int) methods...

    4. Check that using any scroll mode other than ScrollMode.FORWARD_ONLY fails.

    5. Test query.stream() too (it's based on scroll()).

  11. Add tests for getResultStream() in org.hibernate.search.integrationtest.mapper.orm.hibernateormapis.ToJpaIT.

  12. Allow backends to extend the SearchScroll interfaces, like they currently do with SearchQuery (ElasticsearchSearchQuery, LuceneSearchQuery):

    1. Add a generic parameter S extends SearchScroll<H> to ExtendedSearchFetchable and override its scroll methods to return that type.

    2. Adapt the interfaces that extend ExtendedSearchFetchable as necessary.

    3. Create a new ExtendedSearchScroll<H> interface using the same principle.

    4. Create specific interfaces for Elasticsearch and Lucene: ElasticsearchSearchScroll and LuceneSearchScroll.

    5. Implement these interfaces where appropriate.

    6. Test extensions for Lucene and Elasticsearch. Mainly, check that the scroll has the correct type. See how it's done for SearchResult in org.hibernate.search.integrationtest.backend.elasticsearch.ElasticsearchExtensionIT#query.

  13. Add getTotalHitCount/getAggregation to APIs if relevant and implement them.

  14. Add getTook/isTimeout to APIs if relevant and implement them.
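
To make the expected scroll behavior in steps 2 and 4 more concrete, here is a minimal in-memory sketch, roughly what a stub-backend scroll could do. Everything in it is an assumption made for illustration: the SketchScroll interface and the InMemoryScroll class are hypothetical and are not the actual Search 6 API; only the method names next(), hasMoreHits() and close() are taken from this ticket. The exception type thrown when fetching past the last hit (NoSuchElementException) is also an assumption, since the ticket only says "should throw an exception".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

public class ScrollSketch {

    // Hypothetical stand-in for the scroll API; NOT the actual Hibernate Search interface.
    public interface SketchScroll<H> extends AutoCloseable {
        // Returns the next page of hits; forward-only.
        List<H> next();

        // Tells whether a further call to next() would return hits.
        boolean hasMoreHits();

        // Releases resources (for Elasticsearch this is where clearScroll would run).
        @Override
        void close();
    }

    // Hypothetical in-memory implementation, paging over a fixed list of hits.
    public static final class InMemoryScroll<H> implements SketchScroll<H> {
        private final List<H> allHits;
        private final int pageSize;
        private int offset = 0;
        private boolean closed = false;

        public InMemoryScroll(List<H> allHits, int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be strictly positive");
            }
            this.allHits = new ArrayList<>(allHits);
            this.pageSize = pageSize;
        }

        @Override
        public List<H> next() {
            if (closed) {
                throw new IllegalStateException("Scroll is closed");
            }
            if (!hasMoreHits()) {
                // Assumed behavior: fetching past the total hit count fails.
                throw new NoSuchElementException("No more hits");
            }
            int end = Math.min(offset + pageSize, allHits.size());
            List<H> page = new ArrayList<>(allHits.subList(offset, end));
            offset = end;
            return page;
        }

        @Override
        public boolean hasMoreHits() {
            return offset < allHits.size();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = Arrays.asList(1, 2, 3, 4, 5);
        // Nominal case: create the scroll, consume all pages, close.
        try (SketchScroll<Integer> scroll = new InMemoryScroll<>(hits, 2)) {
            while (scroll.hasMoreHits()) {
                System.out.println(scroll.next());
            }
        }
    }
}
```

The same sketch covers the edge cases listed above: closing without fetching anything works, stopping after a partial fetch works, and calling next() once all hits have been consumed throws.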

Environment

None

Activity

Yoann Rodière
January 27, 2020, 12:35 PM

Added more details to the description.

Assignee

Fabio Massimo Ercoli

Reporter

Fabio Massimo Ercoli

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Fix versions

Priority

Major