
Combine the results of a database query and a search query

Description

Use case: I need a search query where I apply predicates that can only be implemented in the index (e.g. a full-text query), but where I also apply predicates that can only be implemented in the database (e.g. relying on advanced joins or aggregations).

The usual solution would be to perform the query in the database first, list the resulting IDs, then perform the query in the index and add a filter by ID. But that cannot work if there are millions of results for either query.

There are two solutions to that problem:

  1. Filter the hits of each "page" of the search results using WHERE clauses in the SQL query used to load the results. This is easy, but will potentially lead to empty pages before the end of the results if all the hits of that page were filtered out, even if it wasn't the last page. Also, the total hit count (fetchTotalHitCount()) will be inconsistent.

  2. Fully combine the results of a search query and a database query, avoiding any gap in the "pages" of the search results. This would be ideal, but it's close to impossible to implement efficiently except in some edge cases. One interesting case allowing optimization is when we can run the query with the same sort on both sides.
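To make the drawback of option 1 concrete, here is a minimal, self-contained sketch. It does not use the Hibernate Search API: `searchPage` and `loadWithWhereClause` are hypothetical stand-ins for the index query and for the SQL load with an extra WHERE clause, and the IDs are made up.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class PostFilteredPaging {

    // Hypothetical stand-in for the index: returns one "page" of hit IDs, in index order.
    static List<Integer> searchPage(List<Integer> indexHits, int offset, int limit) {
        int from = Math.min(offset, indexHits.size());
        int to = Math.min(offset + limit, indexHits.size());
        return indexHits.subList(from, to);
    }

    // Hypothetical stand-in for "SELECT ... WHERE id IN (:pageIds) AND <database predicate>".
    static List<Integer> loadWithWhereClause(List<Integer> pageIds, Set<Integer> dbMatches) {
        return pageIds.stream().filter(dbMatches::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> indexHits = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9); // matches of the full-text predicate
        Set<Integer> dbMatches = Set.of(1, 2, 8);                     // matches of the database predicate
        int pageSize = 3;
        for (int offset = 0; offset < indexHits.size(); offset += pageSize) {
            List<Integer> page = loadWithWhereClause(searchPage(indexHits, offset, pageSize), dbMatches);
            System.out.println("offset " + offset + " -> " + page);
        }
        // The index knows nothing about the WHERE clause, so its count ignores the filtering.
        System.out.println("total hit count reported by the index: " + indexHits.size());
    }
}
```

With this data the second page comes back empty even though a later page still has a result, and the index reports 9 hits while only 3 entities satisfy both predicates.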

We should probably not try to address #2 for now: it will be rather complex to implement and test correctly, and very complex to optimize.
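For reference, the "same sort on both sides" case mentioned in option 2 could be approached with a merge join over two sorted ID streams. The sketch below is only an illustration under strong assumptions: both sides are assumed to be exposed as iterators of IDs sorted ascending by the same key, and `SortedIntersection.page` is a hypothetical helper, not an existing API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SortedIntersection {

    // Merge-joins two ID streams sorted ascending and returns the requested page of common IDs.
    static List<Long> page(Iterator<Long> searchIds, Iterator<Long> dbIds, int offset, int limit) {
        List<Long> result = new ArrayList<>();
        if (limit <= 0 || !searchIds.hasNext() || !dbIds.hasNext()) {
            return result;
        }
        long s = searchIds.next();
        long d = dbIds.next();
        int skipped = 0;
        while (true) {
            if (s == d) {
                // ID present on both sides: either skip it (earlier page) or collect it.
                if (skipped < offset) {
                    skipped++;
                } else {
                    result.add(s);
                    if (result.size() == limit) {
                        return result;
                    }
                }
                if (!searchIds.hasNext() || !dbIds.hasNext()) {
                    return result;
                }
                s = searchIds.next();
                d = dbIds.next();
            } else if (s < d) {
                // Advance whichever side is behind.
                if (!searchIds.hasNext()) {
                    return result;
                }
                s = searchIds.next();
            } else {
                if (!dbIds.hasNext()) {
                    return result;
                }
                d = dbIds.next();
            }
        }
    }

    public static void main(String[] args) {
        List<Long> search = List.of(1L, 2L, 3L, 5L, 8L, 9L);
        List<Long> db = List.of(2L, 3L, 4L, 8L, 10L);
        System.out.println(page(search.iterator(), db.iterator(), 0, 2));
        System.out.println(page(search.iterator(), db.iterator(), 2, 2));
    }
}
```

Pages produced this way have no gaps, but reaching a deep page still means scanning both streams from the start, which hints at why optimizing this in general is hard.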

#1 was implemented in Search 5 through FullTextQuery#setCriteriaQuery(), which allowed setting database restrictions on a given search query, but its functionality was incomplete and [its use was explicitly discouraged in the documentation|https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#_fetching_strategy]. It wasn't ported to Search 6.

Note that addresses similar use cases, but for mass indexing.

API-wise, have a look at HSEARCH-3628.

Some caveats:

  • (same as ) Should the configuration apply exclusively to the referenced type, or to that type and every subtype?

  • Be careful of interactions with the cache lookup strategy introduced in HSEARCH-3349. If we implement database filtering in the loaders, the cache lookups must not be performed, because they could end up bypassing the WHERE clauses.
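The cache caveat can be illustrated with a small simulation. Everything here is hypothetical (the `Book` class, the `load` method, the maps standing in for the cache and the database). It only shows how a cache hit can skip the WHERE clause, not how the actual loaders work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class CacheBypassDemo {

    // Hypothetical entity, for illustration only.
    static final class Book {
        final long id;
        final boolean archived;

        Book(long id, boolean archived) {
            this.id = id;
            this.archived = archived;
        }
    }

    // Hypothetical loader: optionally looks in the cache first, then falls back to a filtered "database" load.
    static List<Book> load(List<Long> hitIds, Map<Long, Book> cache, Map<Long, Book> database,
            Predicate<Book> whereClause, boolean cacheLookupEnabled) {
        List<Book> loaded = new ArrayList<>();
        for (Long id : hitIds) {
            Book cached = cacheLookupEnabled ? cache.get(id) : null;
            if (cached != null) {
                loaded.add(cached); // the WHERE clause is never evaluated for cached entities
                continue;
            }
            Book fromDb = database.get(id);
            if (fromDb != null && whereClause.test(fromDb)) {
                loaded.add(fromDb);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<Long, Book> database = new HashMap<>();
        database.put(1L, new Book(1L, false));
        database.put(2L, new Book(2L, true));
        Map<Long, Book> cache = new HashMap<>();
        cache.put(2L, database.get(2L)); // the archived book happens to be cached
        Predicate<Book> notArchived = book -> !book.archived;

        System.out.println(load(List.of(1L, 2L), cache, database, notArchived, true).size());
        System.out.println(load(List.of(1L, 2L), cache, database, notArchived, false).size());
    }
}
```

With the cache lookup enabled, the archived book is returned despite the filter; with it disabled, only the entity that satisfies the WHERE clause is loaded.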

Environment: None

Assignee: Unassigned

Reporter: Yoann Rodière

Labels: None

Suitable for new contributors: None

Pull Request: None

Feedback Requested: None

Priority: Major