Details
Assignee: Unassigned
Reporter: Yoann Rodière
Priority: Major
Created July 1, 2019 at 9:44 AM
Updated September 25, 2023 at 3:25 PM
Use case: I need a search query where I apply predicates that can only be implemented in the index (e.g. a full-text query) but where I also apply predicates that can only be implemented in the database (e.g. relying on advanced joins or aggregations).
The usual solution would be to perform the query in the database first, list the resulting IDs, then perform the query in the index and add a filter by ID. But that cannot work if there are millions of results for either query.
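To make the scaling problem concrete, here is a minimal plain-Java sketch of that "usual solution". It does not use any Hibernate Search or JDBC API: `IdFilterSketch` and `searchFilteredByIds` are hypothetical names, the list stands in for the full-text hits and the set stands in for the IDs returned by the database query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Plain-Java simulation; names are hypothetical, not Hibernate Search API.
public class IdFilterSketch {

    // Stand-in for running the index query with an extra "ID in (...)" filter.
    // databaseIds must hold every ID matched by the database query.
    static List<Long> searchFilteredByIds(List<Long> fullTextHits, Set<Long> databaseIds) {
        List<Long> result = new ArrayList<>();
        for (Long id : fullTextHits) {
            if (databaseIds.contains(id)) {
                result.add(id); // keeps the order of the index hits
            }
        }
        return result;
    }
}
```

The whole set of database IDs has to be materialized and sent along with the index query, which is what breaks down when that set contains millions of entries.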
There are two solutions to that problem:
1. Filter the hits of each "page" of the search results using WHERE clauses in the SQL query used to load the results. This is easy, but will potentially lead to empty pages before the end of the results if all the hits of that page were filtered out, even if it wasn't the last page. Also, the total hit count (fetchTotalHitCount()) will be inconsistent.
2. Fully combine the results of a search query and a database query, avoiding any gap in the "pages" of the search results. This would be ideal, but it's close to impossible to implement efficiently except in some edge cases. One interesting case allowing optimization is when we can run the query with the same sort on both sides.
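The first approach and its drawbacks can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions: `PageFilterSketch`, `fetchPage` and `loadWithRestriction` are hypothetical names, the list of IDs stands in for the index hits, and the predicate stands in for the extra WHERE clause applied when loading a page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Plain-Java simulation; no Hibernate Search or JDBC calls are involved.
public class PageFilterSketch {

    // Stand-in for the index: returns the IDs of one page of hits, in index order.
    static List<Long> fetchPage(List<Long> allHits, int offset, int limit) {
        int from = Math.min(offset, allHits.size());
        int to = Math.min(from + limit, allHits.size());
        return new ArrayList<>(allHits.subList(from, to));
    }

    // Stand-in for loading with "WHERE id IN (:pageIds) AND <restriction>":
    // only the hits that also satisfy the database-side restriction come back.
    static List<Long> loadWithRestriction(List<Long> pageIds, LongPredicate restriction) {
        List<Long> loaded = new ArrayList<>();
        for (long id : pageIds) {
            if (restriction.test(id)) {
                loaded.add(id);
            }
        }
        return loaded;
    }
}
```

With six index hits (IDs 1 to 6), a page size of 2 and a restriction that only accepts IDs above 4, the first two pages load nothing while the third page loads both of its hits. The index would still report six hits in total even though only two entities can ever be loaded.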
We should probably not try to address #2 for now: it will be rather complex to implement and test correctly, and very complex to optimize.
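For context, the "same sort on both sides" case mentioned for #2 essentially amounts to intersecting two sorted streams. The following plain-Java sketch shows the idea under strong assumptions: both result lists are already sorted ascending on the same key (here the ID itself), contain no duplicates, and fit in memory. `SortedIntersectionSketch` and `intersect` are hypothetical names, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of a sorted intersection; names are hypothetical.
public class SortedIntersectionSketch {

    // Both inputs must be sorted ascending on the same key, without duplicates.
    // Returns the page [offset, offset + limit) of the combined results.
    static List<Long> intersect(List<Long> indexHits, List<Long> databaseHits, int offset, int limit) {
        List<Long> page = new ArrayList<>();
        int i = 0;
        int j = 0;
        int matched = 0; // number of common hits seen so far
        while (i < indexHits.size() && j < databaseHits.size() && page.size() < limit) {
            int cmp = Long.compare(indexHits.get(i), databaseHits.get(j));
            if (cmp < 0) {
                i++; // hit only present on the index side
            } else if (cmp > 0) {
                j++; // hit only present on the database side
            } else {
                if (matched >= offset) {
                    page.add(indexHits.get(i));
                }
                matched++;
                i++;
                j++;
            }
        }
        return page;
    }
}
```

Even in this favorable case, reaching a deep page means scanning both sides from the start, which gives an idea of why the general problem is hard to optimize.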
#1 was implemented in Search 5 through FullTextQuery#setCriteriaQuery(), which allowed setting database restrictions on a given search query, but its functionality was incomplete and its use was explicitly discouraged in the documentation. It wasn't ported to Search 6.
Note that addresses similar use cases, but for mass indexing.
API-wise, have a look at .
Some caveats:
(same as ) Should the configuration apply exclusively to the referenced type, or to that type and every subtype?
Be careful of interactions with the cache lookup strategy introduced in HSEARCH-3349. If we implement database filtering in the loaders, the cache lookups must not be performed, because they could end up bypassing the WHERE clauses.
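The cache caveat can be illustrated with a plain-Java simulation. This is a sketch, not the actual loader code: `CacheBypassSketch` and `load` are hypothetical names, the map stands in for a cache of already-loaded entities, and the predicate stands in for the WHERE clause applied by the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Plain-Java simulation; names are hypothetical, not Hibernate Search API.
public class CacheBypassSketch {

    // Loads a page of hits, optionally looking entities up in a cache first.
    static List<Long> load(List<Long> pageIds, Map<Long, String> cache,
            LongPredicate restriction, boolean cacheLookupEnabled) {
        List<Long> loaded = new ArrayList<>();
        for (long id : pageIds) {
            if (cacheLookupEnabled && cache.containsKey(id)) {
                // Served from the cache: the restriction is never evaluated.
                loaded.add(id);
            } else if (restriction.test(id)) {
                // Served by the simulated database query, WHERE clause applied.
                loaded.add(id);
            }
        }
        return loaded;
    }
}
```

If entity 1 is cached and the restriction rejects it, loading the page [1, 2] with cache lookups enabled returns entity 1 anyway, whereas disabling the lookups returns only the entities that pass the restriction.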