Allow to set the minimum number of should clauses required to match for boolean predicates in the DSL

Description

See org.apache.lucene.search.BooleanQuery.Builder#setMinimumNumberShouldMatch and https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html

We might want to allow either an absolute number of clauses (what the Lucene APIs offer) or a percentage (what both Solr and Elasticsearch offer, as an alternative).
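
As a rough sketch of how the two forms relate, the helper below converts a percentage into an absolute clause count, following the rounding rules described in the Elasticsearch documentation linked above: a positive percentage is rounded down, and a negative percentage is the share of should clauses allowed to be missing, rounded down before being subtracted from the total. The class and method names are hypothetical; this is not the API of Lucene, Solr, Elasticsearch or Hibernate Search.

```java
/** Hypothetical helper, for illustration only. */
final class MinimumShouldMatch {

    private MinimumShouldMatch() {
    }

    /**
     * Converts a percentage into an absolute number of required should clauses.
     *
     * @param shouldClauseCount total number of should clauses in the junction
     * @param percent value between -100 and 100; negative means "this share may be missing"
     */
    static int fromPercent(int shouldClauseCount, int percent) {
        if (shouldClauseCount < 0 || percent < -100 || percent > 100) {
            throw new IllegalArgumentException("Invalid clause count or percentage");
        }
        if (percent >= 0) {
            // Integer division rounds down for non-negative operands.
            return shouldClauseCount * percent / 100;
        }
        // Negative percentage: round the number of allowed misses down, then subtract.
        int allowedMissing = shouldClauseCount * -percent / 100;
        return shouldClauseCount - allowedMissing;
    }

    public static void main(String[] args) {
        System.out.println(fromPercent(4, 75));  // 3
        System.out.println(fromPercent(5, 75));  // 3
        System.out.println(fromPercent(5, -25)); // 4
    }
}
```

With four should clauses, 75% and -25% give the same result, but with five clauses they differ, which is why a percentage option would need to define its rounding explicitly.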

We should probably backport this to 5.10, since the lack of such a feature must be really annoying when using the Elasticsearch integration: you basically have to write the whole boolean query and its subqueries as JSON if you want to set a minimum number of should clauses required to match. This means in particular no field bridge, and having to serialize filter parameters to JSON yourself.

Environment

None

Activity

Goran Jaric
March 22, 2019, 9:16 AM


And the second issue is: why did the Lucene search in version <hibernate-search.version>5.5.4.Final</hibernate-search.version> return results for the scenario above in the first place? That does not comply with https://www.elastic.co/guide/en/elasticsearch/reference/5.6/query-dsl-bool-query.html

Yoann Rodière
March 22, 2019, 1:08 PM

Hello Goran,

"When can we expect this to be included in the Lucene => Elasticsearch query conversion?"

It already is included, and has been for 9 months, since version 5.10.2 was released: http://in.relation.to/2018/06/22/hibernate-search-5-10-2-Final/

If you use 5.9, then this feature is not included. Besides upgrading to 5.10 or 5.11, your only solution would be to rely on ElasticsearchQueries.fromJson, but that means writing the whole query directly as JSON.

"And the second issue is: why did the Lucene search in version <hibernate-search.version>5.5.4.Final</hibernate-search.version> return results for the scenario above in the first place? That does not comply with https://www.elastic.co/guide/en/elasticsearch/reference/5.6/query-dsl-bool-query.html"

I do not have enough information to answer that. My best guess is that your query is not what you think it is, or your should clause does match a document, or (unlikely) there is a bug in Lucene.

If you think there is a problem in Hibernate Search, you can try to create a reproducer based on our test case template.

Goran Jaric
March 22, 2019, 3:23 PM


Here is an example:

This test is failing.

Yoann Rodière
March 25, 2019, 7:32 AM

It works as intended. As explained in the Lucene and Elasticsearch documentation, should clauses are completely optional if there is also a must clause in the same boolean junction.

If you get a different behavior in Elasticsearch, I would suggest looking into your analyzer definitions, or checking that you reindexed since you last changed your mapping. It's likely that the query doesn't match for a reason completely unrelated to minimum_should_match.

By the way, you're using BooleanQuery.Builder needlessly; you can just use the query builder (qb.bool()).

If you want to make a single should clause mandatory, you obviously could use must instead. But I guess this was just a simplified example.

If you have multiple should clauses and you want to make at least one of them mandatory, just separate the should clauses from the must clauses:
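
The idea can be sketched without any search library: put the should clauses in their own junction, where at least one of them has to match, and add that junction as a mandatory clause next to the other must clauses. The snippet below models documents as sets of terms and clauses as predicates. All names and terms are hypothetical, and it only illustrates the matching logic, not the Hibernate Search or Lucene API.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical model: a document is a set of terms, a clause is a predicate. */
final class SeparateShouldFromMust {

    private SeparateShouldFromMust() {
    }

    /** A clause that matches documents containing the given term. */
    static Predicate<Set<String>> term(String term) {
        return doc -> doc.contains(term);
    }

    /** A junction made only of should clauses: at least one has to match. */
    static Predicate<Set<String>> anyOf(List<Predicate<Set<String>>> shouldClauses) {
        return doc -> shouldClauses.stream().anyMatch(clause -> clause.test(doc));
    }

    /** A junction made only of must clauses: all of them have to match. */
    static Predicate<Set<String>> allOf(List<Predicate<Set<String>>> mustClauses) {
        return doc -> mustClauses.stream().allMatch(clause -> clause.test(doc));
    }

    /** The should-only junction is nested as one of the must clauses. */
    static Predicate<Set<String>> query() {
        return allOf(List.of(
                term("published"),
                anyOf(List.of(term("lucene"), term("elasticsearch")))));
    }

    public static void main(String[] args) {
        System.out.println(query().test(Set.of("published", "lucene")));
        System.out.println(query().test(Set.of("published")));
    }
}
```

Because the inner junction contains no must clause, its should clauses are no longer optional: a document that only satisfies the outer must clause is rejected.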

Or, if you want more fine-grained control, or do not want to change your queries that much, just upgrade to Hibernate Search 5.10.2 or later and use minimum_should_match, either through BooleanQuery.Builder.setMinimumNumberShouldMatch(1) or through qb.bool().minimumShouldMatchNumber(1).
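
To illustrate what such a setting changes, here is a minimal model of a boolean junction with must clauses, should clauses and a minimum number of should clauses required to match. It is a simplified sketch under stated assumptions: documents are sets of terms, must_not and filter clauses and scoring are ignored, and the class is hypothetical rather than part of Lucene or Hibernate Search.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical, simplified model of a boolean junction. */
final class BoolJunctionModel {

    private final List<Predicate<Set<String>>> must;
    private final List<Predicate<Set<String>>> should;
    private final int minimumShouldMatch;

    BoolJunctionModel(List<Predicate<Set<String>>> must,
            List<Predicate<Set<String>>> should,
            int minimumShouldMatch) {
        if (must.isEmpty() && should.isEmpty()) {
            // Empty junctions are deliberately left out of this model.
            throw new IllegalArgumentException("At least one clause is required");
        }
        this.must = List.copyOf(must);
        this.should = List.copyOf(should);
        this.minimumShouldMatch = minimumShouldMatch;
    }

    /** A clause that matches documents containing the given term. */
    static Predicate<Set<String>> term(String term) {
        return doc -> doc.contains(term);
    }

    boolean matches(Set<String> doc) {
        if (!must.stream().allMatch(clause -> clause.test(doc))) {
            return false;
        }
        long matchedShould = should.stream().filter(clause -> clause.test(doc)).count();
        int required = minimumShouldMatch;
        if (required == 0 && must.isEmpty()) {
            // Without any must clause, at least one should clause has to match.
            required = 1;
        }
        // With a must clause and a minimum of 0, should clauses are optional.
        return matchedShould >= required;
    }

    public static void main(String[] args) {
        List<Predicate<Set<String>>> must = List.of(term("published"));
        List<Predicate<Set<String>>> should = List.of(term("lucene"));
        Set<String> doc = Set.of("published");
        System.out.println(new BoolJunctionModel(must, should, 0).matches(doc));
        System.out.println(new BoolJunctionModel(must, should, 1).matches(doc));
    }
}
```

In this model, a document that satisfies only the must clause matches when the minimum is 0 and stops matching once the minimum is raised to 1.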

Goran Jaric
March 25, 2019, 10:52 AM

Hi, I opened a separate issue so as not to clutter this one: the same query, for the same data set and mapping, returns different results when the index manager is elasticsearch compared to directory-based.
https://hibernate.atlassian.net/browse/HSEARCH-3534

Assignee

Yoann Rodière

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Priority

Major