Let's say we have the following entity
A query constructed in the following way
for given data set:
returns different results (0 documents) if
and different result (1 document) if
If you apply HSEARCH-3534_lucene.patch to https://github.com/hibernate/hibernate-test-case-templates/tree/master/search/hibernate-search-lucene, the test noneOfShouldMatchedWithinBooleanQueryInsideFilter_differentResults_directoryBasedVSElasticsearch will pass, but when you apply HSEARCH-3534_elasticsearch.patch to https://github.com/hibernate/hibernate-test-case-templates/tree/master/search/hibernate-search-elasticsearch/hibernate-search-elasticsearch-5, the same test will fail.
Maybe using should and must as siblings in a filter does not make much sense: in the directory-based backend the should clause is effectively ignored, because a document already has to fulfill the must criteria, and should makes more sense in queries, where it influences scoring. However, I ran into this issue while migrating a huge project where such cases appear from time to time, due to dynamic query creation.
Thanks for the report and the test cases, now I see what you meant.
I pushed your patches to a fork of the repo for future reference: https://github.com/yrodiere/hibernate-test-case-templates/tree/HSEARCH-3534/
Now, the problem. If I understand correctly, the Elasticsearch team decided it would be a good idea for the boolean junctions to behave differently when they are nested under a filter/must_not clause than when they are not:
If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match. In this case these clauses are only used to influence the score.
If the bool query is a filter context or has neither must or filter then at least one of the should queries must match a document for it to match the bool query. This behavior may be explicitly controlled by settings the minimum_should_match parameter.
This effectively means that minimum_should_match defaults to 0 in the first case, and to 1 in the second case.
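To make the two cases concrete, here is a minimal sketch in Java of the defaulting rule as the quoted documentation describes it. It is only a model of that description, not Elasticsearch code; the class and method names are hypothetical.

```java
public class BoolDefaults {

    /**
     * Models the effective default of minimum_should_match for a bool query
     * that has at least one should clause, following the quoted documentation.
     * Hypothetical helper for illustration only.
     *
     * @param inFilterContext true if the bool query is nested in a filter context
     * @param hasMustOrFilter true if the bool query has a must or filter clause
     */
    public static int defaultMinimumShouldMatch(boolean inFilterContext, boolean hasMustOrFilter) {
        if (!inFilterContext && hasMustOrFilter) {
            // Query context with must/filter: should clauses only influence the score.
            return 0;
        }
        // Filter context, or no must/filter at all: at least one should clause must match.
        return 1;
    }

    public static void main(String[] args) {
        // Query context, should next to must: should is optional.
        System.out.println(defaultMinimumShouldMatch(false, true));  // prints 0
        // Filter context, should next to must: should becomes mandatory.
        System.out.println(defaultMinimumShouldMatch(true, true));   // prints 1
        // Query context, only should clauses: at least one must match.
        System.out.println(defaultMinimumShouldMatch(false, false)); // prints 1
    }
}
```

The second call is the case from this issue: the same junction that matches on Lucene stops matching on Elasticsearch once it sits under a filter.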
The thing is, this is completely arbitrary and not something we have in Lucene at all.
I can see three solutions:
1. We change the Lucene backend to implement the same behavior. That might be a bit difficult to achieve, in particular when the user doesn't rely on our DSL. But more importantly, that will be surprising to people already familiar with Lucene.
2. We change the Elasticsearch backend to work around these defaults and force Lucene's defaults instead. This will be surprising to people already familiar with Elasticsearch.
3. We don't change anything, and simply document this oddity.
Option 1 seems dodgy, but option 2 seems more reasonable. And a few tests show that it's possible. Let's try to do it, at least in 6.
Now you know exactly where I am.
Yes, you described it very accurately. Thanks, and thanks for the fast replies so far!
In the meantime I had already started investigating option 2 that you suggested, since it makes the most sense to me. Hopefully most people familiar with Elasticsearch would write the query above by nesting the should clauses under a separate bool query that is a sibling of must in the filter, rather than making the should clauses themselves siblings of must. For example:
It could be that this is an isolated edge case, since it is only reproducible with the example I provided above (should in the same junction as must under a filter clause), which means it would be better not to deal with it too radically.
Thanks for the issue. Yeah, sometimes the backends behave differently.
We're going to force the Elasticsearch backend to use Lucene's defaults, i.e. solution #2 mentioned above.
In particular, we're going to force the default minimum should match to 0 if a should clause has a must clause as a sibling and is inside a filter predicate.
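As a rough illustration of that rule, here is a sketch that operates on a bool query body represented as a plain Java map. This is not the Hibernate Search implementation: the class name, the method name, and the field names `status` and `tag` are hypothetical, and the caller is assumed to know whether the query sits in a filter context.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MinimumShouldMatchWorkaround {

    /**
     * Returns a copy of the given bool query body with "minimum_should_match": 0
     * added when the query is in a filter context, has both "should" and "must"
     * clauses, and does not set the parameter explicitly.
     * Hypothetical helper, not part of any library.
     */
    public static Map<String, Object> forceLuceneDefault(Map<String, Object> boolBody,
                                                         boolean inFilterContext) {
        Map<String, Object> result = new LinkedHashMap<>(boolBody);
        boolean hasShould = boolBody.containsKey("should");
        boolean hasMust = boolBody.containsKey("must");
        boolean explicit = boolBody.containsKey("minimum_should_match");
        if (inFilterContext && hasShould && hasMust && !explicit) {
            // Restore Lucene's behavior: should clauses stay optional next to must.
            result.put("minimum_should_match", 0);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("must", List.of(Map.of("term", Map.of("status", "active"))));
        body.put("should", List.of(Map.of("term", Map.of("tag", "a"))));

        // Inside a filter: the parameter is forced to 0.
        System.out.println(forceLuceneDefault(body, true).get("minimum_should_match")); // prints 0
        // Outside a filter: the body is left untouched.
        System.out.println(forceLuceneDefault(body, false).containsKey("minimum_should_match")); // prints false
    }
}
```

An explicitly set minimum_should_match is left alone in this sketch, so a user who relies on the Elasticsearch behavior can still ask for it.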
The fix will be applied in major version 6.
Great! This was exactly my temporary fix, implemented outside of the hibernate-search-elasticsearch library.