Add syntactic sugar to the QueryDSL for simple multiple-term keyword matches

Description

The idea is to be able to write something like this:

Or:

This would translate to something like (title:Test1 title:Test2 title:Test3).

Currently, users have to write a boolean junction by hand, which is a bit verbose.

We could also have a matchingAll method, which would create an AND instead of an OR: (+title:Test1 +title:Test2). It would be useful mainly for multi-valued fields or analyzed text fields.
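As a rough illustration of the two translations described above, here is a minimal, self-contained sketch that only builds the Lucene-style query strings. The class name and the `matchingAny` helper are hypothetical names chosen for this example; only `matchingAll` is named in this ticket, and none of this is the actual Hibernate Search API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: shows the query strings the proposed sugar would stand for.
public class KeywordSugarSketch {

    // Hypothetical helper: OR semantics, each term becomes an optional clause.
    static String matchingAny(String field, List<String> terms) {
        return terms.stream()
                .map(term -> field + ":" + term)
                .collect(Collectors.joining(" ", "(", ")"));
    }

    // AND semantics: each term becomes a required clause, marked with '+'.
    static String matchingAll(String field, List<String> terms) {
        return terms.stream()
                .map(term -> "+" + field + ":" + term)
                .collect(Collectors.joining(" ", "(", ")"));
    }

    public static void main(String[] args) {
        // prints (title:Test1 title:Test2 title:Test3)
        System.out.println(matchingAny("title", List.of("Test1", "Test2", "Test3")));
        // prints (+title:Test1 +title:Test2)
        System.out.println(matchingAll("title", List.of("Test1", "Test2")));
    }
}
```

The sketch ignores analysis and escaping of the terms; a real implementation would build query objects rather than strings.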

This could be exposed as a new terms predicate, similar to Elasticsearch's.

  • Upside: more future-proof, as new options for the match predicate may be hard to implement if there are multiple values to match ( in particular).

  • Downside: input to a terms predicate is neither analyzed nor normalized, and a terms predicate always has a constant score. Then again, if there are lots of terms, chances are the target field is not analyzed or normalized anyway. Users could fall back to a boolean predicate composed of match predicates if they really need analysis or normalization.

Implementation-wise, we could use:

  • a boolean predicate; this wouldn't work very well if there are lots of terms, since there is a limit on how many clauses you can add to a boolean predicate.

  • org.apache.lucene.search.TermInSetQuery for text fields. It's rewritten to a boolean query when there aren't many terms.

  • org.apache.lucene.document.IntPoint#newSetQuery(java.lang.String, java.util.Collection<java.lang.Integer>) or equivalent for numeric fields.

  • On Elasticsearch, the equivalent: the terms query.
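One possible way to combine the options above, sketched as plain Java with no Lucene dependency. Everything here is an assumption made for illustration: the enum names, the `choose` method, and the clause limit of 1024 (believed to be Lucene's default maximum clause count, but configurable). It is not a description of how the feature was actually implemented.

```java
// Illustrative decision sketch: which backend query type could serve a multi-term match.
public class TermsStrategySketch {

    enum FieldKind { TEXT, INTEGER }

    enum Strategy { BOOLEAN_CLAUSES, TERM_IN_SET_QUERY, INT_POINT_SET_QUERY }

    // Assumed limit on boolean clauses; the real value depends on configuration.
    static final int ASSUMED_MAX_CLAUSES = 1024;

    static Strategy choose(FieldKind kind, int termCount, boolean needsAnalysis) {
        if (needsAnalysis) {
            // Analysis/normalization requires match-like clauses, so the clause limit applies.
            if (termCount > ASSUMED_MAX_CLAUSES) {
                throw new IllegalArgumentException(
                        "Too many terms for a boolean predicate: " + termCount);
            }
            return Strategy.BOOLEAN_CLAUSES;
        }
        // No analysis needed: a set query avoids the clause limit.
        return kind == FieldKind.INTEGER
                ? Strategy.INT_POINT_SET_QUERY
                : Strategy.TERM_IN_SET_QUERY;
    }

    public static void main(String[] args) {
        System.out.println(choose(FieldKind.TEXT, 5000, false));
        System.out.println(choose(FieldKind.INTEGER, 3, false));
        System.out.println(choose(FieldKind.TEXT, 3, true));
    }
}
```

The point of the sketch is the trade-off from the discussion above: set queries scale to many terms but skip analysis, while boolean clauses keep analysis but are bounded by the clause limit.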

Environment

None
Fixed

Assignee

Fabio Massimo Ercoli

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Priority

Major