Support DISTINCT select on single field projection
Description
Activity
Yoann Rodière February 1, 2022 at 1:03 PM
I think this is now possible in recent versions of Elasticsearch, and it’s called “collapsing”:
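As a rough illustration of what "collapsing" means for this issue, here is a plain-Java sketch. It is not the Elasticsearch API: the `Hit` record and the `collapse` method are hypothetical. It keeps only the best-scoring hit for each distinct value of one field, then applies `from`/`size` to the collapsed list, so pagination counts distinct values rather than raw hits. This is my understanding of how field collapsing interacts with pagination; the Elasticsearch documentation is the authority on the exact semantics.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollapseSketch {

    // Hypothetical search hit: document ID, relevance score, and the value of the collapse field.
    record Hit(String docId, double score, String key) {}

    // Keeps only the best-scoring hit for each distinct key, then applies from/size
    // to the collapsed list, so a page counts distinct keys rather than raw hits.
    static List<Hit> collapse(List<Hit> hits, int from, int size) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.score(), a.score())); // highest score first
        Set<String> seenKeys = new HashSet<>();
        List<Hit> collapsed = new ArrayList<>();
        for (Hit hit : sorted) {
            if (seenKeys.add(hit.key())) { // add() returns false for keys already seen
                collapsed.add(hit);
            }
        }
        int start = Math.min(from, collapsed.size());
        int end = Math.min(start + size, collapsed.size());
        return new ArrayList<>(collapsed.subList(start, end));
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("1", 3.0, "author-a"),
                new Hit("2", 2.5, "author-b"),
                new Hit("3", 2.0, "author-a"), // dropped: author-a already has a better hit
                new Hit("4", 1.0, "author-c"));
        // Skip the first collapsed hit, then take up to 2.
        collapse(hits, 1, 2).forEach(hit -> System.out.println(hit.docId())); // prints 2, then 4
    }
}
```
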
Yoann Rodière May 25, 2020 at 4:40 PM
On a related note, you can use Hibernate Search 6 aggregations to retrieve a list of the top N distinct terms of a given field across all matching documents. See here for an example.
That being said, you wouldn't get pagination.
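To make the pagination limitation concrete, here is a plain-Java sketch of what a terms aggregation computes, assuming each matching document contributes one value for the field. The names `topTerms` and `maxTermCount` are illustrative (Hibernate Search 6's terms aggregation has a similar "max term count" option); this is not the Hibernate Search implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TermsAggregationSketch {

    // Counts matching documents per distinct field value, then keeps the maxTermCount values
    // with the highest counts (ties broken alphabetically). There is no offset parameter:
    // like a terms aggregation, this only ever returns the top N terms.
    static Map<String, Long> topTerms(List<String> fieldValues, int maxTermCount) {
        Map<String, Long> counts = fieldValues.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(maxTermCount)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new)); // LinkedHashMap keeps the sorted order
    }

    public static void main(String[] args) {
        // One field value per matching document (hypothetical "genre" field).
        List<String> genres = List.of("fantasy", "scifi", "fantasy", "crime", "scifi", "fantasy");
        System.out.println(topTerms(genres, 2)); // prints {fantasy=3, scifi=2}
    }
}
```

The method takes N but no offset: getting a "second page" of terms means asking for more terms and discarding the first ones, which is why this approach doesn't give pagination.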
Sanne Grinovero June 7, 2018 at 10:12 AM
It's not hard to do in Lucene, assuming the limitations I mentioned in the description: it has to be a projection on the terms only, and even then on a single field.
We couldn't use this information as a filtering clause for other components, for example to select entities.
I don't know about Elasticsearch, though; I'm happy to postpone this.
Yoann Rodière June 7, 2018 at 6:36 AM
After looking into it a bit more, I don't think it's possible in Elasticsearch, so I guess we will have a hard time doing it in Lucene too. See https://stackoverflow.com/questions/27776582/aggregation-sorting-pagination-in-elastic-search
Maybe the best solution would be to avoid the problem altogether: advise users who query index A and need a DISTINCT on the ID of some related entity B to index entity B instead, and to rely on nested fields for advanced predicates on entity A.
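A minimal in-memory sketch of that workaround, assuming a model where each B document embeds its related A entities as nested objects. The records `A` and `B` and the method `matchingBIds` are hypothetical, not Hibernate Search API. Because the results are B documents, each B appears at most once, so offset/limit apply directly without any DISTINCT.

```java
import java.util.List;
import java.util.function.Predicate;

class IndexEntityBSketch {

    // Hypothetical model: entity A, and entity B whose index document embeds its related A's as nested objects.
    record A(String title, int year) {}
    record B(String id, List<A> relatedAs) {}

    // Queries the B index directly: a B matches if at least one of its nested A's satisfies the predicate.
    // Each B document appears at most once, so offset/limit page through distinct B's.
    static List<String> matchingBIds(List<B> bIndex, Predicate<A> predicateOnA, int offset, int limit) {
        return bIndex.stream()
                .filter(b -> b.relatedAs().stream().anyMatch(predicateOnA))
                .map(B::id)
                .skip(offset)
                .limit(limit)
                .toList();
    }

    public static void main(String[] args) {
        List<B> bIndex = List.of(
                new B("b1", List.of(new A("Dune", 1965), new A("Hyperion", 1989))),
                new B("b2", List.of(new A("Emma", 1815))),
                new B("b3", List.of(new A("Dracula", 1897), new A("Ubik", 1969))));
        // Both conditions must hold for the same A (nested semantics).
        Predicate<A> onA = a -> a.year() > 1950 && a.title().startsWith("D");
        System.out.println(matchingBIds(bIndex, onA, 0, 10)); // prints [b1]
    }
}
```

Evaluating the whole predicate against one A at a time mirrors what nested fields guarantee: with flattened fields, b3 would also match, since one of its A's has a title starting with "D" and another was published after 1950.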
Yoann Rodière June 7, 2018 at 6:25 AM
Adding as a prerequisite. To support DISTINCT properly together with limits and offsets (pagination), we need aggregation features in the backend.
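A plain-Java sketch of one naive way to emulate DISTINCT with an offset on top of an aggregation that only supports "top N": request offset + limit buckets, then discard the first offset ones. `firstDistinctValues` and `distinctPage` are hypothetical stand-ins, not a backend API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class PaginatedDistinctSketch {

    // Stand-in for a backend aggregation that can return "the first n distinct values"
    // (here in ascending order) but has no offset parameter of its own.
    static List<String> firstDistinctValues(List<String> values, int n) {
        List<String> result = new ArrayList<>();
        for (String value : new TreeSet<>(values)) { // TreeSet deduplicates and sorts
            if (result.size() == n) {
                break;
            }
            result.add(value);
        }
        return result;
    }

    // Emulates DISTINCT + OFFSET/LIMIT: request offset + limit buckets, then drop the first offset.
    // The backend still builds every bucket before the requested page, so deep pages cost more.
    static List<String> distinctPage(List<String> values, int offset, int limit) {
        List<String> buckets = firstDistinctValues(values, offset + limit);
        if (offset >= buckets.size()) {
            return List.of();
        }
        return List.copyOf(buckets.subList(offset, buckets.size()));
    }

    public static void main(String[] args) {
        List<String> values = List.of("b", "a", "c", "a", "d", "b", "e");
        System.out.println(distinctPage(values, 0, 2)); // prints [a, b]
        System.out.println(distinctPage(values, 2, 2)); // prints [c, d]
        System.out.println(distinctPage(values, 4, 2)); // prints [e]
    }
}
```

The cost of a page grows with its offset, since every earlier bucket has to be computed first and then thrown away.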
This is an often-requested feature, and the information is in the index, so we could expose it. I guess the hardest part of solving this issue is proposing a nice and simple API.