Support DISTINCT select on single field projection

Description

This is an often-requested feature. The information is in the index, so we could expose it. I guess the hardest part of solving this issue is proposing a nice and simple API.

Activity

Yoann Rodière February 1, 2022 at 1:03 PM

I think this is now possible in recent versions of Elasticsearch, and it’s called “collapsing”.
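For readers unfamiliar with the term, here is a minimal in-memory sketch of what collapsing on a field means: out of a ranked list of hits, only the best-ranked hit for each distinct field value is kept. It does not call Elasticsearch or Hibernate Search; the `Hit` type, the `author` field and the method name are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CollapseSketch {
    // Hypothetical hit type, for illustration only.
    static final class Hit {
        final String id;
        final String author;

        Hit(String id, String author) {
            this.id = id;
            this.author = author;
        }

        @Override
        public String toString() {
            return id + ":" + author;
        }
    }

    // Keeps the first hit seen for each distinct author.
    // Assumes rankedHits is already ordered from best to worst.
    static List<Hit> collapseByAuthor(List<Hit> rankedHits) {
        // LinkedHashMap preserves the original ranking order of the kept hits.
        Map<String, Hit> firstPerAuthor = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            firstPerAuthor.putIfAbsent(hit.author, hit);
        }
        return new ArrayList<>(firstPerAuthor.values());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("1", "asimov"),
                new Hit("2", "herbert"),
                new Hit("3", "asimov"));
        System.out.println(collapseByAuthor(hits));
    }
}
```

Because the result is still a list of hits, one per distinct value, it can be cut into pages like any other hit list, which is presumably what makes collapsing relevant to this issue.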

Yoann Rodière May 25, 2020 at 4:40 PM

On a related note, you can use Hibernate Search 6 aggregations to retrieve a list of the top N distinct terms of a given field for all matching documents. See here for an example.

That being said, you wouldn't get pagination.
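To make the shape of that result concrete, here is a rough sketch in plain Java of what a "top N distinct terms" aggregation computes. It deliberately avoids the Hibernate Search API: the class, the method and the `maxTermCount` parameter are illustrative names, and the ordering (most frequent first, ties broken alphabetically) is an assumption of this sketch.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopTermsSketch {
    // Counts each distinct value, then keeps the maxTermCount most frequent ones.
    static Map<String, Long> topTerms(List<String> fieldValues, int maxTermCount) {
        Map<String, Long> counts = fieldValues.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));

        // Most frequent first; alphabetical order breaks ties (assumption of this sketch).
        Comparator<Map.Entry<String, Long>> byCountDesc =
                Map.Entry.<String, Long>comparingByValue().reversed();
        Comparator<Map.Entry<String, Long>> order =
                byCountDesc.thenComparing(Map.Entry.comparingByKey());

        return counts.entrySet().stream()
                .sorted(order)
                .limit(maxTermCount)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,
                        LinkedHashMap::new)); // keeps the sorted order
    }

    public static void main(String[] args) {
        List<String> genres = List.of("sf", "crime", "sf", "poetry", "crime", "sf");
        System.out.println(topTerms(genres, 2));
    }
}
```

The method takes a maximum number of terms but no offset, so it always returns the first N terms and has no way to ask for the next N. That matches the pagination limitation.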

Sanne Grinovero June 7, 2018 at 10:12 AM

It's not hard to do in Lucene - assuming the limitations I mentioned in the description: it has to be a projection on the terms only, and even then on a single field.

We can't possibly use this information as a filtering clause for other components, such as to select entities.

I don't know about Elasticsearch though - happy to postpone this.

Yoann Rodière June 7, 2018 at 6:36 AM

After looking into it a bit more, it doesn't seem to be possible in Elasticsearch... So I guess we will have a hard time doing it in Lucene too. See https://stackoverflow.com/questions/27776582/aggregation-sorting-pagination-in-elastic-search

Maybe the best solution would be to avoid the problem altogether: advise users who query index A and need a DISTINCT on the ID of some related entity B to index entity B instead, and to rely on nested fields for advanced predicates on entity A.

Yoann Rodière June 7, 2018 at 6:25 AM

Adding as a pre-requisite. To support DISTINCT properly together with limits and offsets (pagination), we need aggregation features in the backend.
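A small plain-Java sketch may help show why DISTINCT with limit and offset behaves like an aggregation: every matching value has to be de-duplicated before a page can be cut out of the result. The class and method names are hypothetical, and sorting the distinct values alphabetically is an assumption made so that pages are stable.

```java
import java.util.List;
import java.util.stream.Collectors;

class DistinctPageSketch {
    // Returns one page of the sorted distinct values.
    // De-duplication runs over all matching values first; only then are
    // offset and limit applied.
    static List<String> distinctPage(List<String> matchingValues, int offset, int limit) {
        return matchingValues.stream()
                .distinct()
                .sorted()
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("b", "a", "b", "c", "a", "d");
        System.out.println(distinctPage(values, 0, 2));
        System.out.println(distinctPage(values, 2, 2));
    }
}
```

Applying offset and limit to the raw hits first and de-duplicating afterwards would not be equivalent: pages could come back shorter than the limit, and the same value could show up on several pages.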

Details

Created August 24, 2011 at 11:04 AM
Updated September 25, 2023 at 2:48 PM