Automatically filter search results based on provided routing keys

Description

Currently, when a routing key is specified in a search query, we take care of targeting only the shards that can actually contain documents with the given routing keys.

However, since a shard may contain documents with different routing keys, some of the matching documents found in these shards may actually have been indexed with a different routing key.
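To illustrate the problem, here is a minimal, self-contained Java sketch. It assumes a hypothetical shard selection of Math.floorMod(routingKey.hashCode(), shardCount), which is not necessarily the hashing Hibernate Search or Elasticsearch actually use, and it models shards as in-memory lists. With two shards, routing keys "a" and "c" land in the same shard, so a search targeted with routing key "a" also returns the document indexed with routing key "c".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RoutingCollisionDemo {

    // A document together with the routing key it was indexed with.
    record Doc(String id, String routingKey, String text) {}

    // Hypothetical shard selection: hash of the routing key modulo the shard count.
    static int shardFor(String routingKey, int shardCount) {
        return Math.floorMod(routingKey.hashCode(), shardCount);
    }

    // Distributes documents into shards according to their routing key.
    static Map<Integer, List<Doc>> index(List<Doc> docs, int shardCount) {
        Map<Integer, List<Doc>> shards = new TreeMap<>();
        for (Doc doc : docs) {
            shards.computeIfAbsent(shardFor(doc.routingKey(), shardCount), k -> new ArrayList<>())
                    .add(doc);
        }
        return shards;
    }

    // Current behavior: only the shard targeted by the routing key is searched,
    // but hits are NOT filtered on the routing key.
    static List<Doc> searchWithoutFilter(Map<Integer, List<Doc>> shards, int shardCount,
                                         String routingKey, String term) {
        List<Doc> hits = new ArrayList<>();
        for (Doc doc : shards.getOrDefault(shardFor(routingKey, shardCount), List.of())) {
            if (doc.text().contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int shardCount = 2;
        // "a" (hashCode 97) and "c" (hashCode 99) both map to shard 1; "b" maps to shard 0.
        List<Doc> docs = List.of(
                new Doc("1", "a", "red car"),
                new Doc("2", "b", "red bike"),
                new Doc("3", "c", "red boat"));
        Map<Integer, List<Doc>> shards = index(docs, shardCount);
        // Returns documents 1 and 3, although document 3 was indexed with routing key "c".
        System.out.println(searchWithoutFilter(shards, shardCount, "a", "red"));
    }
}
```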

The only reason we don't currently apply a filter automatically is performance: users defining routing keys are likely to already filter their results based on an indexed field with the same value as the routing key.

However, I don't think it would be very expensive to also create an indexed meta-field holding the routing key, and to automatically add a filter on that field to every search query that explicitly defines routing keys.
The field already exists in Elasticsearch (_routing) and it's indexed. For Lucene, we would need to add it.
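For Elasticsearch, the automatic filter could be expressed as a terms query on _routing, which Elasticsearch allows querying directly. The sketch below only builds the request body as a string; the user's query is assumed to be available as raw JSON, and the class and method names are hypothetical. The routing keys would still be passed in the routing parameter of the search request (e.g. ?routing=a,c) so that only the matching shards are targeted; the filter only removes hits from other routing keys within those shards.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RoutingFilterRequest {

    // Minimal JSON string escaping for routing keys (quotes and backslashes only).
    static String jsonString(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Wraps the user's query in a bool query whose filter clause restricts
    // hits to documents whose _routing value is one of the given routing keys.
    static String withRoutingFilter(String userQueryJson, List<String> routingKeys) {
        String keys = routingKeys.stream()
                .map(RoutingFilterRequest::jsonString)
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{"
                + "\"must\":[" + userQueryJson + "],"
                + "\"filter\":[{\"terms\":{\"_routing\":[" + keys + "]}}]"
                + "}}}";
    }

    public static void main(String[] args) {
        // Would be sent as the body of: POST /<index>/_search?routing=a,c
        String body = withRoutingFilter("{\"match\":{\"title\":\"car\"}}", List.of("a", "c"));
        System.out.println(body);
    }
}
```

Using a filter clause rather than another must clause keeps the routing restriction out of scoring.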

Off the top of my head, here are the changes we would need. They're actually quite reasonable:

  • For the Lucene backend, we'd need to index the routing key: currently it's just used for routing, not indexed.

  • For the Lucene and Elasticsearch backends, we'd need to automatically add a filter to the query when routing keys are specified.

  • For the Lucene and Elasticsearch backends, we might also need to offer a way to retrieve the routing key of a particular search hit, though I'm not sure we need this. => No

  • For the Lucene backend, we may want to introduce a new (default) sharding strategy where routing keys are enabled but only used as discriminators, not for actual sharding. => Not necessary, the default sharding strategy works just fine for that.
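The first two changes above can be sketched with a toy in-memory model in Java. Everything here is hypothetical and only illustrates the idea: documents are maps of field names to values, the meta-field is called _routing, and shards are selected with Math.floorMod(routingKey.hashCode(), shardCount). At indexing time the routing key is stored in the meta-field as well as used to pick the shard; at search time, specifying routing keys both targets the corresponding shards and filters hits on that field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RoutingKeyFilterSketch {

    // Hypothetical name of the meta-field holding the routing key.
    static final String ROUTING_FIELD = "_routing";

    // Hypothetical shard selection; the real hashing may differ.
    static int shardFor(String routingKey, int shardCount) {
        return Math.floorMod(routingKey.hashCode(), shardCount);
    }

    static List<List<Map<String, String>>> newShards(int count) {
        List<List<Map<String, String>>> shards = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            shards.add(new ArrayList<>());
        }
        return shards;
    }

    // Change 1: store the routing key in a meta-field, in addition to using it to pick the shard.
    static void index(List<List<Map<String, String>>> shards, String id, String routingKey, String text) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", id);
        doc.put("text", text);
        doc.put(ROUTING_FIELD, routingKey);
        shards.get(shardFor(routingKey, shards.size())).add(doc);
    }

    // Change 2: when routing keys are specified, search only the targeted shards
    // and automatically filter hits on the routing meta-field.
    static List<String> search(List<List<Map<String, String>>> shards, Set<String> routingKeys, String term) {
        Set<Integer> targeted = new TreeSet<>();
        for (String key : routingKeys) {
            targeted.add(shardFor(key, shards.size()));
        }
        List<String> hitIds = new ArrayList<>();
        for (int shard : targeted) {
            for (Map<String, String> doc : shards.get(shard)) {
                boolean matches = doc.get("text").contains(term);
                boolean sameRoutingKey = routingKeys.contains(doc.get(ROUTING_FIELD));
                if (matches && sameRoutingKey) {
                    hitIds.add(doc.get("id"));
                }
            }
        }
        return hitIds;
    }

    public static void main(String[] args) {
        List<List<Map<String, String>>> shards = newShards(2);
        index(shards, "1", "a", "red car");
        index(shards, "2", "b", "red bike");
        index(shards, "3", "c", "red boat"); // lands in the same shard as "a"
        System.out.println(search(shards, Set.of("a"), "red")); // prints [1]
    }
}
```

Here routing keys "a" and "c" collide in the same shard, yet a search with routing key "a" only returns document 1: document 3 is in a targeted shard but is discarded by the filter on the routing meta-field.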

Environment

None

Assignee

Yoann Rodière

Reporter

Yoann Rodière

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Fix versions

Priority

Major