exists() predicate ignores dynamic fields among children of the targeted object field with the Lucene backend

Description

With the Lucene backend, we have no way of knowing which dynamic fields were added to the index before the application was last restarted; we only know about the dynamic fields that the user has mentioned (during indexing or search) since that restart.

When we build an exists() predicate for an object field, what we do internally is build a boolean query with should clauses, where each clause tests whether a "leaf" field exists. When there are dynamic fields, we don't know the full list of leaf fields, so we cannot build the exists() predicate properly: the dynamic fields are ignored.
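To make the failure mode concrete, here is a minimal plain-Java model. It is not Hibernate Search or Lucene code. A document is modeled as a map from full field path to value, and exists() on an object field is modeled as an OR over the leaf fields the backend currently knows about, mirroring the should clauses described above. The class and method names (`ExistsModel`, `objectFieldExists`) are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model, not backend code: each document is a map from full
// field path to value, and exists() on an object field is an OR ("should"
// clauses) over the leaf fields the backend currently knows about.
class ExistsModel {

    // True if at least one known leaf field under the object field
    // has a non-null value in the document.
    static boolean objectFieldExists(Map<String, Object> document,
            String objectFieldPath, Set<String> knownLeafFields) {
        String prefix = objectFieldPath + ".";
        for (String leaf : knownLeafFields) {
            if (leaf.startsWith(prefix) && document.get(leaf) != null) {
                return true;
            }
        }
        // Leaf fields missing from knownLeafFields are never checked:
        // this is how dynamic fields unknown since the restart get ignored.
        return false;
    }
}
```

After a restart, if the backend only knows about `address.city`, a document whose only value under `address` is in a dynamic field such as `address.custom_zip` is not matched, even though the object field clearly has content.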

Solution 1: persisted metamodel

The most obvious solution would be to persist a list of indexed dynamic fields somewhere, and read that list on bootstrap. In short, introduce a persisted metamodel for the Lucene backend.

I'm not a fan of this approach because of the complexity it would add for a single feature.
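For illustration only, here is a minimal sketch of what such a persisted list could look like, assuming the dynamic field paths are simply written to a text file, one per line, and reloaded on bootstrap. The class name `PersistedDynamicFields` and the file format are hypothetical. Even this toy version has to be kept in sync with the index, which hints at the complexity mentioned above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical registry of dynamic field paths persisted next to the index
// (file location and one-path-per-line format are assumptions).
class PersistedDynamicFields {
    private final Path file;
    private final Set<String> fieldPaths = new TreeSet<>();

    PersistedDynamicFields(Path file) {
        this.file = file;
    }

    // Bootstrap: reload the dynamic fields recorded before the restart, if any.
    void load() throws IOException {
        if (Files.exists(file)) {
            fieldPaths.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    // Indexing: record a dynamic field and rewrite the file when it is new.
    void register(String fieldPath) throws IOException {
        if (fieldPaths.add(fieldPath)) {
            Files.write(file, fieldPaths, StandardCharsets.UTF_8);
        }
    }

    // Leaf fields an exists() predicate could then turn into should clauses.
    Set<String> knownFields() {
        return Collections.unmodifiableSet(fieldPaths);
    }
}
```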

Solution 2: relaxed exists() matching rules

A possibly simpler solution would be to relax the exists() matching rules, and declare that exists() matches an object field if that field was non-null at indexing time. Basically:

  • For nested object fields, we would just run a MatchAllDocsQuery within the join: if there is a nested document, the field exists.

  • For flattened object fields, we would have to store the list of object fields added to a given document in a specific field, and query that field. I suppose there would be some overhead at indexing time, but we already do that for other field types; see the uses of org.hibernate.search.backend.lucene.lowlevel.common.impl.MetadataFields#fieldNamesFieldName().
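The two cases above can be modeled in plain Java as follows. This is a hypothetical sketch, not backend code: a flattened document carries a set of object field paths that were non-null at indexing time (a stand-in for a metadata field, not its real structure), and nested documents are modeled as a list of child documents per object field path. The names `RelaxedExistsModel`, `FlattenedDoc`, `flattenedObjectExists` and `nestedObjectExists` are illustrative.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of relaxed exists() rules; not backend code.
class RelaxedExistsModel {

    // Flattened document: leaf values plus the set of object field paths
    // that were non-null when the document was indexed.
    static final class FlattenedDoc {
        final Map<String, Object> values;
        final Set<String> objectFieldNames = new HashSet<>();

        FlattenedDoc(Map<String, Object> values, Set<String> nonNullObjectFields) {
            this.values = values;
            this.objectFieldNames.addAll(nonNullObjectFields);
        }
    }

    // Flattened object field: only the recorded object field names are
    // consulted, so static and dynamic leaf fields no longer matter.
    static boolean flattenedObjectExists(FlattenedDoc doc, String objectFieldPath) {
        return doc.objectFieldNames.contains(objectFieldPath);
    }

    // Nested object field: matching all documents inside the join boils
    // down to "is there at least one nested document for this path?".
    static boolean nestedObjectExists(Map<String, List<Map<String, Object>>> nestedDocs,
            String objectFieldPath) {
        List<Map<String, Object>> children = nestedDocs.get(objectFieldPath);
        return children != null && !children.isEmpty();
    }
}
```

Note that under these rules an object field whose children are all null still matches, which is the behavioral difference from Elasticsearch discussed below.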

As an added benefit, this would immediately solve https://hibernate.atlassian.net/browse/HSEARCH-3904#icft=HSEARCH-3904 (take into account dynamic fields in exists() predicate on object fields) for the Lucene backend.

The main drawback is that the behavior would be different from that of Elasticsearch, which only matches object fields when they have at least one non-null non-object child. But in a way, isn't that just a limitation of Elasticsearch?

Solution 3: populate fieldNames when a dynamic field value is added

A lighter take on solution 2: whenever a value is added to a dynamic field, just add the name of the containing object field(s) to org.hibernate.search.backend.lucene.lowlevel.common.impl.MetadataFields#fieldNamesFieldName().
In the exists() predicate, also look for the name of the targeted object field, on top of looking for values of static fields.

This solution preserves existing semantics and does not affect users who do not use dynamic fields at all.
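A minimal plain-Java model of this approach follows, under the assumption that the field-names metadata can be treated as a per-document set of strings (the real field is only referenced here, not reproduced). When a non-null dynamic value is indexed, every containing object field path is recorded (`a.b.c` records `a` and `a.b`); the exists() check then ORs the known static leaf fields with one extra lookup on the object field's own name. The names `FieldNamesExistsModel`, `Doc`, `addDynamicValue` and `addStaticValue` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of populating field names for dynamic fields; not backend code.
class FieldNamesExistsModel {

    static final class Doc {
        final Map<String, Object> values = new HashMap<>();
        // Stand-in for the per-document "field names" metadata field.
        final Set<String> fieldNames = new HashSet<>();

        // Indexing a dynamic value also records each containing object
        // field: "a.b.c" adds "a" and "a.b".
        void addDynamicValue(String fieldPath, Object value) {
            if (value == null) {
                return; // null adds nothing, so all-null objects still don't match
            }
            values.put(fieldPath, value);
            int dot = fieldPath.indexOf('.');
            while (dot >= 0) {
                fieldNames.add(fieldPath.substring(0, dot));
                dot = fieldPath.indexOf('.', dot + 1);
            }
        }

        // Static fields are handled as before: no extra metadata.
        void addStaticValue(String fieldPath, Object value) {
            if (value != null) {
                values.put(fieldPath, value);
            }
        }
    }

    // exists() on an object field: OR over known static leaf fields,
    // plus one clause looking for the object field's name.
    static boolean objectFieldExists(Doc doc, String objectFieldPath,
            Set<String> staticLeafFields) {
        String prefix = objectFieldPath + ".";
        for (String leaf : staticLeafFields) {
            if (leaf.startsWith(prefix) && doc.values.containsKey(leaf)) {
                return true;
            }
        }
        return doc.fieldNames.contains(objectFieldPath);
    }
}
```

Because names are only recorded for non-null dynamic values, an object field with only null children still does not match, and documents without dynamic fields never get extra entries.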

Fixed

Details

Created April 30, 2020 at 7:36 AM
Updated November 3, 2020 at 10:19 AM
Resolved October 13, 2020 at 7:41 AM