Fixed
Details
Assignee
Yoann Rodière
Reporter
Yoann Rodière
Components
Sprint
None
Fix versions
Affects versions
Priority
Major
Created April 30, 2020 at 7:36 AM
Updated November 3, 2020 at 10:19 AM
Resolved October 13, 2020 at 7:41 AM
With the Lucene backend, we have no record of which dynamic fields were added to the index before the last restart of the application; we only know of the dynamic fields that the user has mentioned (during indexing or search) since the last restart.
When we build an exists predicate for an object field, we internally build a boolean query with "should" clauses, where each clause tests whether a "leaf" field exists. When there are dynamic fields, we don't know the full list of leaf fields, so we cannot build the exists predicate properly: the dynamic fields are ignored.
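To make the limitation concrete, here is a minimal plain-Java model of it. This is not Lucene or Hibernate Search code: a document is modeled as the set of field paths that hold a value, and the hypothetical existsOnObjectField method mimics a boolean query with one "should" clause per known leaf field. The class name, method name and field paths are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

// Plain-Java model of the behavior described above; not actual Lucene or Hibernate Search code.
final class ExistsPredicateModel {

    // Mimics a boolean query with "should" clauses: the document matches
    // if at least one of the known leaf fields holds a value.
    static boolean existsOnObjectField(Set<String> documentFields, List<String> knownLeafFields) {
        for (String leaf : knownLeafFields) {
            if (documentFields.contains(leaf)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Leaf fields of "address" known since the last restart: static fields only.
        List<String> knownLeafFields = List.of("address.city", "address.street");

        Set<String> staticOnly = Set.of("address.city");
        // "address.custom_floor" is a hypothetical dynamic field indexed before the restart.
        Set<String> dynamicOnly = Set.of("address.custom_floor");

        System.out.println(existsOnObjectField(staticOnly, knownLeafFields));  // true
        System.out.println(existsOnObjectField(dynamicOnly, knownLeafFields)); // false: the dynamic field is ignored
    }
}
```

The second document does have a value under "address", but because its only leaf is a dynamic field that is unknown after the restart, no clause tests for it and the predicate does not match.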
Solution 1: persisted metamodel
The most obvious solution would be to persist a list of indexed dynamic fields somewhere, and read that list on bootstrap. In short, introduce a persisted metamodel for the Lucene backend.
I'm not a fan of this approach because of the added complexity for just one single feature.
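For reference, a rough idea of what the smallest possible persisted metamodel could look like, assuming a flat text file with one dynamic field path per line. DynamicFieldRegistry and its methods are hypothetical names, not part of Hibernate Search, and the file format is an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical registry of dynamic field paths; not part of Hibernate Search.
final class DynamicFieldRegistry {
    private final Set<String> fieldPaths = new TreeSet<>();

    // Called whenever a dynamic field is first seen during indexing.
    void register(String fieldPath) {
        fieldPaths.add(fieldPath);
    }

    Set<String> fieldPaths() {
        return fieldPaths;
    }

    // Persists one field path per line.
    void save(Path file) throws IOException {
        Files.write(file, fieldPaths, StandardCharsets.UTF_8);
    }

    // Reads the persisted list back on bootstrap; a missing file yields an empty registry.
    static DynamicFieldRegistry load(Path file) throws IOException {
        DynamicFieldRegistry registry = new DynamicFieldRegistry();
        if (Files.exists(file)) {
            registry.fieldPaths.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        return registry;
    }
}
```

Even this toy version leaves open when the file is written and how it stays consistent with the index contents, which gives a sense of the added complexity.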
Solution 2: relaxed exists() matching rules
A perhaps easier solution would be to relax the exists() matching rules, and declare that exists() matches an object field if it was non-null when indexing. Basically:
For nested object fields we would just run a MatchAllDocs() query within the join: if there is a nested document, the field exists.
For flattened object fields we would have to store the list of object fields added to a given document in a specific field, and query that field. I suppose there would be an overhead at indexing time, but we already do that for other field types; see the uses of org.hibernate.search.backend.lucene.lowlevel.common.impl.MetadataFields#fieldNamesFieldName().
As an added benefit, this would immediately solve https://hibernate.atlassian.net/browse/HSEARCH-3904 (take into account dynamic fields in exists() predicate on object fields) for the Lucene backend.
The main drawback is that the behavior would be different from that of Elasticsearch, which only matches object fields when they have at least one non-null non-object child. But in a way, isn't that just a limitation of Elasticsearch?
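The flattened-object case of solution 2 can be sketched as follows. This is a plain-Java model under stated assumptions, not backend code: a document's object fields are given as a map from object field path to its children (null when the object itself is null), and the returned set stands in for the per-document metadata field. RelaxedExistsModel and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Plain-Java model of solution 2 for flattened object fields; all names are hypothetical.
final class RelaxedExistsModel {

    // Models the per-document metadata field holding the names of non-null object fields.
    static Set<String> indexObjectFieldNames(Map<String, Map<String, Object>> objectFields) {
        Set<String> fieldNames = new HashSet<>();
        for (Map.Entry<String, Map<String, Object>> entry : objectFields.entrySet()) {
            if (entry.getValue() != null) {
                // Recorded as soon as the object is non-null, even if it has no children.
                fieldNames.add(entry.getKey());
            }
        }
        return fieldNames;
    }

    // Relaxed exists(): a single lookup on the recorded names; no list of leaf fields is needed.
    static boolean exists(Set<String> fieldNames, String objectFieldPath) {
        return fieldNames.contains(objectFieldPath);
    }
}
```

In this model an object field with only dynamic children matches, but so does a non-null object with no children at all, which is exactly where the behavior would diverge from Elasticsearch.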
Solution 3: populate fieldNames when a dynamic field value is added
A lighter take on solution 2: whenever a value is added to a dynamic field, just add the name of the containing object field(s) to org.hibernate.search.backend.lucene.lowlevel.common.impl.MetadataFields#fieldNamesFieldName().
In the "exists" predicate, just look for the name of that field, on top of looking for values of static fields.
This solution preserves existing semantics and does not affect users that do not use dynamic fields (at all).
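A minimal plain-Java model of solution 3, assuming field paths use "." as the separator between an object field and its children. DynamicFieldNamesModel and its members are hypothetical names; the fieldNames set only stands in for the metadata field mentioned above and is not the actual backend implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of solution 3 for a single document; all names are hypothetical.
final class DynamicFieldNamesModel {
    final Set<String> valuedFields = new HashSet<>(); // fields holding a value in this document
    final Set<String> fieldNames = new HashSet<>();   // stands in for the "fieldNames" metadata field

    // Static fields are indexed as before; nothing extra is recorded.
    void addStaticValue(String fieldPath) {
        valuedFields.add(fieldPath);
    }

    void addDynamicValue(String fieldPath) {
        valuedFields.add(fieldPath);
        // Record every containing object field: "a.b.c" records "a.b" and "a".
        int lastDot = fieldPath.lastIndexOf('.');
        while (lastDot > 0) {
            String parent = fieldPath.substring(0, lastDot);
            fieldNames.add(parent);
            lastDot = parent.lastIndexOf('.');
        }
    }

    // exists(): the object field name was recorded, or a known static leaf holds a value.
    boolean exists(String objectFieldPath, List<String> staticLeafFields) {
        if (fieldNames.contains(objectFieldPath)) {
            return true;
        }
        for (String leaf : staticLeafFields) {
            if (valuedFields.contains(leaf)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch a document that only ever receives static values never touches fieldNames, and an object with no values at all still does not match, so the existing semantics are kept.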