Search 6 groundwork - Restore support for bypassing bridges in the predicate/sort DSL

Description

Follow-up on HSEARCH-3221.

Essentially allow to do this in the predicate DSL:

.predicate().match().onRawField( "myField" ).orRawField( "myOtherField", "yetAnotherField" ).matching(<value>)

and this in the sort DSL:

.sort().byRawField( "fieldName" ).asc().onMissingValue().use( <value> )

... so that <value> is interpreted as the un-converted (raw) value of the field, and is not passed through the user-defined converter, but through a "pass-through" converter.

This should allow us to introduce a consistent syntax for the projection DSL, to project on a field without converting it:

.projection().rawField( "fieldName", String.class )

On the backend side, this means we will have to inform the backend, when fetching a builder, that:

Compatibility checks should ignore any custom converter
Methods such as org.hibernate.search.engine.search.predicate.spi.MatchPredicateBuilder#value, org.hibernate.search.engine.search.predicate.spi.RangePredicateBuilder#lowerLimit or org.hibernate.search.engine.search.dsl.sort.FieldSortMissingValueContext#use should ignore any custom converter and use a PassThroughToDocumentFieldValueConverter instead.

How to do that remains to be seen...

Linked issues

follows up on

HSEARCH-3221

Search 6 groundwork - Restore support for applying bridges automatically to predicate/sort DSL parameters

relates to

HSEARCH-3307

Take into account how the field is being used when determining cross-index field compatibility

Activity

Show:

Yoann Rodière January 9, 2019 at 9:16 AM

Conversation we just had on this topic:

Yoann Rodière @yrodiere 09:20
Right, that's the only remaining major ticket... I was planning on working on it but I won't have enough time. Let me see...
ok, so the implementation should be relatively straightforward. You see how we pass a converter to the predicate builders in org.hibernate.search.backend.elasticsearch.types.predicate.impl.ElasticsearchStandardFieldPredicateBuilderFactory#createMatchPredicateBuilder and similar method?
Essentially you'll have to pass a pass-through converter instead (PassThroughToDocumentFieldValueConverter)
Fabio Massimo @fax4ever 09:23
ok
Yoann Rodière @yrodiere 09:24
this converter has to be aware of the field type, though, so it's probably better if it is built at bootstrap when we know of such things, and passed to the constructor of ElasticsearchStandardFieldPredicateBuilderFactory
Fabio Massimo @fax4ever 09:24
so I'll take them
Yoann Rodière @yrodiere 09:24
the main problem here will be to decide on the API
yes please take them
Fabio Massimo @fax4ever 09:26
in v5 what were the API to bypass a bridge?
Yoann Rodière @yrodiere 09:26
it was rather unintuitive in Search 5 actually
Fabio Massimo @fax4ever 09:27
ok
Yoann Rodière @yrodiere 09:27
something like .keyword().onField("myField").ignoreFieldBridge().matching( <value> )
not sure we can do better, but let's try...
Fabio Massimo @fax4ever 09:28
ok. Maybe I will be able to figure out an alternative after I study more the case 🙂
at the moment the API you shown seems fine to me 😛
Yoann Rodière @yrodiere 09:29
the ticket description seems obsolete. What I had in mind lately was either:
.match().onField( "myField" ).orField( "myOtherField", "yetAnotherField" ).matchingRaw(<value>)
(which would require to hold both converters in the predicate builder, and expose two methods in the predicate builder: value() and valueRaw())
or:
.match().onRawField( "myField" ).orRawField( "myOtherField", "yetAnotherField" ).matching(<value>)
(which would require to pick a single converter in the predicate builder factory and pass it to the to the predicate builder constructor)
rah, let me edit that
Fabio Massimo @fax4ever 09:29
thanks
Yoann Rodière @yrodiere 09:32
the main advantage of syntax #2 is that ultimately we could easily allow users to target a field that has a different field bridges in the different targeted indexes, but has the same type of underlying field
like, a "code" field which is a string in one entity, and a long in another, but is indexed as a string in both cases
it would be more complex to do that with the first syntax, because we perform compatibility checks when onField() is called, so with the first syntax we wouldn't know that the bridge is being bypassed at that point
Fabio Massimo @fax4ever 09:36
ok
Yoann Rodière @yrodiere 09:36
we would have to delay some operations, which on top of being messy, would lead to some exceptions being thrown later and being less clear to the user (since they would be thrown by a method call seemingly unrelated to the one that triggered a problem)
anyway...
the .ignoreFieldBridge() syntax is not an option, because it the concept of "field bridge" does not exist anymore, and bridges in general only mean something in the ORM mapper (other mappers may name them differently)
I'm inclined to use the term "raw" because it expresses what the user would want: ignore the various conversion layer, just work on the low-level values
another thing to consider is the syntax we'll use for projections
Fabio Massimo @fax4ever 09:40
ok so raw in this context means bypass the default processors
Yoann Rodière @yrodiere 09:40
bypass the converters, yes
which are defined by the bridge, but that's none of the backend's business
Fabio Massimo @fax4ever 09:41
other mappers could bypass other things (not bridge)
Yoann Rodière @yrodiere 09:41
yes
in fact it's already the case for "type bridges"
the type bridge defines multiple fields, each with (potentially) its own, specific converter
in the DSL, you're not really bypassing the bridge itself, but rather the converters that the bridge defined
Fabio Massimo @fax4ever 09:43
it's a way with which the user can perform custom converting...
Yoann Rodière @yrodiere 09:43
yes
Fabio Massimo @fax4ever 09:43
maybe we can change just the word raw, not the main idea
Yoann Rodière @yrodiere 09:43
whereas a bridge has a more general responsibility of binding something from an entity to something in a document
which also implies definining metadata
well I'm all ears 🙂
like I said we also need to take the projection DSL into account
Fabio Massimo @fax4ever 09:44
and metadata will be preserved in this case, won't it?
Yoann Rodière @yrodiere 09:44
ideally we'd want the syntax for bypassing converters in the projection DSL to be consistent
all metadata except the converter, yes
Fabio Massimo @fax4ever 09:45
I think I got it
Yoann Rodière @yrodiere 09:45
so regarding the projection DSL
Fabio Massimo @fax4ever 09:46
so ignoreFieldBridge() was definitely wrong 🙂
not only for the flexibility
but also for the meaning itself
Yoann Rodière @yrodiere 09:46
yes... though it made a bit more sense in Search 5, since bridges worked differently then
Fabio Massimo @fax4ever 09:47
ok
Yoann Rodière @yrodiere 09:47
so the syntax in the projection DSL, currently, is something like this: .field( "string", String.class )
adding some .ignoreConverter() method after that would be problematic, because we make some compatibility checks in the field methods, to check that the (converted) field can indeed be assigned to String
with my example, we could add a rawField method: .rawField( "string", String.class )
Fabio Massimo @fax4ever 09:49
I see the point here
Yoann Rodière @yrodiere 09:49
then the compatibility check would know that we are targeting the raw field, and that we should check that the unconverted field can be assigned to String, not the converted one
Fabio Massimo @fax4ever 09:50
I see that raw fits well with this case
Yoann Rodière @yrodiere 09:50
there's a last one: the sort DSL
here, the converters are only used when specifying the value to use when it's missing in a document
Fabio Massimo @fax4ever 09:51
customField instead of rawField maybe?
Yoann Rodière @yrodiere 09:51
hmm no custom doesn't really fit
if anything, in this case the field is less customized
since we're ignoring the user-defined converter
Fabio Massimo @fax4ever 09:52
ah ok
Yoann Rodière @yrodiere 09:52
to be clear the string being passed is the field name
so .field( "fieldName", String.class ) and .rawField( "fieldName", String.class )
Fabio Massimo @fax4ever 09:52
let's keep raw from now, it makes sense
from what I see
Yoann Rodière @yrodiere 09:53
So, the last one: the sort DSL
as I said, here, the converters are only used when specifying the value to use when it's missing in a document
Fabio Massimo @fax4ever 09:53
raw field = a field wihout converter applyed on it
Yoann Rodière @yrodiere 09:54
yes
Fabio Massimo @fax4ever 09:54
or raw field = use just the filed itself, not its converters
Yoann Rodière @yrodiere 09:54
yes
So converters are used in the sort DSL when someone does this: .byField( "fieldName" ).asc().onMissingValue().use( <value> )
I suppose the only way to be consistent would be to introduce a byRawField method
then the use method would not apply the user-defined converter
Fabio Massimo @fax4ever 09:56
In order to forbid .use after a .rawField
Yoann Rodière @yrodiere 09:56
you could still use it, it would just not apply the same converter
Fabio Massimo @fax4ever 09:57
ok just ignore it