Search 6 groundwork - Restore support for bypassing bridges in the predicate/sort DSL

Description

Follow-up on HSEARCH-3221.

Essentially allow to do this in the predicate DSL:

.predicate().match().onRawField( "myField" ).orRawField( "myOtherField", "yetAnotherField" ).matching(<value>)

and this in the sort DSL:

.sort().byRawField( "fieldName" ).asc().onMissingValue().use( <value> )

... so that <value> is interpreted as the un-converted (raw) value of the field, and is not passed through the user-defined converter, but through a "pass-through" converter.

This should allow us to introduce a consistent syntax for the projection DSL, to project on a field without converting it:

.projection().rawField( "fieldName", String.class )

On the backend side, this means we will have to inform the backend, when fetching a builder, that:

  1. Compatibility checks should ignore any custom converter

  2. Methods such as org.hibernate.search.engine.search.predicate.spi.MatchPredicateBuilder#value, org.hibernate.search.engine.search.predicate.spi.RangePredicateBuilder#lowerLimit or org.hibernate.search.engine.search.dsl.sort.FieldSortMissingValueContext#use should ignore any custom converter and use a PassThroughToDocumentFieldValueConverter instead.

How to do that remains to be seen...

Activity

Show:

Yoann Rodière January 9, 2019 at 9:16 AM

Conversation we just had on this topic:

Yoann Rodière @yrodiere 09:20
Right, that's the only remaining major ticket... I was planning on working on it but I won't have enough time. Let me see...

ok, so the implementation should be relatively straightforward. You see how we pass a converter to the predicate builders in org.hibernate.search.backend.elasticsearch.types.predicate.impl.ElasticsearchStandardFieldPredicateBuilderFactory#createMatchPredicateBuilder and similar method?
Essentially you'll have to pass a pass-through converter instead (PassThroughToDocumentFieldValueConverter)

Fabio Massimo @fax4ever 09:23
ok

Yoann Rodière @yrodiere 09:24
this converter has to be aware of the field type, though, so it's probably better if it is built at bootstrap when we know of such things, and passed to the constructor of ElasticsearchStandardFieldPredicateBuilderFactory

Fabio Massimo @fax4ever 09:24
so I'll take them

Yoann Rodière @yrodiere 09:24
the main problem here will be to decide on the API
yes please take them

Fabio Massimo @fax4ever 09:26
in v5 what were the API to bypass a bridge?

Yoann Rodière @yrodiere 09:26
it was rather unintuitive in Search 5 actually

Fabio Massimo @fax4ever 09:27
ok

Yoann Rodière @yrodiere 09:27
something like .keyword().onField("myField").ignoreFieldBridge().matching( <value> )
not sure we can do better, but let's try...

Fabio Massimo @fax4ever 09:28
ok. Maybe I will be able to figure out an alternative after I study more the case 🙂
at the moment the API you shown seems fine to me 😛

Yoann Rodière @yrodiere 09:29
the ticket description seems obsolete. What I had in mind lately was either:

.match().onField( "myField" ).orField( "myOtherField", "yetAnotherField" ).matchingRaw(<value>)
(which would require to hold both converters in the predicate builder, and expose two methods in the predicate builder: value() and valueRaw())

or:

.match().onRawField( "myField" ).orRawField( "myOtherField", "yetAnotherField" ).matching(<value>)
(which would require to pick a single converter in the predicate builder factory and pass it to the to the predicate builder constructor)

rah, let me edit that

Fabio Massimo @fax4ever 09:29
thanks

Yoann Rodière @yrodiere 09:32
the main advantage of syntax #2 is that ultimately we could easily allow users to target a field that has a different field bridges in the different targeted indexes, but has the same type of underlying field
like, a "code" field which is a string in one entity, and a long in another, but is indexed as a string in both cases
it would be more complex to do that with the first syntax, because we perform compatibility checks when onField() is called, so with the first syntax we wouldn't know that the bridge is being bypassed at that point

Fabio Massimo @fax4ever 09:36
ok

Yoann Rodière @yrodiere 09:36
we would have to delay some operations, which on top of being messy, would lead to some exceptions being thrown later and being less clear to the user (since they would be thrown by a method call seemingly unrelated to the one that triggered a problem)
anyway...
the .ignoreFieldBridge() syntax is not an option, because it the concept of "field bridge" does not exist anymore, and bridges in general only mean something in the ORM mapper (other mappers may name them differently)
I'm inclined to use the term "raw" because it expresses what the user would want: ignore the various conversion layer, just work on the low-level values
another thing to consider is the syntax we'll use for projections

Fabio Massimo @fax4ever 09:40
ok so raw in this context means bypass the default processors

Yoann Rodière @yrodiere 09:40
bypass the converters, yes
which are defined by the bridge, but that's none of the backend's business

Fabio Massimo @fax4ever 09:41
other mappers could bypass other things (not bridge)

Yoann Rodière @yrodiere 09:41
yes
in fact it's already the case for "type bridges"
the type bridge defines multiple fields, each with (potentially) its own, specific converter
in the DSL, you're not really bypassing the bridge itself, but rather the converters that the bridge defined

Fabio Massimo @fax4ever 09:43
it's a way with which the user can perform custom converting...

Yoann Rodière @yrodiere 09:43
yes

Fabio Massimo @fax4ever 09:43
maybe we can change just the word raw, not the main idea

Yoann Rodière @yrodiere 09:43
whereas a bridge has a more general responsibility of binding something from an entity to something in a document
which also implies definining metadata
well I'm all ears 🙂
like I said we also need to take the projection DSL into account

Fabio Massimo @fax4ever 09:44
and metadata will be preserved in this case, won't it?

Yoann Rodière @yrodiere 09:44
ideally we'd want the syntax for bypassing converters in the projection DSL to be consistent
all metadata except the converter, yes

Fabio Massimo @fax4ever 09:45
I think I got it

Yoann Rodière @yrodiere 09:45
so regarding the projection DSL

Fabio Massimo @fax4ever 09:46
so ignoreFieldBridge() was definitely wrong 🙂
not only for the flexibility
but also for the meaning itself

Yoann Rodière @yrodiere 09:46
yes... though it made a bit more sense in Search 5, since bridges worked differently then

Fabio Massimo @fax4ever 09:47
ok

Yoann Rodière @yrodiere 09:47
so the syntax in the projection DSL, currently, is something like this: .field( "string", String.class )
adding some .ignoreConverter() method after that would be problematic, because we make some compatibility checks in the field methods, to check that the (converted) field can indeed be assigned to String
with my example, we could add a rawField method: .rawField( "string", String.class )

Fabio Massimo @fax4ever 09:49
I see the point here

Yoann Rodière @yrodiere 09:49
then the compatibility check would know that we are targeting the raw field, and that we should check that the unconverted field can be assigned to String, not the converted one

Fabio Massimo @fax4ever 09:50
I see that raw fits well with this case

Yoann Rodière @yrodiere 09:50
there's a last one: the sort DSL
here, the converters are only used when specifying the value to use when it's missing in a document

Fabio Massimo @fax4ever 09:51
customField instead of rawField maybe?

Yoann Rodière @yrodiere 09:51
hmm no custom doesn't really fit
if anything, in this case the field is less customized
since we're ignoring the user-defined converter

Fabio Massimo @fax4ever 09:52
ah ok

Yoann Rodière @yrodiere 09:52
to be clear the string being passed is the field name
so .field( "fieldName", String.class ) and .rawField( "fieldName", String.class )

Fabio Massimo @fax4ever 09:52
let's keep raw from now, it makes sense
from what I see

Yoann Rodière @yrodiere 09:53
So, the last one: the sort DSL
as I said, here, the converters are only used when specifying the value to use when it's missing in a document

Fabio Massimo @fax4ever 09:53
raw field = a field wihout converter applyed on it

Yoann Rodière @yrodiere 09:54
yes

Fabio Massimo @fax4ever 09:54
or raw field = use just the filed itself, not its converters

Yoann Rodière @yrodiere 09:54
yes
So converters are used in the sort DSL when someone does this: .byField( "fieldName" ).asc().onMissingValue().use( <value> )
I suppose the only way to be consistent would be to introduce a byRawField method
then the use method would not apply the user-defined converter

Fabio Massimo @fax4ever 09:56
In order to forbid .use after a .rawField

Yoann Rodière @yrodiere 09:56
you could still use it, it would just not apply the same converter

Fabio Massimo @fax4ever 09:57
ok just ignore it

Fixed

Details

Assignee

Reporter

Components

Sprint

Fix versions

Priority

Created August 3, 2018 at 4:01 PM
Updated March 21, 2019 at 5:27 PM
Resolved March 4, 2019 at 2:38 PM