Allow for Elasticsearch analyzer definitions to be applied in class bridges/custom field bridges

Description

Currently I see no way to apply custom Elasticsearch-defined analyzers to extra fields defined in class bridges.

With Lucene it is possible to do it as described here. In that article, 'Solution 2' solves it using a custom analyzer implementation defined in the local code. In my case the analyzer is defined in Elasticsearch, and I could not find a way to access the analyzer definition from the internal analyzer registry. 'Solution 3' relies on analyzer configuration defined by @AnalyzerDef, which does not solve this issue for remotely-defined analyzers.

My use case:
I have a custom Elasticsearch index with custom analyzer definitions. I have a class bridge that adds document fields that need to be analyzed with the custom analyzers.

See related ticket here

Environment

Hibernate ORM 5.1.3.Final
Hibernate Search 5.6.1.Final
Elasticsearch 2.3.2
Java EE 7
DB engine: Aurora 5.6.10a

Activity

Yoann Rodière
March 22, 2017, 9:18 AM

Hello,

There is currently no way to do this with Elasticsearch, because:

  1. Index-time analyzer selection (that you seem to use in your forum post) cannot work with remote analyzers

  2. The field bridge API currently doesn't allow defining specific analyzers for bridge-defined fields

We could add a way in the API to define specific analyzers for bridge-defined fields. We would need to:

  1. Add a FieldMetadataCreationContext analyzer(String analyzerName) method to FieldMetadataCreationContext

  2. Implement FieldMetadataBuilderImpl#analyzer(String analyzerName) so that it:

    • translates the string to an analyzer reference. We currently do something similar in org.hibernate.search.engine.metadata.impl.AnnotationMetadataProvider.determineAnalyzer(Field, XProperty, ConfigContext, ParseContext)

    • registers the analyzer reference in the holding type's scoped analyzer. We currently do something similar in org.hibernate.search.engine.metadata.impl.AnnotationMetadataProvider.bindFieldAnnotation(...), just after the call to determineAnalyzer.

  3. Add a private AnalyzerReference analyzerReference attribute to BridgeDefinedField, and populate it through its constructor when we call it in FieldMetadataBuilderImpl

  4. Take this analyzer reference into account in ElasticsearchSchemaTranslator, specifically in org.hibernate.search.elasticsearch.schema.impl.DefaultElasticsearchSchemaTranslator.addPropertyMapping(ElasticsearchMappingBuilder, ElasticsearchIndexSettingsBuilder, BridgeDefinedField): replace the "null" in the call to addIndexOptions with the analyzer reference

  5. And obviously, add test cases for the whole thing in the -engine module. They should be automatically re-executed in the -elasticsearch module.
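
For illustration, the steps above could be sketched roughly as follows. This is a simplified, self-contained model and not the actual Hibernate Search code: FieldMetadataBuilderSketch, CreationContext, SchemaTranslatorSketch and the analyzer name my_remote_ngram are hypothetical, and the AnalyzerReference and BridgeDefinedField classes here are stand-ins that only borrow the names used above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in: only carries the analyzer name, since the definition lives in Elasticsearch.
final class AnalyzerReference {
    final String name;

    AnalyzerReference(String name) {
        this.name = name;
    }
}

// Step 3: the bridge-defined field holds an analyzer reference (null if none was requested).
final class BridgeDefinedField {
    final String name;
    final AnalyzerReference analyzerReference;

    BridgeDefinedField(String name, AnalyzerReference analyzerReference) {
        this.name = name;
        this.analyzerReference = analyzerReference;
    }
}

// Hypothetical simplification of the field metadata builder used by bridges.
final class FieldMetadataBuilderSketch {
    private final List<String> declaredFields = new ArrayList<>();
    // Step 2: analyzer references registered per field, mimicking the scoped analyzer.
    private final Map<String, AnalyzerReference> scopedAnalyzers = new LinkedHashMap<>();

    final class CreationContext {
        private final String fieldName;

        CreationContext(String fieldName) {
            this.fieldName = fieldName;
        }

        // Step 1: the proposed analyzer(String analyzerName) method.
        CreationContext analyzer(String analyzerName) {
            scopedAnalyzers.put(fieldName, new AnalyzerReference(analyzerName));
            return this;
        }
    }

    CreationContext field(String fieldName) {
        declaredFields.add(fieldName);
        return new CreationContext(fieldName);
    }

    List<BridgeDefinedField> build() {
        List<BridgeDefinedField> result = new ArrayList<>();
        for (String fieldName : declaredFields) {
            result.add(new BridgeDefinedField(fieldName, scopedAnalyzers.get(fieldName)));
        }
        return result;
    }
}

// Step 4: the schema translation uses the field's analyzer reference instead of null.
final class SchemaTranslatorSketch {
    static Map<String, String> addPropertyMapping(BridgeDefinedField field) {
        Map<String, String> property = new LinkedHashMap<>();
        property.put("type", "string"); // Elasticsearch 2.x text type
        if (field.analyzerReference != null) {
            property.put("analyzer", field.analyzerReference.name);
        }
        return property;
    }
}
```

Under these assumptions, a bridge would call something like builder.field("title_ngram").analyzer("my_remote_ngram") while configuring its field metadata, and the generated property mapping would then carry "analyzer": "my_remote_ngram". Fields declared without an analyzer keep their current behavior.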

We will work on that eventually, but feel free to give it a shot if you want!

Ivan Krumov
March 22, 2017, 9:31 AM

Ok, good to know. Can you tell me what is the impact of a mismatch between the analyzer configuration in Elasticsearch and the one in Hibernate Search? I am testing disabling the schema validation. From what I saw in the Jest client code, the analyzer configuration in HS does not affect in any way the (bulk) index request payload. I would assume it affects search requests though.

On a related note, I noticed that using a Discriminator for index-time analyzer selection not only does not work with Elasticsearch but also seems to make some of the fields included in the class bridge be excluded from the request to ES (even though the underlying Document has the fields).

Yoann Rodière
March 22, 2017, 9:47 AM
Edited

Can you tell me what is the impact of a mismatch between the analyzer configuration in Elasticsearch and the one in Hibernate Search?

If you disable schema management, there shouldn't be any impact whatsoever. Analyzer handling on the Hibernate Search side when using Elasticsearch is purely declarative: we manage names and know little of what's behind them.
Disabling schema validation may hurt in the long run, though, since you won't be warned about other mismatches in your schema. But you already know this.

On a related note, I noticed that using a Discriminator for index-time analyzer selection not only does not work with Elasticsearch but also seems to make some of the fields included in the class bridge be excluded from the request to ES (even though the underlying Document has the fields).

This is weird, especially because search doesn't involve analyzer discriminators at any point. Could you please create a JIRA ticket, ideally with a failing test case? We have test case templates available at https://github.com/hibernate/hibernate-test-case-templates/tree/master/search. Thanks!

Assignee

Yoann Rodière

Reporter

Ivan Krumov

Labels

Suitable for new contributors

None

Feedback Requested

None

Priority

Major