Explicit checks when referencing Elasticsearch analysis definitions

Description

Currently, when we reference an analyzer or normalizer (while defining a field) or a char filter, tokenizer or token filter (while defining an analyzer or normalizer), we don't check that the name actually matches a definition. That's because we expect some definitions to be available on the server side regardless of the user configuration, and we don't know exactly which ones will be, since the user can define them through server-side configuration.

I think we should:

1. Throw exceptions when trying to reference unknown analyzers, normalizers, char filters, tokenizers or token filters.
2. Set up a whitelist of analyzers, normalizers, char filters, tokenizers and token filters that are expected to already be defined on the server side (see https://hibernate.atlassian.net/browse/HSEARCH-2584)
3. Allow users to add even more names to this whitelist (context.analyzer( "myName" ).builtin() or something similar in the ElasticsearchAnalysisConfigurer)
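
To make the three points above more concrete, here is a rough sketch of what the check could look like. All names in it (AnalysisKind, AnalysisReferenceChecker, define, expectOnServer, checkReference) are made up for illustration and do not exist in Hibernate Search; the real change would have to be wired into the existing definition registry and the ElasticsearchAnalysisConfigurer context.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical: the kinds of analysis definitions that can be referenced by name.
enum AnalysisKind { ANALYZER, NORMALIZER, CHAR_FILTER, TOKENIZER, TOKEN_FILTER }

// Hypothetical sketch, not actual Hibernate Search code.
final class AnalysisReferenceChecker {
    // Names defined explicitly by the user through the configurer.
    private final Map<AnalysisKind, Set<String>> defined = new EnumMap<>(AnalysisKind.class);
    // Whitelist: names expected to already exist on the server side (points 2 and 3).
    private final Map<AnalysisKind, Set<String>> expectedOnServer = new EnumMap<>(AnalysisKind.class);

    AnalysisReferenceChecker() {
        for (AnalysisKind kind : AnalysisKind.values()) {
            defined.put(kind, new HashSet<>());
            expectedOnServer.put(kind, new HashSet<>());
        }
    }

    // Registers a definition made in the user configuration.
    void define(AnalysisKind kind, String name) {
        defined.get(kind).add(name);
    }

    // Adds a name to the whitelist; roughly what a ".builtin()" call could do.
    void expectOnServer(AnalysisKind kind, String name) {
        expectedOnServer.get(kind).add(name);
    }

    // Point 1: fail fast when a reference matches neither a definition nor the whitelist.
    void checkReference(AnalysisKind kind, String name) {
        if (!defined.get(kind).contains(name) && !expectedOnServer.get(kind).contains(name)) {
            throw new IllegalArgumentException(
                    "Unknown " + kind + " '" + name + "': define it in the analysis configurer,"
                    + " or declare it as already defined on the server side.");
        }
    }
}
```

With this, a reference to a name registered through define(...) or expectOnServer(...) passes, and a reference to anything else (a typo, for instance) fails at bootstrap instead of on the first indexing or search request. Keeping one set per kind means a tokenizer name does not accidentally validate an analyzer reference.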

Thoughts?

Created October 3, 2018 at 7:02 AM
Updated September 25, 2023 at 2:48 PM
