Add support for Elasticsearch 5

Description

Opening this ticket since I'm working on it. I will update the list of required changes below as I find new ones.

Potential blockers:

  • Support for specifying analyzers in elasticsearch.yml has been removed (see https://www.elastic.co/guide/en/elasticsearch/reference/5.x/analysis-custom-analyzer.html): we have to use the REST API to declare analyzers (see HSEARCH-2219).

  • Analyzer definitions are now index-scoped: global analyzers can no longer be declared, and analyzers have to be declared for each index (roughly, for each Hibernate root entity), which is highly inconvenient. This makes solving all the more important: expecting users to declare analyzers themselves on the Elasticsearch server is now a no-no (see comments on this ticket).
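Since analyzers now have to be sent through the REST API and are scoped to a single index, the same definitions must be repeated in the settings of every index. The Java sketch below only builds the `PUT /<index>` request bodies (sending them is left out); the class, the method names and the example analyzer (standard tokenizer with lowercase and asciifolding filters) are hypothetical, and names are assumed to need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: builds the body of "PUT /<index>" requests that declare
 * custom analyzers through index settings instead of elasticsearch.yml.
 * Names are assumed to be plain identifiers; no JSON escaping is performed.
 */
final class IndexAnalysisSketch {

    /** Minimal custom analyzer definition: one tokenizer followed by token filters. */
    static final class AnalyzerDef {
        final String tokenizer;
        final List<String> filters;

        AnalyzerDef(String tokenizer, List<String> filters) {
            this.tokenizer = tokenizer;
            this.filters = filters;
        }

        String toJson() {
            String filterArray = filters.stream()
                    .map(f -> "\"" + f + "\"")
                    .collect(Collectors.joining(",", "[", "]"));
            return "{\"type\":\"custom\",\"tokenizer\":\"" + tokenizer
                    + "\",\"filter\":" + filterArray + "}";
        }
    }

    /** Analyzers go under settings.analysis.analyzer in the index creation body. */
    static String createIndexBody(Map<String, AnalyzerDef> analyzers) {
        String definitions = analyzers.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + e.getValue().toJson())
                .collect(Collectors.joining(","));
        return "{\"settings\":{\"analysis\":{\"analyzer\":{" + definitions + "}}}}";
    }

    /** Analyzers are index-scoped, so each index gets its own copy of the same definitions. */
    static Map<String, String> bodiesPerIndex(List<String> indexNames,
                                              Map<String, AnalyzerDef> analyzers) {
        String body = createIndexBody(analyzers);
        Map<String, String> bodies = new LinkedHashMap<>();
        for (String index : indexNames) {
            bodies.put(index, body);
        }
        return bodies;
    }
}
```

With a single analyzer named folding, every index (one per root entity, e.g. book and author) would receive the same body: {"settings":{"analysis":{"analyzer":{"folding":{"type":"custom","tokenizer":"standard","filter":["lowercase","asciifolding"]}}}}}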

External work required:

Changes that would require dropping support for 2.0 (or introducing dialects):

  • The string datatype has disappeared and has been replaced by text and keyword. What we need is probably text, except for non-analyzed fields, which must be keyword fields (text fields are always analyzed).

  • null_value is no longer supported on the text datatype: we currently use it for the indexNullAs feature

  • sorting on text fields now requires enabling fielddata in the mapping

  • DeleteByQuery is a core feature again, with its own API. The plugin has been removed.

  • The default scripting language is now Painless, which is very similar to Groovy (except that script parameters must be prefixed with params.)

  • For projections, the "fields" keyword in queries is now "stored_fields", and using "_source" in there is disallowed. Source filtering must be used to access the _source, e.g. ?_source_include=foo

  • arcDistanceInKm has been renamed to arcDistance and now returns meters: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_scripting.html#_geopoint_scripts
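The changes in this list are exactly what a dialect would have to encapsulate. As a rough sketch, assuming a hypothetical MappingDialect interface (not an existing Hibernate Search API), the version-specific parts could be isolated like this: string properties map to string on ES 2 and to text/keyword on ES 5, null_value is only emitted where it is still allowed, sorting on text enables fielddata, and the distance script switches from Groovy's arcDistanceInKm to Painless's meter-based arcDistance with params.-prefixed parameters. Rejecting null_value on analyzed ES 5 fields is only a placeholder for whatever workaround indexNullAs ends up needing.

```java
/** Hypothetical dialect abstraction: each method returns a JSON fragment or a script source. */
interface MappingDialect {
    /** Mapping for a string property; nullValue is null when indexNullAs is not used. */
    String stringField(boolean analyzed, boolean sortable, String nullValue);

    /** Script computing the distance in kilometers between a geo_point field and (lat, lon) params. */
    String distanceInKmScript(String field);

    static MappingDialect forMajorVersion(int major) {
        switch (major) {
            case 2: return new Es2Dialect();
            case 5: return new Es5Dialect();
            default: throw new IllegalArgumentException("Unsupported Elasticsearch version: " + major);
        }
    }
}

/** ES 2.x: a single "string" type; Groovy scripts see parameters as plain variables. */
final class Es2Dialect implements MappingDialect {
    @Override
    public String stringField(boolean analyzed, boolean sortable, String nullValue) {
        // Analyzed strings load fielddata by default in ES 2, so "sortable" needs no extra mapping.
        StringBuilder json = new StringBuilder("{\"type\":\"string\"");
        if (!analyzed) {
            json.append(",\"index\":\"not_analyzed\"");
        }
        if (nullValue != null) {
            json.append(",\"null_value\":\"").append(nullValue).append('"');
        }
        return json.append('}').toString();
    }

    @Override
    public String distanceInKmScript(String field) {
        return "doc['" + field + "'].arcDistanceInKm(lat, lon)";
    }
}

/** ES 5.x: "text" for analyzed fields, "keyword" for the others; Painless scripts. */
final class Es5Dialect implements MappingDialect {
    @Override
    public String stringField(boolean analyzed, boolean sortable, String nullValue) {
        if (!analyzed) {
            // keyword fields still accept null_value.
            StringBuilder json = new StringBuilder("{\"type\":\"keyword\"");
            if (nullValue != null) {
                json.append(",\"null_value\":\"").append(nullValue).append('"');
            }
            return json.append('}').toString();
        }
        if (nullValue != null) {
            // Placeholder: text fields no longer support null_value.
            throw new IllegalArgumentException("null_value is not supported on text fields");
        }
        // Sorting on a text field requires fielddata to be enabled explicitly.
        return sortable ? "{\"type\":\"text\",\"fielddata\":true}" : "{\"type\":\"text\"}";
    }

    @Override
    public String distanceInKmScript(String field) {
        // Parameters are read through "params."; arcDistance returns meters.
        return "doc['" + field + "'].arcDistance(params.lat, params.lon) / 1000";
    }
}
```

Callers would pick the dialect once, based on the version reported by the cluster, and never branch on the version anywhere else.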

Changes that will probably also work with ES 2.x (see HSEARCH-2437):

  • "filtered" queries are no longer supported and must be replaced by "bool" queries with a "must" and a "filter"

  • the "queryString" keyword for query string queries no longer works; we must use "query_string" (I wonder why we didn't in the first place)

  • The syntax we used with ES 2 for search scripts ("script_fields": {"_distance": {"params": {...}, "script": "..."}}) does not match the documentation and doesn't work in ES 5.

  • the size parameter in bucket aggregation queries (used for faceting) used to accept a value of 0, meaning "Integer.MAX_VALUE". This was deprecated and is no longer possible. See https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html#_size

  • This also affects Elasticsearch 5.0, not only 2.4.1.
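The request syntax changes above can be illustrated with a few JSON-building helpers. This is a hypothetical sketch (class and method names are made up, and no JSON escaping is done) targeting ES 5.0, where the script source goes under "inline": it replaces "filtered" with a "bool" query holding "must" and "filter", uses the "query_string" key, sends an explicit Integer.MAX_VALUE instead of a terms aggregation size of 0, and nests the script parameters inside the "script" object.

```java
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helpers producing ES 5-compatible request fragments as JSON strings. */
final class Es5QuerySketch {

    /** Replacement for the removed "filtered" query: "bool" with "must" and "filter" clauses. */
    static String filteredAsBool(String queryJson, String filterJson) {
        return "{\"bool\":{\"must\":" + queryJson + ",\"filter\":" + filterJson + "}}";
    }

    /** Query string query: the key must be "query_string", not "queryString". */
    static String queryString(String query) {
        return "{\"query_string\":{\"query\":\"" + query + "\"}}";
    }

    /** Terms aggregation: 0 no longer means "all buckets", so send the maximum explicitly. */
    static String termsAggregation(String field, int size) {
        int effectiveSize = size == 0 ? Integer.MAX_VALUE : size;
        return "{\"terms\":{\"field\":\"" + field + "\",\"size\":" + effectiveSize + "}}";
    }

    /** Script field: source, language and params all live inside the "script" object. */
    static String scriptField(String name, String source, Map<String, Double> params) {
        String paramsJson = params.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":" + e.getValue())
                .collect(Collectors.joining(",", "{", "}"));
        return "{\"script_fields\":{\"" + name + "\":{\"script\":{"
                + "\"inline\":\"" + source + "\",\"lang\":\"painless\",\"params\":" + paramsJson
                + "}}}}";
    }
}
```

For example, termsAggregation("tag", 0) yields {"terms":{"field":"tag","size":2147483647}}; whether such a large size is acceptable performance-wise would still need checking.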

See my branch where I'm poking around to see what needs to be done: https://github.com/yrodiere/hibernate-search/tree/HSEARCH-2434

Activity


Yoann Rodière, February 9, 2017 at 4:15 PM

Created to investigate dialects.

Yoann Rodière, January 30, 2017 at 3:54 PM

To be fair, 2.4 is not "outdated" yet, since there was a bugfix release in January (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/release-notes-2.4.4.html).

Also, users have been waiting for Hibernate Search 5.7 for quite some time, so we won't delay it any further. It's planned for release in mid-February, and we're working on fixing several issues in 5.7 before this deadline.

That being said, we are aware that the lack of support for ES 5.x is an issue and that it won't fix itself, so we are also working on this:

  • we followed the evolution of ES 5 closely, most notably the changes that occurred in ES around the keyword datatype, which was initially only partially implemented (no support for what was eventually named "normalizers").

  • we introduced features in 5.6 that pave the way for 5.x support, most notably support for propagating analyzer definitions from Hibernate Search to Elasticsearch.

  • we have work in progress to identify the actual required changes in Hibernate Search to support 5.x, which will be the first step for implementing dialects.

But there are still issues, and for now I cannot give you a release date, or even a version number. It will mainly depend on the complexity of the changes. We will keep you updated on this ticket.

Remo, January 29, 2017 at 10:00 AM

Is there any update or roadmap on this? Elasticsearch 5.0 final has been out for a bit over three months. Focusing on an outdated release will probably be a roadblock for many people. Having dialects could be a good thing given the lack of stability in Elasticsearch.

Yoann Rodière, December 5, 2016 at 4:23 PM

Moving to 5.8 as an optimistic target, since we definitely won't have the time to do that in 5.6.

Yoann Rodière, November 4, 2016 at 5:19 PM

Why only for new schemas? The purpose of schema merging is upgrading an existing schema, and it's not unthinkable that such an upgrade may involve new analyzers. From experience, I'd say it's not frequent, but it happens. And it's very nice not to be forced to drop the indexes every time we add a new schema with a new analyzer (with some applications, reindexing might take quite some time).

Anyway... I'm talking about development use. If using schema merging with analyzer upgrades is not a valid use case in production, we could simply document it as unsupported.

The use cases we have are:

  • The Infinispan dynamic schema modifications. If I understand correctly, it never changes the analyzers after the initial schema creation.

  • The "developer mode": merge the schema modifications and analyzer definitions when restarting an HS application. Since it's development mode, we can reasonably say "just use one application".

  • Restart of a cluster of HS applications in production. I'd say a warning about using the MERGE strategy in that case would be enough, and IIRC there already is one in the documentation.

  • Full-scale automated testing of multi-node HS applications, maybe? Probably the same solution.

  • Others?

I agree some stress tests may be nice, though I don't have experience setting up such tests, and I'm not sure how hard it would be to reproduce the race conditions. But I can try to look into it someday.

Fixed

Details


Created October 26, 2016 at 4:12 PM
Updated April 13, 2017 at 5:07 PM
Resolved March 24, 2017 at 12:11 PM