Add support for Elasticsearch 5
Description
Activity
Yoann Rodière, February 9, 2017 at 4:15 PM
Created to investigate dialects.
Yoann Rodière, January 30, 2017 at 3:54 PM
To be fair, 2.4 is not "outdated" yet, since there was a bugfix release in January (https://www.elastic.co/guide/en/elasticsearch/reference/2.4/release-notes-2.4.4.html).
Also, users have been expecting Hibernate Search 5.7 for quite some time, so we won't delay it any further. It's planned for release in mid-February, and we're working on fixing several issues with 5.7 before this deadline.
That being said, we are aware that the lack of support for ES 5.x is an issue and that it won't fix itself, so we are also working on this:
we followed the evolution of ES 5 closely, most notably the changes that occurred in ES around the keyword datatype, which was initially only partially implemented (no support for what was eventually named "normalizers").
we introduced features in 5.6 that pave the way for 5.x support, most notably support for propagating analyzer definitions from Hibernate Search to Elasticsearch.
we have work in progress to identify the actual required changes in Hibernate Search to support 5.x, which will be the first step for implementing dialects.
But there are still issues, and for now I cannot give you a release date, or even a version number. It will mainly depend on the complexity of the changes. We will keep you updated on this ticket.
Remo, January 29, 2017 at 10:00 AM
Is there any update or roadmap on this? Elasticsearch 5.0 final has been out for a bit over three months. The focus on an outdated release will probably be a roadblock for many people. Having dialects could be a good thing given the lack of stability in Elasticsearch.
Yoann Rodière, December 5, 2016 at 4:23 PM
Moving to 5.8 as an optimistic target, since we definitely won't have the time to do that in 5.6.
Yoann Rodière, November 4, 2016 at 5:19 PM
Why only for new schema? The schema merging purpose is upgrading an existing schema, and it's not unthinkable that this schema may involve new analyzers. From experience, I'd say it's not frequent, but it happens. And it's very nice to not be forced to drop the indexes every time we add a new schema with a new analyzer (with some applications it might take quite some time to reindex).
Anyway... I'm talking about development use. If using the "schema merging" with analyzer upgrades is not a valid use case in production mode, we could simply advertise this as unsupported.
The use cases we have are:
The Infinispan dynamic schema modifications. If I understand correctly, it never changes the analyzers after the initial schema creation.
The "developer mode": merge the schema modifications and analyzer definitions when restarting an HS application. Since it's development mode, we can reasonably say "just use one application".
Restart of a cluster of HS applications in production. I'd say a warning about using the MERGE strategy in that case would be enough, and IIRC there already is one in the documentation.
Full-scale automated testing of multi-node HS applications, maybe? Probably the same solution.
Others?
I agree some stress tests may be nice, though I don't have experience setting up such tests. And I'm not sure how hard it would be to generate the race conditions. But I can try to look into it someday.
Adding a ticket, since I'm working on it. I will update the required changes below as I find new ones.
Potential blockers:
Support for specifying analyzers in elasticsearch.yml has been removed (https://www.elastic.co/guide/en/elasticsearch/reference/5.x/analysis-custom-analyzer.html): we have to use the REST API to declare analyzers (see HSEARCH-2219)
Analyzer definitions are now index-scoped so you can't declare global analyzers and have to declare the analyzers for each index (more or less each Hibernate root entity); this is highly inconvenient. This makes solving all the more important: expecting users to declare analyzers themselves on the Elasticsearch server is now a no-no (see comments on this ticket).
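Because analyzer definitions are index-scoped, one way to cope is to repeat the same analysis settings in the creation request of every index. The following is only a minimal sketch of building such request bodies, not how Hibernate Search does it: the index names, the analyzer name and the helper index_creation_bodies are hypothetical.

```python
import json

# Hypothetical analyzer definition shared by every index (names are assumptions).
SHARED_ANALYSIS = {
    "analyzer": {
        "my_custom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "asciifolding"],
        }
    }
}


def index_creation_bodies(index_names, analysis=SHARED_ANALYSIS):
    """Return {index name: JSON body to send with PUT /<index name>}.

    The same analysis block is copied into each body, since ES 5 offers
    no place to declare an analyzer once for all indexes.
    """
    return {
        name: json.dumps({"settings": {"analysis": analysis}}, sort_keys=True)
        for name in index_names
    }


bodies = index_creation_bodies(["com.example.book", "com.example.author"])
for name, body in bodies.items():
    print("PUT /" + name, body)
```

The duplication is deliberate here: each index (roughly each root entity) carries its own copy of the definitions, which is the inconvenience described above.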
External work required:
The elasticsearch-maven-plugin is not compatible with ES 5; I set up a branch with the minimal required changes and opened a PR: https://github.com/alexcojocaru/elasticsearch-maven-plugin/pull/19
Elastic only released the core Elasticsearch artifact on Maven Central, and some modules, but not the Groovy module (https://github.com/elastic/elasticsearch/tree/master/modules/lang-groovy). The classes are not part of the core artifact, either (they could have been due to some Gradle magic). Thus, running our Elasticsearch integration tests with an embedded instance of Elasticsearch may prove impossible (at least for those which need Groovy). Oddly enough, their own language "Painless" suffers from the exact same issue: http://search.maven.org/#search%7Cga%7C1%7Cg%3Aorg.elasticsearch%20AND%20v%3A5.0.0
*UPDATE*: actually, it's on purpose. They only want to publish ZIPs, so I guess elasticsearch-maven-plugin is a dead-end, at least as it is now: https://github.com/elastic/elasticsearch/issues/18131#issuecomment-222105133
The Optimize API has been removed in favor of the newer ForceMerge API, which is almost identical (except its name). Jest only supports the Optimize API in its current version. Note that the ForceMerge API wasn't available in ES 2.0, it appeared in ES 2.1... I opened a PR to add the ForceMerge command anyway: https://github.com/searchbox-io/Jest/pull/408
See also https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_rest_api_changes.html#_literal__optimize_literal_endpoint_removed
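Since ForceMerge only appeared in ES 2.1 and Optimize is gone in ES 5, a client supporting both ends has to pick the endpoint from the server version. A rough sketch of that choice, independent of Jest; the function names and the version-tuple convention are assumptions made for illustration:

```python
def merge_endpoint(es_version):
    """Pick the segment-merging endpoint for a (major, minor) version tuple.

    _forcemerge exists from ES 2.1 onwards; before that only _optimize does.
    """
    if es_version >= (2, 1):
        return "_forcemerge"
    return "_optimize"


def merge_request(index_name, es_version, max_num_segments=1):
    """Build the request line for merging the segments of one index."""
    return "POST /%s/%s?max_num_segments=%d" % (
        index_name,
        merge_endpoint(es_version),
        max_num_segments,
    )


for version in [(2, 0), (2, 4), (5, 0)]:
    print(version, merge_request("com.example.book", version))
```

Preferring _forcemerge as soon as it is available keeps a single code path for 2.1 and later, leaving _optimize as a fallback for 2.0 only.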
Changes that would require dropping support for 2.0 (or introducing dialects):
The string datatype disappeared and has been replaced by text and keyword. What we need is probably text, except for non-analyzed fields that must be keywords (as text fields have to be analyzed).
null_value is no longer supported on the text datatype: we currently use it for the indexNullAs feature
sorting on text fields now requires enabling data loading in the mapping
DeleteByQuery is a core feature again, with its own API. The plugin has been removed.
The default scripting language is now Painless, which is very similar to Groovy (only script parameters must be prefixed with params.)
For projections, the "fields" keyword when querying is now "stored_fields" and using "_source" in there is disallowed. Source filtering must be used to access the _source, e.g. ?_source_include=foo
arcDistanceInKm has been renamed to arcDistance and now returns meters: https://www.elastic.co/guide/en/elasticsearch/reference/5.0/breaking_50_scripting.html#_geopoint_scripts
Changes that will probably also work with ES 2.x (see HSEARCH-2437):
"filtered" queries are no longer supported and must be replaced by "bool" queries with a "must" and a "filter"
the "queryString" keyword for query string queries does not work anymore, we must use "query_string" (I wonder why we didn't in the first place)
The syntax we used with ES 2 for search scripts ("script_fields": {"_distance": {"params": {...}, "script": "..."}}) seems inconsistent with the documentation and doesn't work in ES 5.
the size parameter in bucket aggregation queries (used for faceting) used to accept a 0 value, meaning "Integer.MAX_VALUE". It was a deprecated feature and it's not possible anymore. See https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-aggregations-bucket-terms-aggregation.html#_size. This affects Elasticsearch 5.0 too (not only 2.4.1).
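As an illustration of the "filtered" to "bool" replacement mentioned in this list, here is a small sketch that rewrites a query expressed as a plain dictionary. The helper name filtered_to_bool is hypothetical, and the sketch only handles "filtered" at the top level or directly nested inside another "filtered" query:

```python
def filtered_to_bool(query):
    """Rewrite {"filtered": {"query": q, "filter": f}} as a bool query.

    The query part goes into "must" and the filter part into "filter".
    Any other query is returned unchanged.
    """
    if "filtered" not in query:
        return query
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered_to_bool(filtered["query"])
    if "filter" in filtered:
        bool_query["filter"] = filtered_to_bool(filtered["filter"])
    return {"bool": bool_query}


old_style = {
    "filtered": {
        "query": {"match": {"title": "hibernate"}},
        "filter": {"term": {"status": "published"}},
    }
}
print(filtered_to_bool(old_style))
```

The resulting "bool" form is accepted by ES 2.x as well, which is why this change should not need a dialect.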
See my branch where I'm poking around to see what needs to be done: https://github.com/yrodiere/hibernate-search/tree/HSEARCH-2434
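To make the string datatype change listed above more concrete, here is a hedged sketch of what a per-version "dialect" could look like when generating the mapping of a single string field. The function string_field_mapping and its parameters are made up for this example and are not taken from the branch:

```python
def string_field_mapping(es_major, analyzed, analyzer=None, null_token=None):
    """Build the mapping of one string field for a given ES major version."""
    if es_major < 5:
        # ES 2.x: a single "string" type, analyzed unless told otherwise.
        mapping = {"type": "string"}
        if not analyzed:
            mapping["index"] = "not_analyzed"
    else:
        # ES 5: analyzed fields are "text", non-analyzed ones are "keyword".
        mapping = {"type": "text" if analyzed else "keyword"}
    if analyzed and analyzer:
        mapping["analyzer"] = analyzer
    # null_value is no longer supported on "text" in ES 5, so this sketch
    # simply omits it there; indexNullAs would need another approach.
    if null_token is not None and not (es_major >= 5 and analyzed):
        mapping["null_value"] = null_token
    return mapping


print(string_field_mapping(2, analyzed=False, null_token="_null_"))
print(string_field_mapping(5, analyzed=False, null_token="_null_"))
print(string_field_mapping(5, analyzed=True, analyzer="english", null_token="_null_"))
```

Keeping the version check in one function like this is the basic idea behind dialects: the rest of the code asks for "an analyzed string field" and never mentions "string", "text" or "keyword" directly.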