Deal with index mappings creation/upgrade/concurrency in the Elasticsearch case

Description

We need to be able to upgrade the index mapping without wiping out all the index content.

What happens if several nodes in a cluster try to create/upgrade the mappings?

Activity

Yoann Rodière, September 30, 2016 at 9:20 AM

As discussed on IRC, the export is now out of scope and will be added as part of HSEARCH-2366.

Work status: validation feature and code cleanup are ok, I'll see to the documentation and open a PR.

Yoann Rodière, September 22, 2016 at 9:56 AM

I've been trying to add a command-line tool to export the mappings, similar to (but more limited than) what's provided by org.hibernate.tool.hbm2ddl.SchemaExport.
Unfortunately, it seems very complex (if not impossible) to make Hibernate Search generate its internal Metadata without having it start its index managers. Thus we need to have an Elasticsearch instance available even if we simply want to export mappings to local files and be done with it...

So it means I have two solutions:

  • either I dig into Search to separate metadata building from index management, and then build a command-line tool.

  • or I implement the mappings export feature as part of the index manager starting process. This seems really weird to me, but at least it would be easy to use for users...

Another issue is that, since the Elasticsearch module does not depend on Hibernate ORM, I cannot rely on ORM to build the Hibernate Search configuration (org.hibernate.search.cfg.impl.SearchConfigurationFromHibernateCore) and in particular to detect the indexed entities. Thus if we provide a command-line tool, one of the matters that will have to be addressed will be indexed entities auto-detection (we cannot decently ask users to list all their entities explicitly...).

I'd like to speak with you before going any further, so I'll wait till you return from PTO before doing anything. The VALIDATE strategy is already working, by the way, so this export and documentation are all that's left for this ticket.

Details: the configuration reading and metadata building seem to be bootstrapped from org.hibernate.search.spi.SearchIntegratorBuilder.buildNewSearchFactory(), and the same method also:

  • creates the index managers through org.hibernate.search.spi.SearchIntegratorBuilder.initDocumentBuilders(SearchConfiguration, BuildContext, SearchMapping), which calls org.hibernate.search.indexes.impl.IndexManagerHolder.buildEntityIndexBinding(XClass, Class, SearchConfiguration, WorkerBuildContext), which calls org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManagers(String, Properties[], Similarity, Class<?>, WorkerBuildContext)

  • starts the index managers through a call to org.hibernate.search.engine.impl.MutableSearchFactoryState.setActiveSearchIntegrator(ExtendedSearchIntegratorWithShareableState), which calls org.hibernate.search.indexes.impl.IndexManagerHolder.setActiveSearchIntegrator(ExtendedSearchIntegratorWithShareableState), which calls org.hibernate.search.indexes.spi.IndexManager.setSearchFactory(ExtendedSearchIntegrator), which in the Elasticsearch case will initialize the index.

Yoann Rodière, September 16, 2016 at 12:31 PM

If I understood correctly, the idea is to provide about the same features as what hbm2ddl does in Hibernate ORM.

I've already started working on this (though I'm not done yet). If there is an issue, VALIDATE simply throws an exception at startup with some details. Later we can see if we want a complete diff report, but for now I think it should do.

Gunnar Morling, September 16, 2016 at 10:52 AM

What would VALIDATE do exactly? Create some sort of report? Maybe we should rather start with a schema export facility; users could then compare the export to the current mapping in ES themselves.

Sanne Grinovero, September 13, 2016 at 9:55 AM (edited)

The question came up over chat whether the VALIDATE option should deal with the fact that users might have extended the schema we expect.

I'd be happy to tackle this in an incremental way: start with the simplest possible validation (check that the schema is identical to the one we'd generate) and then improve from there, especially as I'm afraid it would take quite some experience and several iterations to define exactly what kind of extensions are valid.

Some simple validation leniency which we could consider in a first iteration, if it's not too complex:

  • ignore indexes we don't know about

  • ignore types which we don't know about

  • ignore fields which we don't need

  • ignore analyzer definitions we don't use
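To make the leniency rules above concrete, here is a minimal sketch in plain Java. It is not Hibernate Search code: the class `LenientMappingValidator`, its methods, and the flat "field name to type" maps are all hypothetical, chosen only to illustrate the idea that expected fields must be present and matching while anything extra on the server is ignored. Failing with an exception mirrors the "throws an exception at startup with some details" behavior described earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hibernate Search. */
final class LenientMappingValidator {

    /**
     * Compares the fields we expect against the fields found on the server.
     * Both maps go from field name to field type (an assumed, simplified model).
     * Returns one message per expected field that is missing or has a different type.
     * Fields that exist only on the server are ignored.
     */
    static List<String> findProblems(Map<String, String> expected, Map<String, String> actual) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : expected.entrySet()) {
            String name = entry.getKey();
            String actualType = actual.get(name);
            if (actualType == null) {
                problems.add("Missing field '" + name + "'");
            } else if (!actualType.equals(entry.getValue())) {
                problems.add("Field '" + name + "': expected type '" + entry.getValue()
                        + "' but found '" + actualType + "'");
            }
        }
        return problems;
    }

    /** Fails fast with all collected details, as a startup check would. */
    static void validateOrFail(Map<String, String> expected, Map<String, String> actual) {
        List<String> problems = findProblems(expected, actual);
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Mapping validation failed: " + String.join("; ", problems));
        }
    }
}
```

The same "iterate over what we expect, ignore the rest" shape would apply one level up to indexes, types and analyzer definitions.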

For field options such as indexable, stored, store-norms, etc., we'd need to define a comparison scale for each option, to see if the option used by the server is at least as powerful as we expect.
Example: if we don't need norms, it's ok if the server enables norms. If we need the field to be stored, it's not ok if the option is not enabled on the server.
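One possible way to express such a comparison scale, as a sketch only: model each option as an enum whose constants are declared from least to most capable, and accept the server's value when it ranks at least as high as the required one. The names `FieldOptionComparison`, `Store`, `Norms` and `atLeastAsPowerful` are hypothetical and not taken from Hibernate Search or Elasticsearch.

```java
/** Hypothetical illustration of a per-option comparison scale; not Hibernate Search API. */
final class FieldOptionComparison {

    /** Constants are declared from least to most capable. */
    enum Store { NO, YES }

    /** Constants are declared from least to most capable. */
    enum Norms { DISABLED, ENABLED }

    /**
     * Returns true if the value found on the server offers at least what we require.
     * Relies on Enum.compareTo, which orders constants by declaration order.
     */
    static <E extends Enum<E>> boolean atLeastAsPowerful(E required, E actual) {
        return actual.compareTo(required) >= 0;
    }
}
```

With this scale, `atLeastAsPowerful(Norms.DISABLED, Norms.ENABLED)` is true (we don't need norms, the server enables them anyway), while `atLeastAsPowerful(Store.YES, Store.NO)` is false (we need the field stored, the server does not store it). Options that are not totally ordered would need a dedicated comparison instead of a simple enum order.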

I'm totally fine with capturing these in a follow-up JIRA, as long as we document the limitations of our first version.

Sanne Grinovero, May 23, 2016 at 9:52 AM

Solved by:

  • documenting the limitations

  • adding a VALIDATE option to org.hibernate.search.elasticsearch.cfg.IndexSchemaManagementStrategy

  • adding an option to export the schema generation script

Fixed

Created April 29, 2016 at 10:13 AM
Updated November 29, 2016 at 1:06 AM
Resolved November 17, 2016 at 4:41 PM