Initialization options and APIs

Description

We should:

  1. Move the schema initialization settings, such as "lifecycle.strategy", from the Elasticsearch backend to the mapper.

  2. Expose an API to control the initialization programmatically, with more options such as reindexing.

Because:

  1. Executing initialization on startup is not always a valid option, especially when multiple instances of the application are deployed, or when Elasticsearch may not be reachable on startup (e.g. in containers). So we need an API.

  2. Schema initialization is almost always followed by reindexing, which can only be done at the mapper level. So it would be nice to have both in the same API.

  3. For near-zero-downtime indexing, we will have to be able to temporarily create separate indexes and perform reindexing on these. So it will be necessary to mix the initialization API and mass indexing API.

  4. In the future, we may have to take extra steps at the mapper level when initializing the application, for example dropping or creating Debezium connectors. This can only be handled at the mapper level.

The configuration options would probably be something like this:

    hibernate.search.initialization.strategy = manual # Do nothing, expect the user to use APIs
    hibernate.search.initialization.strategy = create-or-validate # Default: create indexes if they don't exist, validate them if they exist
    hibernate.search.initialization.strategy = create
    hibernate.search.initialization.strategy = validate # For Lucene, do nothing except checking the index exists
    hibernate.search.initialization.strategy = update # For Lucene, create the index if it doesn't exist, do nothing if it exists
    hibernate.search.initialization.strategy = drop-and-create

    # Maybe add an option to wait for Elasticsearch to be reachable?
    # That can be moved to a different ticket. Maybe it would be better to move this to the Elasticsearch backend.
    hibernate.search.initialization.wait.enabled = true
    hibernate.search.initialization.wait.duration = PT5M

    # Maybe... though I doubt it's a good idea. Useful for tests, mostly.
    hibernate.search.initialization.reindexing.enabled = true # Defaults to false

    # Also, mostly for tests
    hibernate.search.cleanup.strategy = none
    hibernate.search.cleanup.strategy = drop
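To make the proposed options more concrete, here is a minimal sketch of how a mapper could read them. Everything in it is an assumption for illustration: InitializationStrategy and InitializationSettings are hypothetical names, not existing Hibernate Search types, and the fallbacks for the wait options (disabled, PT5M) are guesses, since the ticket only states a default for the strategy. The duration is parsed as ISO-8601 with java.time.Duration, which matches the PT5M example value.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Properties;

// Hypothetical enum mirroring the strategy values proposed in this ticket.
enum InitializationStrategy {
    MANUAL("manual"),
    CREATE_OR_VALIDATE("create-or-validate"),
    CREATE("create"),
    VALIDATE("validate"),
    UPDATE("update"),
    DROP_AND_CREATE("drop-and-create");

    private final String externalName;

    InitializationStrategy(String externalName) {
        this.externalName = externalName;
    }

    // Falls back to the proposed default when the property is absent or blank.
    static InitializationStrategy parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return CREATE_OR_VALIDATE;
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        for (InitializationStrategy strategy : values()) {
            if (strategy.externalName.equals(normalized)) {
                return strategy;
            }
        }
        throw new IllegalArgumentException("Unknown initialization strategy: '" + value + "'");
    }
}

// Hypothetical holder for the parsed settings.
final class InitializationSettings {
    static final String STRATEGY = "hibernate.search.initialization.strategy";
    static final String WAIT_ENABLED = "hibernate.search.initialization.wait.enabled";
    static final String WAIT_DURATION = "hibernate.search.initialization.wait.duration";

    final InitializationStrategy strategy;
    final Duration waitDuration; // Duration.ZERO when waiting is disabled

    InitializationSettings(Properties properties) {
        this.strategy = InitializationStrategy.parse(properties.getProperty(STRATEGY));
        // Assumption: waiting is disabled unless explicitly enabled.
        boolean waitEnabled = Boolean.parseBoolean(properties.getProperty(WAIT_ENABLED, "false"));
        // Assumption: PT5M is used when waiting is enabled but no duration is given.
        this.waitDuration = waitEnabled
                ? Duration.parse(properties.getProperty(WAIT_DURATION, "PT5M"))
                : Duration.ZERO;
    }
}

class InitializationSettingsDemo {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.setProperty(InitializationSettings.STRATEGY, "drop-and-create");
        properties.setProperty(InitializationSettings.WAIT_ENABLED, "true");
        InitializationSettings settings = new InitializationSettings(properties);
        System.out.println(settings.strategy + ", wait up to " + settings.waitDuration);
    }
}
```

Rejecting unknown values with an exception, rather than silently falling back to the default, seems safer for a setting that can drop indexes.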

As to the API, I would imagine something like this:

    // Use a workspace, so that in the future we'll be able to work on new indexes created in parallel with the existing ones (see near-zero-downtime indexing: HSEARCH-3499)
    // Use a DSL to create the workspace, so that we can provide parameters when we do near-zero-downtime indexing
    try ( SearchInitializationWorkspace workspace = Search.mapping( emf ).initialization().start() ) {
        workspace.dropAndCreate();
        workspace.reindex()... // offer a DSL similar to the mass indexer, or expose a mass indexer directly
    }

Or for near-zero-downtime indexing:

    try ( SearchInitializationWorkspace workspace = Search.mapping( emf ).initialization()
            .nearZeroDownTime("some tag").start() ) {
        workspace.dropAndCreate();
        workspace.reindex()... // offer a DSL similar to the mass indexer, or expose a mass indexer directly
        workspace.replaceOriginalIndexes(); // If this is not called, closing the workspace will roll back the changes
    }
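The "roll back unless replaceOriginalIndexes() was called" behavior maps naturally onto AutoCloseable. The sketch below illustrates only that contract. SketchWorkspace is a hypothetical stand-in for the proposed SearchInitializationWorkspace: it records actions in a list instead of talking to a backend, and none of it is an existing Hibernate Search API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed workspace; it only logs what a real one would do.
final class SketchWorkspace implements AutoCloseable {
    private final List<String> actions;
    private boolean replaced = false;

    SketchWorkspace(List<String> actions) {
        this.actions = actions;
    }

    void dropAndCreate() {
        actions.add("dropAndCreate");
    }

    void reindex() {
        actions.add("reindex");
    }

    // Marks the workspace as committed: the temporary indexes become the live ones.
    void replaceOriginalIndexes() {
        replaced = true;
        actions.add("replaceOriginalIndexes");
    }

    // Closing without a prior replaceOriginalIndexes() rolls back the changes.
    @Override
    public void close() {
        if (!replaced) {
            actions.add("rollback");
        }
    }
}

class SketchWorkspaceDemo {
    public static void main(String[] args) {
        List<String> actions = new ArrayList<>();
        try (SketchWorkspace workspace = new SketchWorkspace(actions)) {
            workspace.dropAndCreate();
            workspace.reindex();
            // replaceOriginalIndexes() is deliberately not called here.
        }
        System.out.println(actions);
    }
}
```

With this shape, an exception thrown during reindexing also ends in a rollback, because try-with-resources calls close() before propagating the exception.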

Related to:

  https://hibernate.atlassian.net/browse/HSEARCH-3499#icft=HSEARCH-3499
  https://hibernate.atlassian.net/browse/HSEARCH-3751#icft=HSEARCH-3751

Fixed

Created November 7, 2019 at 8:50 AM
Updated March 31, 2020 at 11:52 AM
Resolved March 6, 2020 at 7:51 AM
