
MassIndexer with an update mechanism

Description

This feature has already been discussed at https://forum.hibernate.org/viewtopic.php?f=9&t=1014063:
It would be great to have an update mechanism instead of an index wipe and rebuild. I have a lot of data (more than 17 million rows) that takes a long time (more than 2 hours) to index, and I need to reindex because the data is not modified through Hibernate alone. While the MassIndexer rebuilds the index, searches miss the rows that have not been indexed yet, which is not acceptable for me.

Instead of wiping the index and re-adding all rows, update only the changed ones (new, updated, deleted).

The current process is:
1) Wipe out the index
2) Re-add all entities from the database, loading and processing them with multiple threads

Step 2) could perform an update instead of an add operation. A new step 3) would then look for index entries whose rows have been deleted from the database and remove them from the index as well.

Step 3) is not a top priority for me, but it might lead other people to use this approach instead of the wipe/reindex procedure for large datasets. The work could also be split into one operation that only updates the index (without deleting anything) and a second operation that removes index entries whose rows have already been deleted from the database.
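
To make the two proposed operations concrete, here is a rough sketch in plain Java. It is only an illustration under assumptions: the two maps stand in for the database table and the index, and the class and method names (`IncrementalIndexSync`, `upsertChanged`, `purgeDeleted`) are hypothetical and not part of the Hibernate Search API. A real implementation would stream rows in batches rather than hold everything in memory.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical in-memory stand-in for the proposal; not Hibernate Search code. */
public class IncrementalIndexSync {

    /**
     * Update operation: add missing entries and overwrite entries whose
     * indexed content differs from the database row. Existing entries stay
     * searchable the whole time. Assumes non-null row values.
     * Returns the number of entries written.
     */
    public static <K, V> int upsertChanged(Map<K, V> database, Map<K, V> index) {
        int written = 0;
        for (Map.Entry<K, V> row : database.entrySet()) {
            V indexed = index.get(row.getKey());
            if (!row.getValue().equals(indexed)) {
                index.put(row.getKey(), row.getValue());
                written++;
            }
        }
        return written;
    }

    /**
     * Delete operation: remove index entries whose row no longer exists
     * in the database. Returns the number of entries removed.
     */
    public static <K, V> int purgeDeleted(Map<K, V> database, Map<K, V> index) {
        int removed = 0;
        Iterator<K> ids = index.keySet().iterator();
        while (ids.hasNext()) {
            if (!database.containsKey(ids.next())) {
                ids.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<Long, String> database = new HashMap<>();
        database.put(1L, "unchanged");
        database.put(2L, "edited outside Hibernate");
        database.put(4L, "new row");

        Map<Long, String> index = new HashMap<>();
        index.put(1L, "unchanged");
        index.put(2L, "stale text");
        index.put(3L, "row deleted from database");

        int written = upsertChanged(database, index);
        int removed = purgeDeleted(database, index);
        // prints "2 written, 1 removed"
        System.out.println(written + " written, " + removed + " removed");
    }
}
```

Keeping the two methods separate mirrors the split suggested above: the update pass can run on its own, and the purge pass can be scheduled later or skipped.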

The whole operation doesn't need to be as fast as the wipe/reindex operation.

Environment

None

Status

Assignee

Unassigned

Reporter

Marcel

Suitable for new contributors

None

Pull Request

None

Feedback Requested

None

Components

Fix versions

Priority

Major