Hibernate Search Indexing Speed Decreasing

Description

I have two entities in my code and database: one with 0.2 million records and the other with 2.6 million records. I am indexing them with the MassIndexer, using either startAndWait() or start(). The smaller entity (0.2 million records) is indexed fairly quickly at a roughly constant rate of documents per second, but when indexing moves on to the entity with 2.6 million records the speed collapses within minutes: CPU and memory usage on my Windows machine both reach 100%, throughput drops from about 2000 documents/second to 8 documents/second, and it never looks like it will finish.

This is my code:

try {
    logger.info("Indexing started ============================");

    SimpleIndexingProgressMonitor progressMonitor = new SimpleIndexingProgressMonitor();
    FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);

    MassIndexer massIndexer = fullTextEntityManager.createIndexer()
            .purgeAllOnStart(purgeAllOnStart)
            .progressMonitor(progressMonitor)
            .typesToIndexInParallel(typesToIndexInParallel)
            .batchSizeToLoadObjects(batchSizeToLoadObjects)
            .threadsForSubsequentFetching(8)
            .idFetchSize(idFetchSize)
            .threadsToLoadObjects(threadsToLoadObjects)
            .cacheMode(CacheMode.IGNORE);

    try {
        if (!async) {
            // Block until indexing finishes.
            massIndexer.startAndWait();
        } else {
            // Fire-and-forget: indexing continues in background threads.
            massIndexer.start();
        }
    } catch (InterruptedException e) {
        logger.error("mass reindexing interrupted: " + e.getMessage());
        Thread.currentThread().interrupt(); // restore the interrupt flag
    } finally {
        fullTextEntityManager.flushToIndexes();
    }

    logger.info("Indexing completed ============================");
} catch (Exception e) {
    logger.error(e.getMessage(), e);
}

Can this be fixed so that indexing runs at a constant speed? Is there a known issue with memory and CPU usage growing over time, and what could be the fix or workaround?

How can I avoid the indexing speed decreasing like this?

Environment

None

Activity

Yoann Rodière
September 9, 2020, 3:19 PM

I'd more or less expect the indexing speed to decrease as the index grows in size, but not that much. There's something wrong; now the question is where the bottleneck is. It could be the index, the database, or even the JVM. Let's try to find out.

A few questions first:

  • Do you use the traditional Lucene backend or the Elasticsearch backend?

  • Can you confirm you don't use JMS/JGroups?

  • What are the values of the parameters you're passing to the mass indexer?

  • Did you check that it really was that slow, and it was not simply a bug in the reported speed? You can run search queries while it's indexing to see how many documents are available.

  • Did you optimize indexing configuration in any way?

  • Considering the size of your index, have you considered sharding? Does it help at all in your case?

  • Have you tried setting hibernate.search.default.worker.backend to blackhole? This should replace Lucene with a stub, dropping all documents. Obviously it will be much faster, but do you see a huge drop in indexing speed with this configuration too? (See the sketch right after this list.)
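For reference, here is a minimal sketch of how that diagnostic property could be passed when bootstrapping the persistence unit programmatically. The persistence unit name "my-pu" is a placeholder; the property name is the Hibernate Search 5 worker backend setting mentioned above, and it can equally be set in persistence.xml or hibernate.properties:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class BlackholeBootstrap {
    public static void main(String[] args) {
        Map<String, Object> settings = new HashMap<>();
        // Replace the Lucene backend with a stub that discards all documents,
        // purely to check whether the slowdown comes from Lucene or from the database.
        settings.put("hibernate.search.default.worker.backend", "blackhole");

        EntityManagerFactory emf =
                Persistence.createEntityManagerFactory("my-pu", settings);
        // ... run the MassIndexer as usual and compare the indexing speed ...
        emf.close();
    }
}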

Also, is there any chance you could create a reproducer based on our test case templates, with anonymized or randomly generated data?

Sakhawat Naqvi
September 10, 2020, 2:27 PM

Hi Yoann,

Thanks for reaching out. These are the answers you were looking for:

  • Do you use the traditional Lucene backend or the Elasticsearch backend? Answer: the Lucene backend.

  • Can you confirm you don't use JMS/JGroups? Answer: No.

  • What are the values of the parameters you're passing to the mass indexer? Answer: We read them from a properties file, so these are the values; we tried tweaking them but nothing changed significantly:
    hibernate_purge_all_on_start = true
    hibernate_types_to_index_in_parallel = 2
    hibernate_batch_size_to_load_objects = 1000
    hibernate_id_fetch_size = 150
    hibernate_threads_to_load_objects = 10

  • Did you check that it really was that slow, and it was not simply a bug in the reported speed? Answer: It looks like all of the CPUs start requesting data from the database and the database gets overloaded with work.

  • Did you optimize indexing configuration in any way? Answer: We tried setting purgeAll to true.

  • Considering the size of your index, have you considered sharding? Answer: We tried sharding but that does not help.

  • Have you tried setting hibernate.search.default.worker.backend to blackhole? Answer: Yes, we did, but again this does not work; our server is in the cloud.

Yoann Rodière
September 10, 2020, 3:04 PM

From what you're saying, it looks like the problem is on the database side? There's not much Hibernate Search can do to optimize that, unfortunately... It's mostly on you to experiment and find the right settings.

Here's what you should probably look into:

  • Reducing the amount of parallel work you ask from the database: sometimes, less is faster. Maybe lower "threadsToLoadObjects" and set "typesToIndexInParallel" to just 1? If your indexed entities are large (many @IndexedEmbedded associations), maybe also set "batch_size_to_load_objects" to a lower value (see the sketch after this list).

  • Raising the ID fetch size. I'd expect you'll get (slightly) better results if the id fetch size is significantly higher than the "batch_size_to_load_objects".

  • If your indexed entities are large (many @IndexedEmbedded associations), you may be running into N+1 query issues. Try Hibernate ORM's hibernate.default_batch_fetch_size setting. For example set it to 16 and see if it improves things.

  • Make sure there are database indexes on foreign keys involved in your associations, so that joins are faster.
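For illustration, a minimal sketch of what these suggestions could look like, assuming the same MassIndexer API as in the reporter's snippet. The method name and the concrete values (1, 4, 100, 1000, 16) are starting points to experiment with, not recommended settings:

// Hibernate ORM setting to batch-fetch lazy associations and reduce N+1 queries;
// typically set in persistence.xml or hibernate.properties:
//   hibernate.default_batch_fetch_size = 16

void reindexWithReducedParallelism(EntityManager em) throws InterruptedException {
    FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
    ftem.createIndexer()
            .purgeAllOnStart(true)
            .typesToIndexInParallel(1)    // index one entity type at a time
            .threadsToLoadObjects(4)      // fewer loading threads, less pressure on the database
            .batchSizeToLoadObjects(100)  // smaller batches if entities embed many associations
            .idFetchSize(1000)            // fetch identifiers in chunks larger than the object batches
            .cacheMode(CacheMode.IGNORE)
            .startAndWait();
}

The idea is simply to make the database do less concurrent work at any given moment while keeping the identifier scrolling cheap.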

Beyond that, I'm afraid it's more a matter of optimizing database access.

Assignee

Unassigned

Reporter

Sakhawat Naqvi

Suitable for new contributors

Yes, likely

Pull Request

None

Feedback Requested

None

Components

Affects versions

Priority

Blocker