Profile mass indexer to reduce number of index commits

Description

We need to profile the new mass indexer design and try to reduce the number of index commits. Performance of the filesystem-based directory declined due to a bug fix in Lucene (https://issues.apache.org/jira/browse/LUCENE-3418). Reducing the number of index commits (and hence fsync calls) might alleviate the performance loss.
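
To illustrate why fewer commits should mean fewer fsync calls, here is a minimal sketch in plain Java. It is not the mass indexer code and does not use the Lucene API: `CountingWriter`, `indexAll` and the batch sizes are hypothetical names and values chosen for illustration, and `FileChannel.force(true)` stands in for the fsync that a commit triggers on a filesystem-based directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class CommitBatchingSketch {

    /** Hypothetical stand-in for an index writer: counts how often data is forced to disk. */
    static final class CountingWriter implements AutoCloseable {
        private final FileChannel channel;
        private int commits = 0;

        CountingWriter(Path file) throws IOException {
            channel = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        }

        /** Buffers a "document" in the OS page cache; no flush to the device yet. */
        void add(String doc) throws IOException {
            ByteBuffer buf = ByteBuffer.wrap((doc + "\n").getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }

        /** Forces pending writes to the storage device (the expensive, fsync-like step). */
        void commit() throws IOException {
            channel.force(true);
            commits++;
        }

        int commits() {
            return commits;
        }

        @Override
        public void close() throws IOException {
            channel.close();
        }
    }

    /** Writes docCount documents, committing once per batchSize documents; returns the commit count. */
    static int indexAll(int docCount, int batchSize) {
        try {
            Path file = Files.createTempFile("commit-sketch", ".dat");
            try (CountingWriter writer = new CountingWriter(file)) {
                int pending = 0;
                for (int i = 0; i < docCount; i++) {
                    writer.add("doc-" + i);
                    if (++pending == batchSize) {
                        writer.commit();
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    writer.commit(); // commit the last, partially filled batch
                }
                return writer.commits();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("commit per document: " + indexAll(200, 1) + " commits");
        System.out.println("commit per 50 docs:  " + indexAll(200, 50) + " commits");
    }
}
```

With write barriers enabled each forced flush can be costly, so the batched variant should spend far less time waiting on the disk. Actual timings depend on the filesystem and mount options and would need to be measured.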

Environment

None

Activity

Guillaume Smet
January 8, 2012, 2:32 PM

Hi Sanne,

Very promising numbers. Is there a snapshot we can test somewhere so that we also test your patch on standard hard drives?

FYI, Laurent did some further testing to understand the difference between his laptop and mine (see numbers above: Lucene 3.4 is 20x slower for him whereas it is only 3x slower on my laptop). This is due to write barriers being enabled by default even for ext3 on the newest Linux kernels (>= 3.1); they are also enabled by default on ext4. I still have ext3 with an old kernel, which is why they are disabled on my laptop.


Guillaume

Sanne Grinovero
January 9, 2012, 12:38 AM

Hi Guillaume,
Yes, I've deployed a fresh 4.1.0-SNAPSHOT to the Maven repository (after Hardy merged this into master).

The numbers above refer to an ext4 partition using default settings, but if you want to experiment with the write barriers, these are the "tuned" mount options I usually use on my laptop's SSD:
rw,noatime,commit=60,max_batch_time=0,barrier=0,data=writeback

These should provide better performance (that's why I use them for coding), but I don't suggest using them in a production system: I'm not a filesystem expert and have only been experimenting with them, mainly because of the write barriers you mention.

Sanne Grinovero
February 1, 2012, 12:56 AM

Guillaume and/or Laurent, did you ever have time to verify the performance?

Laurent Almeras
February 1, 2012, 1:34 AM

I tried your fix tonight. Indexing now takes 12 seconds, so with your patch I also see better performance than with 4.0 and 3.x.

Very good job!

Sanne Grinovero
February 1, 2012, 1:47 AM

thank you!

Assignee

Sanne Grinovero

Reporter

Hardy Ferentschik

Labels

None

Suitable for new contributors

Yes, likely

Pull Request

None

Feedback Requested

None

Components

Fix versions

Affects versions

Priority

Blocker