We need to profile the new mass indexer design and try to reduce the number of index commits. Performance of the filesystem-based directory declined due to a bug fix in Lucene (https://issues.apache.org/jira/browse/LUCENE-3418). Reducing the number of index commits (and hence fsync calls) might alleviate the performance loss.
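To illustrate why fewer commits should mean fewer fsync calls, here is a minimal sketch that does not use Lucene or the mass indexer at all. It assumes `FileChannel.force(true)` as a stand-in for the fsync behind an index commit; the class `CommitBatchingSketch`, the method `writeRecords` and the parameter `commitEvery` are hypothetical names chosen for this example only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class CommitBatchingSketch {

    /**
     * Writes docCount small records to the file and forces them to disk
     * after every commitEvery records (plus once at the end for a partial
     * batch). Returns how many times force() was called.
     */
    public static int writeRecords(Path file, int docCount, int commitEvery) throws IOException {
        int syncs = 0;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 1; i <= docCount; i++) {
                ByteBuffer record = ByteBuffer.wrap(("doc-" + i + "\n").getBytes(StandardCharsets.UTF_8));
                while (record.hasRemaining()) {
                    channel.write(record);
                }
                if (i % commitEvery == 0) {
                    channel.force(true); // stand-in for the fsync of one "commit"
                    syncs++;
                }
            }
            if (docCount % commitEvery != 0) {
                channel.force(true); // flush the last, partial batch
                syncs++;
            }
        }
        return syncs;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("commit-sketch", ".dat");
        try {
            for (int commitEvery : new int[] {1, 100}) {
                long start = System.nanoTime();
                int syncs = writeRecords(file, 1000, commitEvery);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println("commitEvery=" + commitEvery
                        + " syncs=" + syncs + " elapsedMs=" + elapsedMs);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The elapsed times depend heavily on the drive, the filesystem and its mount options, so the sketch only makes the relative cost of the two settings visible on a given machine; it says nothing about the actual numbers of the mass indexer.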
Very promising numbers. Is there a snapshot we can test somewhere so that we also test your patch on standard hard drives?
FYI, Laurent did some further testing to understand the difference between his laptop and mine (see numbers above: Lucene 3.4 is 20x slower for him, whereas it is only 3x slower on my laptop). The cause is that write barriers are now enabled by default even for ext3 on the newest Linux kernels (>= 3.1); they are also enabled by default on ext4. I still have ext3 with an old kernel, which is why they are disabled on my laptop.
Yes, I've deployed a fresh 4.1.0-SNAPSHOT to the Maven repository (after Hardy merged this into master).
The numbers above refer to an ext4 partition using default settings - but if you want to try to experiment with the write barriers, these are the "tuned" settings I usually use on the SSD drive on the laptop:
These should provide better performance (that's why I use them for coding), but I don't suggest using them in a production system: I'm not a filesystem expert; I have just been experimenting with them, mainly because of the write barriers you mention.
Guillaume and/or Laurent, did you ever have time to verify the performance?
I tried your fix last night. Indexing now takes 12 seconds, so with your patch I also see better performance than with 4.0 and 3.x.
Very good job!