Complex issue; I don't have time to boil it down to a simple test case, but will attempt to explain clearly:
Recent index changes are not visible through the IndexReader passed to a Lucene Filter set on a FullTextQuery.
I create a new Foo, id: 1, and persist it through entityManager.persist().
I examine the indexes with Luke; they are updated and Foo #1 is present.
I perform a simple Lucene search using Hibernate Search; Foo #1 is fetched.
Now, I run another query, this time using a query Filter that reads from the IndexReader passed to the getDocIdSet(IndexReader reader) method like so:
I would expect this to return 1, since I just persisted a Foo with ID 1. However, it returns 0.
If, however, I check out an IndexReader instance from the searchFactory and perform the same command like so:
Now the reader successfully returns 1, for the entity I had recently persisted.
Currently I work around this issue by manually checking out an IndexReader from the searchFactory, passing it to my Filter, and checking it back in after the query runs, but this is pretty clunky.
Shouldn't the Filter be getting the same current IndexReader?
I'm finally back from several trips and will inspect this tomorrow: looks quite bad.
I developed a functional test to verify this, could you please have a look at it:
Did you notice that the Filter instance is invoked multiple times? The filter is applied to each sub-reader, so you get one reader instance to process per segment in the index. Every change you make to the index changes the set of segments, so adding a new element as in your test means you'll be processing at least two segments.
Note that when you invoke
you're not operating on a sub-reader but on a top-level IndexReader that recursively includes all current segments.
Why: filtering is applied on a per-segment basis to make caching more effective. Each cached DocIdSet stays valid for as long as its segment does, so a minimal change to the index doesn't invalidate all the processing already done for the other segments.
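To make the difference concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: `Reader`, `perSegmentCounts` and `topLevelCount` are hypothetical names, and a segment is modeled as a simple list of ids. It only illustrates why a lookup through a per-segment reader can return 0 for a document that a top-level reader finds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SegmentFilterSketch {

    /** Hypothetical stand-in for an index reader: it sees only the segments it was built over. */
    static final class Reader {
        private final List<List<String>> segments;

        Reader(List<List<String>> segments) {
            this.segments = segments;
        }

        /** Counts the documents with the given id in the segments this reader covers. */
        int docFreq(String id) {
            int count = 0;
            for (List<String> segment : segments) {
                for (String docId : segment) {
                    if (docId.equals(id)) {
                        count++;
                    }
                }
            }
            return count;
        }
    }

    /** What a filter sees: one invocation per segment, each with a reader limited to that segment. */
    static List<Integer> perSegmentCounts(List<List<String>> index, String id) {
        List<Integer> counts = new ArrayList<>();
        for (List<String> segment : index) {
            Reader subReader = new Reader(Arrays.asList(segment));
            counts.add(subReader.docFreq(id));
        }
        return counts;
    }

    /** What a manually checked-out reader sees: all current segments at once. */
    static int topLevelCount(List<List<String>> index, String id) {
        return new Reader(index).docFreq(id);
    }

    public static void main(String[] args) {
        // An older segment, plus a new segment created by persisting Foo #1.
        List<List<String>> index = Arrays.asList(
                Arrays.asList("7", "8"),
                Arrays.asList("1"));

        System.out.println(perSegmentCounts(index, "1")); // prints [0, 1]
        System.out.println(topLevelCount(index, "1"));    // prints 1
    }
}
```

In this model the filter invocation for the older segment reports 0 and the one for the new segment reports 1, while the top-level reader reports 1 in a single call.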
Changed priority from critical as I think it's not really a bug - might need some clarifications on the docs?
Thanks for the investigation and explanation, Sanne.
I was attempting to use a Lucene Filter to limit searched records to a subset of records whose IDs are or are not present in another field of the same index (but different record(s)). After your explanation, I follow the javadoc on org.apache.lucene.search.Filter better, and can understand why this use case wouldn't work out in a filter: the sub-reader filtering the current index segment would only be able to read the values for the referenced field within the current index segment.
Given this, I don't think there's any clarification to be made in the hibernate-search docs, either.
Hi Clark, thanks for confirming it's not a bug: feeling better.
You're right: for your use case you would need a top-level IndexReader, so checking one out yourself, as you described in the JIRA description, is a good approach.
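As a rough illustration of that approach, the plain-Java sketch below models the two steps without using the Lucene or Hibernate Search APIs. `Doc`, `referencedIds` and `filterSegment` are hypothetical names, and the "referenced id" field is an assumption about the use case. The set of referenced ids is first collected across all segments, and that precomputed set is then applied segment by segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TopLevelLookupSketch {

    /** Hypothetical document: its own id, plus the id of another record it refers to (or null). */
    static final class Doc {
        final String id;
        final String refersTo;

        Doc(String id, String refersTo) {
            this.id = id;
            this.refersTo = refersTo;
        }
    }

    /** Top-level pass: collect the referenced ids across ALL segments before the query runs. */
    static Set<String> referencedIds(List<List<Doc>> index) {
        Set<String> ids = new HashSet<>();
        for (List<Doc> segment : index) {
            for (Doc doc : segment) {
                if (doc.refersTo != null) {
                    ids.add(doc.refersTo);
                }
            }
        }
        return ids;
    }

    /** Per-segment pass: keep the documents whose id is in the precomputed set. */
    static List<String> filterSegment(List<Doc> segment, Set<String> allowed) {
        List<String> kept = new ArrayList<>();
        for (Doc doc : segment) {
            if (allowed.contains(doc.id)) {
                kept.add(doc.id);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Doc> olderSegment = Arrays.asList(new Doc("1", null), new Doc("2", null));
        List<Doc> newerSegment = Arrays.asList(new Doc("3", "1")); // record 3 refers to record 1

        Set<String> allowed = referencedIds(Arrays.asList(olderSegment, newerSegment));

        System.out.println(filterSegment(olderSegment, allowed)); // prints [1]
        System.out.println(filterSegment(newerSegment, allowed)); // prints []
    }
}
```

Record 1 is kept even though the record that refers to it lives in a different segment, which a lookup restricted to the older segment alone could not have determined.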