Better tests for IndexReader passed to Filter to be consistent with latest writes

Description

Complex issue; I don't have time to boil it down to a simple test case, but will attempt to explain clearly:

Recent index changes are not visible through the IndexReader passed to a Lucene Filter set on a FullTextQuery.

Example:

  1. I create a new Foo, id: 1, and persist it through entityManager.persist()

  2. I examine the indexes with Luke; they are updated and Foo #1 is present.

  3. I perform a simple Lucene search using Hibernate Search; Foo #1 is fetched (see the sketch after this list).
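For reference, steps 1 and 3 might look roughly like the sketch below; the Foo constructor, the field name "id" and the transaction handling are illustrative assumptions, not code from the original report.

import java.util.List;

import javax.persistence.EntityManager;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.FullTextQuery;
import org.hibernate.search.jpa.Search;

public class ReproSketch {

    // Foo is assumed to be an @Indexed entity whose document id field is "id".
    void persistAndSearch(EntityManager entityManager) {
        // step 1: persist Foo #1; the index is updated when the transaction commits
        entityManager.getTransaction().begin();
        entityManager.persist(new Foo(1L));
        entityManager.getTransaction().commit();

        // step 3: a plain Hibernate Search query finds Foo #1
        FullTextEntityManager ftem = Search.getFullTextEntityManager(entityManager);
        FullTextQuery query = ftem.createFullTextQuery(
                new TermQuery(new Term("id", "1")), Foo.class);
        List<?> results = query.getResultList(); // contains Foo #1
    }
}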

Now, I run another query, this time using a query Filter that reads from the IndexReader passed to the getDocIdSet(IndexReader reader) method like so:
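The filter in question might look something like the following sketch (the probe on the "id" field and the accept-everything DocIdSet are illustrative assumptions, not the original code):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

// Illustrative sketch of a probing Filter: it reports how many documents with
// id "1" are visible to the IndexReader passed into getDocIdSet().
public class ProbeFilter extends Filter {

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        // docFreq() only counts documents visible to this particular reader
        int visible = reader.docFreq(new Term("id", "1"));
        System.out.println("documents with id:1 visible here: " + visible);

        // accept every document; this filter is used purely as a probe
        OpenBitSet all = new OpenBitSet(reader.maxDoc());
        all.set(0, reader.maxDoc());
        return all;
    }
}

Such a filter would typically be attached to the FullTextQuery, e.g. via its setFilter() method in Hibernate Search 4.x.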

I would expect this to return 1, since I just persisted a Foo with ID 1. However, it returns 0.

If, however, I check out an IndexReader instance from the searchFactory and perform the same command like so:
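That check through a reader obtained from the SearchFactory might look like this sketch (using the Hibernate Search 4.x IndexReaderAccessor API; the field name is again an assumption):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.hibernate.search.SearchFactory;

public class SearchFactoryReaderSketch {

    // Sketch: run the same count through a reader checked out from the
    // SearchFactory instead of the reader handed to Filter.getDocIdSet().
    static int countFooOne(SearchFactory searchFactory) throws IOException {
        IndexReader reader = searchFactory.getIndexReaderAccessor().open(Foo.class);
        try {
            return reader.docFreq(new Term("id", "1")); // 1, as expected
        } finally {
            searchFactory.getIndexReaderAccessor().close(reader);
        }
    }
}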

Now the reader successfully returns 1, for the entity I had recently persisted.

Currently I work around this issue by manually checking out an IndexReader from the searchFactory, passing it to my Filter, and checking it in after the query runs. But this is pretty clunky.
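A sketch of that workaround, with hypothetical names throughout: the filter is handed a top-level reader checked out from the SearchFactory and consults it instead of the per-segment reader Lucene passes in; the caller has to check the reader back in once the query has run.

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

// Illustrative workaround sketch: the filtering decision is based on the
// explicitly supplied top-level reader rather than the sub-reader passed to
// getDocIdSet().
public class TopLevelReaderFilter extends Filter {

    private final IndexReader topLevelReader;

    public TopLevelReaderFilter(IndexReader topLevelReader) {
        this.topLevelReader = topLevelReader;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader segmentReader) throws IOException {
        OpenBitSet bits = new OpenBitSet(segmentReader.maxDoc());
        // the top-level reader sees the whole index, including the latest segments
        if (topLevelReader.docFreq(new Term("id", "1")) > 0) {
            bits.set(0, segmentReader.maxDoc()); // accept all docs of this segment
        }
        return bits;
    }
}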

Shouldn't the Filter be getting the same current IndexReader?

Environment

None

Activity

Sanne Grinovero
December 1, 2012, 12:43 AM

I'm finally back from several trips and will inspect this tomorrow: looks quite bad.

Sanne Grinovero
January 1, 2013, 4:38 PM

Hi Clark,
I developed a functional test to verify this; could you please have a look at it:
https://github.com/Sanne/hibernate-search/blob/acb4d6e833e80f5b35b94bf09041bf624648422a/hibernate-search-engine/src/test/java/org/hibernate/search/test/filters/FreshReadersProvidedTest.java

Did you notice that the Filter instance is invoked multiple times? The filter needs to be applied on each sub-reader: you get one reader instance to process for each segment in the index. Considering that the set of segments changes after every modification of the index, adding a new element as in your test means you'll be processing at least two segments.

Note that when you check out an IndexReader from the searchFactory yourself, as in your workaround, you're not operating on sub-readers but on a composite IndexReader which spans all current segments.

Why? Filtering needs to be applied on a per-segment basis to make caching more effective: each cached DocIdSet stays valid for as long as its segment does, so a minimal change to the index doesn't invalidate the work already done for every other segment.
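As a rough illustration of that caching point (a sketch, not Hibernate Search internals): Lucene's CachingWrapperFilter keeps one cached DocIdSet per sub-reader, so when a new segment appears only that segment has to be evaluated, while the cached results for untouched segments stay valid.

import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;

public class CachedFilterSketch {

    // Wrap a custom Filter so Lucene caches one DocIdSet per sub-reader:
    // after an index change only new or changed segments miss the cache.
    static Filter cachePerSegment(Filter expensiveFilter) {
        return new CachingWrapperFilter(expensiveFilter);
    }
}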

Sanne Grinovero
January 1, 2013, 4:40 PM

Changed priority from Critical as I think it's not really a bug; it might need some clarification in the docs?

Clark Duplichien
January 5, 2013, 12:17 AM

Thanks for the investigation and explanation, Sanne.
I was attempting to use a Lucene Filter to limit the searched records to the subset whose IDs are (or are not) present in another field of the same index, but in different record(s). After your explanation I follow the javadoc on org.apache.lucene.search.Filter better and can understand why this use case wouldn't work out in a filter: the sub-reader filtering the current index segment can only read the values of the referenced field within that same segment.
Given this, I don't think there's any clarification to be made in the hibernate-search docs, either.

Sanne Grinovero
January 5, 2013, 12:27 AM

Hi Clark, thanks for confirming it's not a bug: feeling better

You're right: for your use case you would need a top-level IndexReader, so the approach of checking one out yourself, as you describe in the issue description, is good.

Fixed

Assignee

Sanne Grinovero

Reporter

Clark Duplichien

Labels

None

Suitable for new contributors

None

Pull Request

None

Feedback Requested

None

Components

Fix versions

Priority

Minor