AndDocIdSet makeDocIdSetOnAgreedBits() returns wrong values

Description

Depending on the contents of the DocIdSets list, AndDocIdSet computes incorrect values in makeDocIdSetOnAgreedBits(). Please see the attached test cases: test_middle() fails, and in my opinion it shouldn't.

Attachments

1

Activity

Hardy Ferentschik November 3, 2010 at 3:03 PM

Created issue in Lucene: LUCENE-2736

Hardy Ferentschik November 2, 2010 at 6:38 PM

The problem is that the DocIdSetIterator returned by SortedVIntList behaves differently when advance(int target) is called, compared to the iterators returned by DocIdBitSet and OpenBitSet. Let's take the example from the test. Assume the following doc ids are in the set: 0, 5, 6 and 10. We get the DocIdSetIterator from the DocIdBitSet and call next() until we point to the third element (iterator.docID() == 6). Now we call iterator.advance(6).
The algorithm in AndDocIdSet assumes that the advance call will return 6, basically not moving to another element. This is also how the iterators of DocIdBitSet and OpenBitSet behave. The iterator of SortedVIntList, however, returns 10.
The question is who is right. The DocIdSetIterator.advance javadoc says:

Advances to the first beyond the current whose document number is greater than or equal to target

It also shows some pseudo code:

Reading this documentation, I would think SortedVIntList behaves correctly, but I think OpenBitSet is the more commonly used implementation. I am surprised that no one has noticed this before.
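
To make the difference concrete, here is a minimal, self-contained sketch of the two behaviors on the doc ids 0, 5, 6 and 10. The classes StrictIterator and LenientIterator are hypothetical stand-ins, not Lucene code: the first loops over nextDoc() until it reaches the target, which is my reading of the "beyond the current" wording and an assumption about how SortedVIntList's iterator works internally; the second stays put if the current doc already satisfies the target, which is the behavior observed for DocIdBitSet and OpenBitSet.

```java
/** Hypothetical stand-ins that model the two behaviors; these are not Lucene classes. */
final class AdvanceSemanticsSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Shared cursor logic over a sorted array of doc ids. */
    abstract static class ArrayIterator {
        private final int[] docs;
        private int index = -1; // -1 means "not started yet"

        ArrayIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        int docID() {
            if (index < 0) {
                return -1;
            }
            return index < docs.length ? docs[index] : NO_MORE_DOCS;
        }

        int nextDoc() {
            if (index < docs.length) {
                index++;
            }
            return docID();
        }

        abstract int advance(int target);
    }

    /** Always steps past the current doc before comparing against the target. */
    static final class StrictIterator extends ArrayIterator {
        StrictIterator(int[] sortedDocs) {
            super(sortedDocs);
        }

        @Override
        int advance(int target) {
            int doc;
            while ((doc = nextDoc()) < target) {
                // keep stepping until we reach or pass the target
            }
            return doc;
        }
    }

    /** Stays on the current doc when it already is greater than or equal to the target. */
    static final class LenientIterator extends ArrayIterator {
        LenientIterator(int[] sortedDocs) {
            super(sortedDocs);
        }

        @Override
        int advance(int target) {
            int doc = docID();
            while (doc < target) {
                doc = nextDoc();
            }
            return doc;
        }
    }

    public static void main(String[] args) {
        int[] docs = {0, 5, 6, 10};
        ArrayIterator strict = new StrictIterator(docs);
        ArrayIterator lenient = new LenientIterator(docs);
        // Move both iterators to the third element (doc id 6).
        for (int i = 0; i < 3; i++) {
            strict.nextDoc();
            lenient.nextDoc();
        }
        System.out.println("strict.advance(6)  = " + strict.advance(6));  // 10
        System.out.println("lenient.advance(6) = " + lenient.advance(6)); // 6
    }
}
```

Both variants agree whenever the target is larger than the current doc id; they only diverge when advance is called with the doc id the iterator is already positioned on.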

On our AndDocIdSet side we can actually cater for this problem by comparing iterator.docID() == targetPosition before we call advance. If they match we don't have to call advance at all, because the iterator is already at the right position.
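
A rough sketch of that guard, under stated assumptions: DocIterator is a hypothetical minimal interface standing in for Lucene's DocIdSetIterator, StrictIterator is a made-up iterator that always moves on advance, and advanceIfNeeded is an illustrative helper name, not the actual AndDocIdSet code.

```java
/** Illustrative only: the names below are hypothetical, not Hibernate Search or Lucene API. */
final class SafeAdvanceSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal iterator contract standing in for Lucene's DocIdSetIterator. */
    interface DocIterator {
        int docID();
        int nextDoc();
        int advance(int target);
    }

    /** Iterator over sorted doc ids whose advance() always moves at least one step. */
    static final class StrictIterator implements DocIterator {
        private final int[] docs;
        private int index = -1; // -1 means "not started yet"

        StrictIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        @Override
        public int docID() {
            if (index < 0) {
                return -1;
            }
            return index < docs.length ? docs[index] : NO_MORE_DOCS;
        }

        @Override
        public int nextDoc() {
            if (index < docs.length) {
                index++;
            }
            return docID();
        }

        @Override
        public int advance(int target) {
            int doc;
            while ((doc = nextDoc()) < target) {
                // keep stepping until we reach or pass the target
            }
            return doc;
        }
    }

    /**
     * Calls advance() only when the iterator is not already on the target,
     * so the result no longer depends on which advance() semantics apply.
     */
    static int advanceIfNeeded(DocIterator iterator, int targetPosition) {
        if (iterator.docID() == targetPosition) {
            return targetPosition; // already at the right position
        }
        return iterator.advance(targetPosition);
    }

    public static void main(String[] args) {
        DocIterator it = new StrictIterator(new int[] {0, 5, 6, 10});
        it.nextDoc();
        it.nextDoc();
        it.nextDoc(); // now positioned on doc id 6
        System.out.println(advanceIfNeeded(it, 6)); // 6, iterator does not move
        System.out.println(advanceIfNeeded(it, 7)); // 10, normal advance
    }
}
```

The guard costs one extra comparison per call and leaves iterators that already stay in place unaffected.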

Need to follow up with the Lucene guys as well.

Sanne Grinovero October 27, 2010 at 12:00 PM

Thanks a lot for spotting this and providing a test case, very useful.

Fixed

Details

Created October 24, 2010 at 8:49 PM
Updated September 11, 2011 at 6:19 PM
Resolved November 3, 2010 at 3:15 PM