TokenStream contract violation during serialization of index on slave node in a JMS cluster
Description
Activity

Steffen Terheiden October 7, 2016 at 2:47 PM
Thanks for your replies so far.
I will test whether everything works for me now, when I have the time.

Yoann Rodière October 7, 2016 at 7:23 AM
Hello,
Sanne just released 5.5.5.Final, which includes the fix: http://in.relation.to/2016/10/06/TripleHibernateSearchRelease/
Cheers.

Yoann Rodière October 6, 2016 at 1:44 PM
Hello,
I have good news and less good news. The good news: we completely agree on the contract violation; the fix is undergoing peer review and will be released soon.
The less good news: we can't reproduce the issue. In fact, this code should never be called and is not maintained: it was probably left over from a previous refactoring, but it is theoretically useless, since text analysis is never performed before serialization (so there is never a token stream to serialize in the first place).
What this means right now is that you can try out the fix, but you'll probably run into other problems anyway (HSEARCH-2383 comes to mind). If you do run into other issues, we'll need a test case to help you, or at least any information on exotic things you might be doing in your application (in particular, any use of org.apache.lucene.document.Field.setTokenStream(TokenStream) or org.apache.lucene.document.Field.setReaderValue(Reader)).
You can find test case templates here: https://github.com/hibernate/hibernate-test-case-templates/tree/master/search
In your case, the hibernate-search-lucene project is what you need. If you do decide to create a test case, feel free to ask us for help!
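For reference, the "exotic" usage mentioned above would look roughly like this. This is an illustrative Lucene fragment, not code from the reporter's application, and the `analyzer` variable is assumed to be an existing Analyzer instance:

```java
// Illustrative only: manually attaching a pre-built TokenStream to a field.
// This is the unusual pattern being asked about, since it is one of the few
// ways a live token stream could exist at serialization time.
Field field = new TextField("body", "", Field.Store.NO);
field.setTokenStream(analyzer.tokenStream("body", "some text to index"));
```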

Steffen Terheiden July 26, 2016 at 7:52 AM
Thanks for your quick response. The error only occurred on slave nodes, as described.
Here is the Hibernate Search configuration from my hibernate.cfg.xml file:
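The attached configuration did not survive this export. For illustration only, a Hibernate Search 5.x filesystem-slave setup with a JMS backend in hibernate.cfg.xml has roughly this shape; the paths and JNDI names below are placeholders, not the reporter's actual values:

```xml
<!-- Illustrative slave-node settings, not the original attachment -->
<property name="hibernate.search.default.directory_provider">filesystem-slave</property>
<property name="hibernate.search.default.sourceBase">/shared/index/source</property>
<property name="hibernate.search.default.indexBase">/var/lucene/indexes</property>
<property name="hibernate.search.default.worker.backend">jms</property>
<property name="hibernate.search.default.worker.jms.connection_factory">ConnectionFactory</property>
<property name="hibernate.search.default.worker.jms.queue">queue/hibernatesearch</property>
```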
And the Analyzer in use (JAFDefaultAnalyzer):
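The attached analyzer likewise did not survive the export. Judging from the KeywordTokenizer visible in the stack trace below, it was presumably of roughly this shape; this is an illustrative sketch against the Lucene 5 API, not the actual JAFDefaultAnalyzer:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;

// Illustrative sketch only; the real JAFDefaultAnalyzer may differ.
public final class JAFDefaultAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // KeywordTokenizer emits the whole input as a single token,
        // matching the tokenizer that appears in the stack trace.
        Tokenizer source = new KeywordTokenizer();
        return new TokenStreamComponents(source, new LowerCaseFilter(source));
    }
}
```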
I hope this is what you meant by an example Analyzer configuration.

Sanne Grinovero July 25, 2016 at 3:25 PM
That looks like a bug indeed. Thanks for the very good report!
I'm surprised that we have tests serializing tokenized streams, yet no one noticed. Would you have an example Analyzer configuration which we could use to introduce a new test?
Details
Assignee: Yoann Rodière
Reporter: Steffen Terheiden
Components:
Fix versions:
Affects versions:
Priority: Major
While setting up JMS replication for the Lucene index via Hibernate Search, I came across an error complaining about a TokenStream contract violation (see the stack trace at the end of the description).
After some research on the web I found out that this is usually caused by the updated TokenStream API, which now requires a defined consumption workflow ( https://lucene.apache.org/core/5_3_0/core/org/apache/lucene/analysis/TokenStream.html ).
After investigating all possible problems in my application, I reviewed the source code of the serialization classes in Hibernate Search. I discovered that you use the TokenStream in your class org.hibernate.search.indexes.serialization.impl.CopyTokenStream but don't comply with the defined workflow: in your method createAttributeLists you need to insert a short line to reset the "input" TokenStream.
To test my idea I added the method call, and everything works as expected. Below is my code of the createAttributeLists method in org.hibernate.search.indexes.serialization.impl.CopyTokenStream:
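The patched method itself did not survive this export. As a self-contained illustration of the contract in question, here is a simplified model (not Lucene or Hibernate Search code, and all names are invented) in which consuming tokens before reset() fails exactly as in the stack trace below:

```java
import java.util.Iterator;
import java.util.List;

// Simplified model of the Lucene TokenStream contract (illustrative only):
// incrementToken() is legal only after reset(), which is precisely the
// rule the unpatched createAttributeLists violated.
class ModelTokenStream {
    private final Iterator<String> tokens;
    private boolean ready = false; // flipped by reset(); checked by incrementToken()

    ModelTokenStream(List<String> tokens) {
        this.tokens = tokens.iterator();
    }

    // Must be called once before the first incrementToken().
    void reset() {
        ready = true;
    }

    // Returns the next token, or null when exhausted; throws if reset() was skipped.
    String incrementToken() {
        if (!ready) {
            // Mirrors the wording of Lucene's contract-violation error
            throw new IllegalStateException(
                "TokenStream contract violation: reset()/close() call missing");
        }
        return tokens.hasNext() ? tokens.next() : null;
    }

    // In real Lucene, end() finalizes attributes before close().
    void end() { }

    void close() {
        ready = false;
    }
}
```

In the real CopyTokenStream, the fix the reporter describes is a single input.reset() call before the incrementToken() loop.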
Is there any hidden purpose behind not calling reset(), or is it just a bug?
Below is the mentioned stack trace:
org.hibernate.search.exception.SearchException: HSEARCH000083: Unable to serialize List<LuceneWork>
at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:109)
at org.hibernate.search.backend.jms.impl.JmsBackendQueueTask.run(JmsBackendQueueTask.java:61)
at org.hibernate.search.backend.jms.impl.JmsBackendQueueProcessor.applyWork(JmsBackendQueueProcessor.java:88)
at org.hibernate.search.indexes.spi.DirectoryBasedIndexManager.performOperations(DirectoryBasedIndexManager.java:112)
at org.hibernate.search.backend.impl.WorkQueuePerIndexSplitter.commitOperations(WorkQueuePerIndexSplitter.java:49)
at org.hibernate.search.backend.impl.BatchedQueueingProcessor.performWorks(BatchedQueueingProcessor.java:81)
at org.hibernate.search.backend.impl.PostTransactionWorkQueueSynchronization.flushWorks(PostTransactionWorkQueueSynchronization.java:114)
at org.hibernate.search.backend.impl.TransactionalWorker.flushWorks(TransactionalWorker.java:165)
at org.hibernate.search.impl.FullTextSessionImpl.flushToIndexes(FullTextSessionImpl.java:87)
at com.sobis.jaf.JAFApplication.createIndexFor(JAFApplication.java:919)
at com.sobis.jaf.JAFApplication.createIndexAndVerify(JAFApplication.java:820)
at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:796)
at com.sobis.jaf.JAFApplication.createIndex(JAFApplication.java:672)
at com.sobis.jaf.JAFApplication$1.performAction(JAFApplication.java:486)
at com.sobis.jaf.services.thread.JAFThread.run(JAFThread.java:71)
Caused by: java.lang.IllegalStateException: TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
at org.apache.lucene.analysis.Tokenizer$1.read(Tokenizer.java:111)
at org.apache.lucene.analysis.core.KeywordTokenizer.incrementToken(KeywordTokenizer.java:68)
at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.createAttributeLists(CopyTokenStream.java:85)
at org.hibernate.search.indexes.serialization.impl.CopyTokenStream.buildSerializableTokenStream(CopyTokenStream.java:39)
at org.hibernate.search.indexes.serialization.spi.LuceneFieldContext.getTokenStream(LuceneFieldContext.java:137)
at org.hibernate.search.indexes.serialization.avro.impl.AvroSerializer.addFieldWithTokenStreamData(AvroSerializer.java:281)
at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeField(LuceneWorkSerializerImpl.java:237)
at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.serializeDocument(LuceneWorkSerializerImpl.java:175)
at org.hibernate.search.indexes.serialization.impl.LuceneWorkSerializerImpl.toSerializedModel(LuceneWorkSerializerImpl.java:97)
... 14 more