Parallel service lookup might fail to find the service


Spotted by when run on Elasticsearch: it seems some threads will attempt to use a LuceneHSQuery instead:




Sanne Grinovero
April 25, 2016, 12:22 PM

Updating the title to better reflect the issue, now that we know why this happened.

As explained on GitHub:

Running Elasticsearch queries in parallel causes each thread to look up the Query Translator service concurrently.

These parallel service lookups iterate over the ORM's ClassloaderService, which doesn't enumerate service implementors in a thread-safe way.

As a result, "some" threads fail to find the translator service implementation and fall back to "null". The query execution code treats that as "no need for translation" and attempts to run the Lucene query, which then fails on the ES IndexManager.
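The fallback described above can be pictured with a minimal sketch. The class and method names here (QueryDispatchSketch, executionPath) are hypothetical and are not Hibernate Search's actual code; the sketch only shows why a failed lookup is indistinguishable from "no translation needed":

```java
import java.util.function.Function;

/** Hypothetical illustration only; not the real query execution code. */
final class QueryDispatchSketch {

    /**
     * Returns the execution path chosen for a given lookup result.
     * The translator is modelled as a plain Function for illustration.
     */
    static String executionPath(Function<String, String> translator) {
        if (translator == null) {
            // A lookup that failed because of the race looks exactly like
            // a deliberate "no translation needed", so the Lucene path is taken.
            return "lucene";
        }
        return "elasticsearch";
    }

    public static void main(String[] args) {
        System.out.println(executionPath(null));
        System.out.println(executionPath(query -> query));
    }
}
```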

This seems to be an old bug, specifically in the ORM ClassLoader wrapper. Its consumers normally wouldn't look up a service at runtime, though: looking services up sequentially at bootstrap, as we normally do, doesn't trigger the issue.
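One way to keep runtime callers away from the unsafe enumeration is to resolve the service once and cache the result. The sketch below is an assumption about how that could look, not the fix that was actually applied: MemoizedLookup is a hypothetical helper, and the Supplier stands in for whatever call performs the non-thread-safe lookup.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs a non-thread-safe lookup at most once,
 * under a lock, and hands the cached result to all later callers.
 */
final class MemoizedLookup<T> implements Supplier<T> {

    private final Supplier<T> unsafeLookup;
    private T value;
    private boolean resolved;

    MemoizedLookup(Supplier<T> unsafeLookup) {
        this.unsafeLookup = unsafeLookup;
    }

    @Override
    public synchronized T get() {
        if (!resolved) {
            // Only one thread ever reaches the underlying lookup,
            // so the enumeration is never iterated concurrently.
            value = unsafeLookup.get();
            resolved = true;
        }
        return value;
    }
}
```

Query threads would then share a single MemoizedLookup instance created at bootstrap. The synchronized method favours simplicity over throughput; since the cached value is returned immediately after the first call, contention should be short-lived.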


