Be more resilient to the absence of id in the _source

Description

I have upgraded the application from 5.6.0.Beta3 to 5.6.1.Final without reindexing. The index structure was generated by 5.6.0.Beta3 but read by 5.6.1.Final.

I received the following exception

The root cause is that 5.6.0.Final stores the id in the _source and uses it to retrieve the entity id, whereas 5.6.0.Beta3 did not include it and expected the id to come from the index structure proper.

I wonder a few things:

  • should we have thrown the inner exception of the bulk operations instead of swallowing it (strict mapping error)?

  • should we catch any EntityInfo with a null id early on during query and raise a more meaningful exception? That one is a rhetorical question, of course we should.

  • should we be more lenient toward older mappings and fall back to reading the id the old-fashioned way?
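
The last two questions could be combined along the following lines. This is a minimal Java sketch, not Hibernate Search's actual code: `DocumentIdResolver`, `resolveId` and the `"id"` key in `_source` are hypothetical names chosen for illustration, and it assumes that "the old-fashioned way" means reading the hit's own `_id` metadata. It prefers the id stored in `_source`, falls back to `_id`, and fails with a descriptive message instead of an NPE when neither is available.

```java
import java.util.Map;

// Hypothetical helper, not part of Hibernate Search: resolves a document id leniently.
final class DocumentIdResolver {

    private DocumentIdResolver() {
    }

    /**
     * @param hitId  the "_id" metadata of the Elasticsearch hit (may be null)
     * @param source the parsed "_source" of the hit (may be null)
     * @return the id found in _source, or the hit's _id as a fallback
     * @throws IllegalStateException if no id can be found in either place
     */
    static String resolveId(String hitId, Map<String, Object> source) {
        // Newer mapping (assumed): the id is stored in _source under "id".
        Object fromSource = source == null ? null : source.get("id");
        if (fromSource != null) {
            return fromSource.toString();
        }
        // Older mapping (assumed): only the hit metadata carries the id.
        if (hitId != null) {
            return hitId;
        }
        // Fail early with a meaningful message rather than a later NPE.
        throw new IllegalStateException(
                "Cannot determine the entity id: neither _source.id nor _id is set."
                + " The index may have been written with an incompatible mapping.");
    }
}
```

A caller would pass whatever it extracted from the search hit, for example `DocumentIdResolver.resolveId(hitId, sourceMap)`, and only see an exception when the document carries no id at all.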

Environment

None

Activity

Guillaume Smet
March 21, 2017, 10:11 AM

+1

Sanne Grinovero
March 21, 2017, 12:00 PM

+1 to ignore pre-release mappings.

But an NPE is not acceptable, as we just can't rely on the format of the datastore; for example, I'd hope we could also support reading/searching data which was written by other tools.

I can see how some people might want to write to ES on their own and try to forge a "Hibernate Search compliant schema"; to help with that, we should be rather lenient in the data expectations.

Also we should be prepared to face situations in which people deploy multiple Hibernate Search powered applications targeting the same ES cluster; these independent applications might need to evolve independently, e.g. not to force them all to use the same Hibernate Search version.

Yoann Rodière
March 21, 2017, 12:15 PM

But an NPE is not acceptable [...] we should be rather lenient in the data expectations.

I gather you want us to return an EntityInfo with a null ID, and only throw an exception on the -orm side if we try to retrieve the associated entity? So that you can safely execute queries on documents without IDs, as long as you don't want to use the ID? Looks good to me, I'll do this if possible.
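
To make the proposal concrete, here is a rough Java sketch of that deferred failure. `HitInfo` and `loadEntity` are hypothetical names, not the real `EntityInfo` interface or the -orm loading code; the sketch only assumes that projections do not need the id while entity loading does.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for a query hit; not the real EntityInfo interface.
final class HitInfo {

    private final String id; // may be null when the document carries no id
    private final Object[] projection;

    HitInfo(String id, Object... projection) {
        this.id = id;
        this.projection = projection.clone();
    }

    // Exposes the id without forcing callers to handle a raw null.
    Optional<String> getId() {
        return Optional.ofNullable(id);
    }

    // Projections stay usable even when the id is missing.
    Object[] getProjection() {
        return projection.clone();
    }

    // Only entity loading needs the id, so only this path fails.
    <T> T loadEntity(Function<String, T> loader) {
        if (id == null) {
            throw new IllegalStateException(
                    "Cannot load the entity: the indexed document has no id.");
        }
        return loader.apply(id);
    }
}
```

With this shape, a query that only reads projections works on documents without ids, and the exception appears only when something actually asks for the entity.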

Also we should be prepared to face situations in which people deploy multiple Hibernate Search powered applications targeting the same ES cluster; these independent applications might need to evolve independently, e.g. not to force them all to use the same Hibernate Search version.

Having multiple applications rely on the same ES cluster seems a reasonable expectation, but multiple applications using different Hibernate Search versions... Well, in my opinion, that's asking for trouble. Even if we do try to make things easy, there will be different bugs, and it might easily end up as an integration nightmare.
Anyway... We can try to be cautious about that, I guess. Not sure if this requires extra caution though, since we already don't want to break schemas between two micros, maybe even minors.

Emmanuel Bernard
March 21, 2017, 1:36 PM

During rolling upgrades, having different versions of Hibernate Search is a reasonable request. At least we should try to support it instead of accepting that we can break things de facto.

Yoann Rodière
March 21, 2017, 2:11 PM

During rolling upgrades, having different versions of Hibernate Search is a reasonable request. At least we should try to support it instead of accepting that we can break things de facto.

Ok, it's a valid use case, I didn't think about that. But as I mentioned, we don't accept "that we can break things de facto"; we already try not to break things between micros, and probably even between minors. Yours was a very special case, since you upgraded from a beta.

Assignee

Yoann Rodière

Reporter

Emmanuel Bernard

Labels

None

Suitable for new contributors

None

Feedback Requested

None

Fix versions

Affects versions

Priority

Major