Distance computation is inconsistent between the Lucene and the Elasticsearch backends

Description

When fixed, adapt the last commit of https://github.com/hibernate/hibernate-search/pull/1136

When querying by distance to reference coordinates, if a document has no value for the location field:

  • the Lucene backend computes the distance between the reference coordinates and (0,0)

  • the Elasticsearch backend simply returns a distance of Double.MAX_VALUE (~1.8*10^308)
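
To make the difference concrete, here is a minimal Java sketch of the two behaviors listed above. It is not the code of either backend: the class name, the method names, and the use of a haversine formula with a mean Earth radius of 6371 km are all assumptions made for illustration. It only mimics the observable results (a fallback to (0,0) versus a sentinel of Double.MAX_VALUE).

```java
// Hypothetical illustration; not taken from Hibernate Search, Lucene or Elasticsearch.
class MissingLocationDistance {

    // Assumed mean Earth radius, in kilometers.
    static final double EARTH_MEAN_RADIUS_KM = 6371.0;

    /** Great-circle (haversine) distance in km between two points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_MEAN_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Mimics the Lucene-side result: a missing coordinate is treated as 0. */
    static double distanceMissingAsOrigin(double refLat, double refLon,
                                          Double docLat, Double docLon) {
        double lat = (docLat == null) ? 0.0 : docLat;
        double lon = (docLon == null) ? 0.0 : docLon;
        return haversineKm(refLat, refLon, lat, lon);
    }

    /** Mimics the Elasticsearch-side result: a missing location is "infinitely" far. */
    static double distanceMissingAsMaxValue(double refLat, double refLon,
                                            Double docLat, Double docLon) {
        if (docLat == null || docLon == null) {
            return Double.MAX_VALUE;
        }
        return haversineKm(refLat, refLon, docLat, docLon);
    }

    public static void main(String[] args) {
        // Reference point: roughly Paris (assumed values for the example).
        double refLat = 48.8566;
        double refLon = 2.3522;

        // A document without any location.
        System.out.println(distanceMissingAsOrigin(refLat, refLon, null, null));
        System.out.println(distanceMissingAsMaxValue(refLat, refLon, null, null));
    }
}
```

With the first variant, a document without coordinates gets a finite, plausible-looking distance (several thousand kilometers from Paris), so it can be matched by a large enough radius and is sorted among real results. With the second, it can only ever sort last.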

This is illustrated by the following commit, where I had to adapt a test to account for the inconsistency between the two backends: https://github.com/hibernate/hibernate-search/pull/1136/commits/e41b842704b581b5a93ba24a6020436d85573378

I think we should make the behavior consistent.
IMO, the Elasticsearch backend's behavior is more sensible. Changing the Lucene backend's behavior could be considered a breaking change. On the other hand, changing the behavior of ES might prove difficult.

Activity

Sanne Grinovero, September 12, 2016 at 3:11 PM

Understood now, thanks for all the details. Yes, I agree, let's fix the behaviour we have on our Lucene backend.

Yoann Rodière, September 12, 2016 at 1:57 PM

Actually, there are some corner cases where documents with no coordinates (assumed to be at (0,0) with the current Lucene impl) may appear in the results. When only sorting and not filtering, for instance, or when sorting and doing loose filtering around Africa (the coordinates (0,0) are very close to Africa). This would result in rather confusing results.

No, really, the current behavior feels really weird. I think it's at least worth a PR; we can always reject it if there are still strong concerns.

Guillaume Smet, September 5, 2016 at 3:34 PM

Well, it's inconsistent because the distance returned is not the same.

See this commit where Yoann had to tweak the test so that it works with Lucene and Elasticsearch: https://github.com/hibernate/hibernate-search/pull/1136/commits/9f76628900a65162630b2d00adef95ac34e7517e

Personally, I find the Elasticsearch behavior more correct and I'm in favor of improving the consistency between both backends.

Sanne Grinovero, September 5, 2016 at 2:40 PM

I'm not sure I fully understood the problem. The inconsistency is only triggered by missing values (like a null in either Longitude or Latitude), right?

If so, it doesn't seem particularly inconsistent as neither backend is going to include that Location in the results; in the ES case it could be found if one sets a very large search range (unless MAX_VALUE is treated as a special case).

Yoann Rodière, August 18, 2016 at 5:18 PM

Fixed

Details

Created August 12, 2016 at 2:05 PM
Updated November 29, 2016 at 1:06 AM
Resolved November 23, 2016 at 3:11 PM