Create a proper API for sorting definitions
Description
depends on
is duplicated by
is followed up by
Activity

Yoann Rodière August 26, 2016 at 9:49 AM
Thanks. It seems the PR there will fulfill your requirements: https://github.com/hibernate/hibernate-search/pull/1139
Emmanuel Bernard August 26, 2016 at 9:15 AM
I think I've caught up with the status, let me know otherwise. And thanks for the relay
Emmanuel Bernard August 26, 2016 at 9:14 AM
Where is the API going
The most likely scenario is that QueryBuilder
will be made generic in 6 to output:
Lucene
Query
or Elasticsearch JSON when createQuery is calledLucene
Sort
or Elasticsearch JSON when createSort is calledbe
byNative(SortField)
for Lucene andbyNative(String)
for ES
It's not rebuilt from the ground up, just generified to meet our needs. We did explore in Rome something that integrate the DSL and the createQuery
but it was not destroying the DSL, just adding more leaves options if I recall.
Spatial
Ok , let's keep withComputeMethod
for later just making usre the DSL could be expanded.
On byDistanceFromSpatialQuery(Query)
I'm fine with dropping it.
Native
I still feel native should be in with a oveloaded method byNative(SortField)
and byNative(String)
, the former only usable with Lucene, the latter only with Elasticsearch. Yes using a custom SortField
to keep the JSON string payload seems fine to me for the time being. In 6, we will be able to introduce a generic type bedending ont he backend targeted. So yes it looks hacky but that's temporary. Make it experimental if you have concerns.
Do we want to support sorting on unmapped fields
Doesn't native gives us a way out?
Anyways, I'm fine with restricting for now.

Nicolas Helleringer August 23, 2016 at 8:13 AM
Elasticsearch lets you sort by distance between a point and a coordinate in a document through several methods. SLOPPY_ARC (default), ARC and PLANE. Do you think that is implementable directly via the Lucene Sort/SortField classes like you did DistanceSortField / DistanceComparator.
Is that worth it?
And what is the current implementation doing?
If I do understand the Elasticsearch doc well, ARC is a mathematically correct spherical distance, SLOPPY_ARC is a "faster" computation method and PLANE is planar distance with local projection around the center of the search.
I do think that trying to save some CPU cycle here is overkill and not needed in most cases but I think it is globally doable to have a support for multiple distance computation methods in Hibernate Search.
The current implementation is doing accurate spherical computation.
I need to grab some time to go deeper into Elasticsearch and Hibernate Search code to compare if you need to.

Yoann Rodière August 12, 2016 at 10:20 AM
Where is this API going?
Our ultimate goal is to be more independent of Lucene internal API. Currently, most of our machinery is based on Lucene API (we build our index with Lucene Documents for instance) and this has to change.
It's a Search 6 thing though.
Ok. So this API is bound to disappear in its current form.
Is this API going to support native sorts?
Native sorts are important. The fact is that Elasticsearch brings a lot of power to the user and it is really important to allow the user to use this power.
I completely agree with that. I was just saying that maybe we should try to provide a way to chain DSL-built sorts with native sorts (through another API, provided only by the ES module) instead of adding backend-dependent methods to the DSL.
But if the API is to be rebuilt from the ground up in 6, well, I guess I can live with that.
Should we keep andByIndexOrder/andByScore
+1 to keep them. I also thought they might be useful.
Will do. I will also add a way to chain an andByField
after a byScore
, just in case.
Should we keep in(Unit)
Maybe we could simply provide a fixed value for Elasticsearch (probably meter) and document that?
In the long term, we will probably need to be able to define backend specific options but it might not be a priority for now.
Agreed, a fixed value seems the way to go. I'll check out what the unit for our FullTextQuery.SPATIAL_DISTANCE is, but it seems to be kilometers (org.hibernate.search.query.engine.impl.DocumentExtractorImpl.extract(int)
=> org.hibernate.search.query.engine.impl.QueryHits.spatialDistance(int)
=> org.hibernate.search.spatial.impl.DistanceCollector.HitEntry.getDistance(Point)
=> org.hibernate.search.spatial.impl.Point.getDistanceTo(double, double)
). A bit odd, but changing it would be an API break, so let's be consistent...
Is the byDistanceFromSpatialQuery(Query) method making sense
It's a useful shortcut. If we can provide a useful error message if the user is not providing a spatial query, I think it's worth it. If it's going to be cryptic, probably not.
Not sure it's really a shortcut, that was my point. Without it, we'd have this kind of code:
And with it, that kind of code:
So we gained one line of code. Out of a dozen. And we lost in clarity, though that's a matter of opinion.
Anyway, I'll see if we can at least implement this cleanly.
Do we want to support sorting on unmapped fields
I think it can be considered an acceptable limitation to always sort on common fields. I wouldn't enforce it though so that it continues to work as before for Lucene.
It would be something to keep in mind for Search 6.
Ok to support only common fields. But as to "work as before", the DSL didn't exist "before", so it will only mean that users can provide their own SortField
through the byNative
methods.
Thanks!
Details
Assignee
Yoann RodièreYoann RodièreReporter
Sanne GrinoveroSanne GrinoveroComponents
Sprint
NoneFix versions
Priority
Major
Details
Details
Assignee

Reporter

Especially useful for the DSL, but not exclusively as we apply sorting as an independent component from the query definition.
Important to possibly decouple from the Lucene engine in local mode, and also to abstract the user application code from the Lucene API changes.
However while abstraction has some benefits, let's make sure that no functionality is taken away from people having deep Lucene understanding.