Query time join
Activity

Yoann Rodière June 19, 2017 at 10:45 AM
Resurrecting this... There's a great thing with joins: you can search for elements matching multiple conditions in collection properties. For example, search for all groups that have a post with a title containing "lucene" and a body containing "solr". Right now with @IndexedEmbedded, it's not possible (see this question on Stack Overflow for instance).
So I think we definitely need something in the DSL. Also, having a dedicated feature in the DSL would allow for arbitrary joins, which can be useful from time to time.
We could also add a way to do simpler join queries with indexing metadata, but I think it's a separate subject. It may be addressed as part of for instance, since this seems very close to Elasticsearch's `nested` datatype (though not exactly the same): https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.htm
I'm moving this to 6 because I think we definitely need to at least check that it will be doable in 6 (especially with respect to HSEARCH-2498). I would even be tempted to do a 5.9 just for this feature, but we can't keep postponing 6 forever...
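To illustrate the multi-condition case described above, here is a minimal plain-Java sketch. It does not use Hibernate Search or Lucene; the `Group` and `Post` classes and both method names are hypothetical stand-ins. It assumes that a flattened embedded collection lets each condition be satisfied by a different post, whereas a join-like query requires a single post to satisfy both.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, for illustration only (not a Hibernate Search API).
class NestedMatchSketch {
    static final class Post {
        final String title;
        final String body;
        Post(String title, String body) { this.title = title; this.body = body; }
    }

    static final class Group {
        final String name;
        final List<Post> posts;
        Group(String name, List<Post> posts) { this.name = name; this.posts = posts; }
    }

    // Flattened semantics: each condition may be satisfied by a different post.
    static boolean matchesFlattened(Group g, String titleTerm, String bodyTerm) {
        boolean anyTitle = g.posts.stream().anyMatch(p -> p.title.contains(titleTerm));
        boolean anyBody = g.posts.stream().anyMatch(p -> p.body.contains(bodyTerm));
        return anyTitle && anyBody;
    }

    // Join-like semantics: one and the same post must satisfy both conditions.
    static boolean matchesSamePost(Group g, String titleTerm, String bodyTerm) {
        return g.posts.stream()
                .anyMatch(p -> p.title.contains(titleTerm) && p.body.contains(bodyTerm));
    }

    public static void main(String[] args) {
        Group g = new Group("search", Arrays.asList(
                new Post("lucene tips", "about analyzers"),
                new Post("clustering", "solr cloud setup")));
        System.out.println(matchesFlattened(g, "lucene", "solr")); // true (false positive)
        System.out.println(matchesSamePost(g, "lucene", "solr"));  // false
    }
}
```

The group in `main` has no single post about both "lucene" and "solr", yet the flattened check still reports a match. That is the kind of false positive a join in the DSL would avoid.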

Marc Schipperheyn February 11, 2015 at 2:11 AM
So, QueryTimeJoin basically allows you to filter a result set based on a selection from a specific index, which may be a different index than the one you're querying.
The way it works is that you select the values of a single field based on one query, and use those values to filter on a field in the query you are executing. So in SQL terms, it can be seen as a WHERE myId IN (select myId from ) type query.
One thing to realize is that due to current limitations in this Lucene module, the fields that are used to execute the filter have to be text fields.
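To make the SQL analogy above concrete, here is a minimal plain-Java sketch of the two passes. It is not the Lucene implementation: documents are modelled as simple maps, and all names (`join`, `innerIndex`, `innerField`, `outerIndex`, `outerField`, `doc`) are hypothetical. Join keys are kept as strings, mirroring the text-field limitation mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory model of a query-time join, for illustration only.
class QueryTimeJoinSketch {

    // Builds a "document" from alternating field names and values.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    static List<Map<String, String>> join(
            List<Map<String, String>> innerIndex,
            Predicate<Map<String, String>> innerQuery,
            String innerField,
            List<Map<String, String>> outerIndex,
            String outerField) {
        // Pass 1: the "select myId from ..." part. Collect the join keys of
        // every document matched by the inner query.
        Set<String> keys = new HashSet<>();
        for (Map<String, String> d : innerIndex) {
            String key = d.get(innerField);
            if (key != null && innerQuery.test(d)) {
                keys.add(key);
            }
        }
        // Pass 2: the "WHERE myId IN (...)" part. Keep the outer documents
        // whose join field holds one of the collected keys.
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : outerIndex) {
            String key = d.get(outerField);
            if (key != null && keys.contains(key)) {
                hits.add(d);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, String>> users = Arrays.asList(
                doc("id", "1", "name", "marc"),
                doc("id", "2", "name", "sanne"));
        List<Map<String, String>> posts = Arrays.asList(
                doc("userId", "1", "title", "Joins in Lucene"),
                doc("userId", "2", "title", "Roadmap"),
                doc("userId", "1", "title", "Faster joins"));
        // Posts written by users named "marc".
        for (Map<String, String> hit :
                join(users, u -> "marc".equals(u.get("name")), "id", posts, "userId")) {
            System.out.println(hit.get("title"));
        }
        // prints "Joins in Lucene" then "Faster joins"
    }
}
```

The extra pass over the inner index is where the runtime cost of this approach comes from.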
In terms of API, perhaps this could be defined as such

Marc Schipperheyn April 5, 2014 at 1:25 PM
An interesting article published about the subject: http://blog.seecr.nl/2014/02/24/a-faster-join-for-solrlucene/

Sanne Grinovero February 21, 2014 at 9:46 PM (edited)
Hi Marc, agreed, this looks awesome to have.
Having it in the DSL is for sure one way, but I'm wondering if it could be defined in the indexing metadata? We could produce the join query transparently based on the field names.
I'll flag it as 5.1: we have many things on the roadmap already, and I don't think we'll be able to make it earlier. I'd rather have a quick 5.0 than a release that takes ages, but we can start thinking about this in the scope of the internal refactorings.

Marc Schipperheyn February 19, 2014 at 11:40 AM
In Lucene 4.x this has now become standard and performant. I would recommend adding this functionality through the DSL and adding it to the 5.0 roadmap.
Description

Lucene 3.6 introduces the notion of "Query Time Join": a way to relate Documents from different indexes in order to filter content and retrieve fields. This approach comes at a runtime cost, as an extra pass is involved in processing the query.
The idea is basically that if you search on e.g. `Post` instances and you need the photo of the `User` that is part of the `Post`, you can keep this information separate and retrieve the `User` on the fly. This way you can ensure that fields that change in the `User` don't require a re-indexing of all the related `Comments` - http://www.searchworkings.org/blog/-/blogs/412000

Query time joining in Lucene is pretty straightforward, and entirely encapsulated in `JoinUtil.createJoinQuery`. It requires the following arguments:

fromField - The entity field to join in the entity being queried: e.g. user.id
toField - The entity field in the related index to join on: e.g. id.
fromQuery - The query executed to collect the from terms.
fromSearcher - The searcher on which the fromQuery is executed.
multipleValuesPerDocument - Whether the `fromField` contains more than one value per document (multi-valued field). If this option is set to `false`, the from terms can be collected in a more efficient manner.

Since this doesn't require indexing changes and just affects what is returned, it can simply be implemented as an extension to the `QueryBuilder`.

I'm not sure at this point, but I believe that query joining doesn't actually retrieve the related document, which would also be a nice feature.