Explicit support for indexing free-form (dynamic) entities

Description

It is possible today to index free-form objects, but it is a well-kept secret: the approach looks more like a hackish workaround and lacks explicit API support.

Today (i.e. with version 4.3.x), the possible solution is as follows:

  • Indexing
    For indexing, you need a "placeholder object" carrying a rather
    smart custom @ClassBridge.
    Analyzers can also be customized, as can numeric types, etc.
    ModeShape uses this approach, and we added some tests to the Search engine module as examples, since Hibernate Search users occasionally ask for more dynamic properties too.

  • Queries
    If you have a purely dynamic schema (likely because you're building a framework on top of Hibernate Search), you would use such placeholders exclusively.
    Since you have a single placeholder type, you'll only be able to
    target this type, and queries will return lists of this type. This means that
    user types are built on top of this type as an additional layer:
    probably a special field recording which protostream schema a document
    refers to, on which the HQL query transformer then either adds a
    restriction or enables a filter.
    There is room for improvement in the lower-level details, for example
    by exposing some control over the usage of the field
    org.hibernate.search.ProjectionConstants.OBJECT_CLASS.
    I guess the results Loader could also take advantage of this, but
    it doesn't seem an urgently needed patch either.

  • Query DSL + metadata API
    These APIs don't provide any useful helpers for purely dynamic input. This needs to be explored.

  • Tika and filesystem-stored documents
    To be kept in mind as interesting input examples.
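The Indexing and Queries steps above can be modelled in a few lines without any Hibernate Search dependency. This is a hypothetical, simplified sketch: in Hibernate Search 4.x the real hook would roughly be a FieldBridge implementation referenced from @ClassBridge, whose set(String name, Object value, Document document, LuceneOptions luceneOptions) method adds Lucene fields. Here a Map<String, String> stands in for the Lucene Document and a java.util.function.Predicate stands in for the Lucene query. DynamicValueHolder, DynamicPropertiesBridge, TypedQueryLayer and the "_userType" discriminator field are all assumed names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical placeholder entity: the single indexed type, holding arbitrary properties
// plus the name of the user-level type it represents (e.g. a protostream message name).
class DynamicValueHolder {
    final String userType;
    final Map<String, String> properties = new LinkedHashMap<>();

    DynamicValueHolder(String userType) {
        this.userType = userType;
    }

    DynamicValueHolder with(String name, String value) {
        properties.put(name, value);
        return this;
    }
}

// Stand-in for the custom class bridge: flattens each holder into a "document".
// A Map<String, String> plays the role of the Lucene Document here.
class DynamicPropertiesBridge {
    // Assumed discriminator field name; not a Hibernate Search constant.
    static final String TYPE_FIELD = "_userType";

    static Map<String, String> toDocument(DynamicValueHolder holder) {
        // Property names are only known at runtime, so they are copied as-is.
        Map<String, String> document = new LinkedHashMap<>(holder.properties);
        // The discriminator is written last, so it is always present.
        document.put(TYPE_FIELD, holder.userType);
        return document;
    }
}

// Stand-in for the query transformer: user queries can only target the placeholder
// type, so each one is ANDed with a restriction on the discriminator field.
class TypedQueryLayer {
    static Predicate<Map<String, String>> restrictToType(
            String userType, Predicate<Map<String, String>> userQuery) {
        Predicate<Map<String, String>> typeRestriction =
                doc -> userType.equals(doc.get(DynamicPropertiesBridge.TYPE_FIELD));
        return typeRestriction.and(userQuery);
    }

    // Naive "index scan" returning every document that matches the query.
    static List<Map<String, String>> search(
            List<Map<String, String>> index, Predicate<Map<String, String>> query) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : index) {
            if (query.test(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The design mirrors the text: the index only knows one entity type, so the user-level type lives in an ordinary field and every query is narrowed by it.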

Activity

Yoann Rodière June 9, 2017 at 8:37 AM

Closing as duplicate of HSEARCH-1401, since that other ticket is a bit better documented.

Hardy Ferentschik September 12, 2013 at 1:33 PM

Can we add some examples to this issue? What do these "dynamic entities" look like? How are the fields configured, and what do queries look like?

Emmanuel Bernard July 31, 2013 at 4:34 PM

The goal is to offer:

  • an abstraction on top of object navigation (via reflection) to support alternative data structures

  • APIs that can reference entities by something other than their Class (the query DSL and programmatic mapping are two examples)
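To make the first goal more concrete, here is a minimal sketch of what an abstraction over object navigation might look like. PropertyNavigator, ReflectionNavigator, MapNavigator and SampleBook are hypothetical names, not existing Hibernate Search types; the reflection variant only uses standard java.lang.reflect calls.

```java
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical abstraction: the indexing engine asks for property values through
// this interface instead of calling reflection directly.
interface PropertyNavigator {
    Object get(Object instance, String propertyName);
}

// Reflection-based navigation, i.e. roughly today's behaviour for plain Java entities.
class ReflectionNavigator implements PropertyNavigator {
    @Override
    public Object get(Object instance, String propertyName) {
        try {
            Field field = instance.getClass().getDeclaredField(propertyName);
            field.setAccessible(true); // entity fields are often private
            return field.get(instance);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("Cannot read property " + propertyName, e);
        }
    }
}

// Navigation over an alternative data structure: a free-form map of properties.
class MapNavigator implements PropertyNavigator {
    @Override
    public Object get(Object instance, String propertyName) {
        return ((Map<?, ?>) instance).get(propertyName);
    }
}

// Sample "classic" entity used to exercise the reflection-based navigator.
class SampleBook {
    private final String title;

    SampleBook(String title) {
        this.title = title;
    }
}
```

An engine written against PropertyNavigator could then index a plain Java object and a free-form map through the same code path: new ReflectionNavigator().get(new SampleBook("Dune"), "title") and new MapNavigator().get(Map.of("title", "Dune"), "title") both return "Dune".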

Sanne Grinovero July 31, 2013 at 3:53 PM

This is an example of how it can be achieved today:
https://github.com/hibernate/hibernate-search/blob/277449eb02a367d76e32f7fd92ef9c57fa6a1f0c/engine/src/test/java/org/hibernate/search/test/bridge/PropertiesExampleBridgeTest.java

https://github.com/hibernate/hibernate-search/blob/277449eb02a367d76e32f7fd92ef9c57fa6a1f0c/engine/src/test/java/org/hibernate/search/test/bridge/DynamicIndexedValueHolder.java

https://github.com/hibernate/hibernate-search/blob/277449eb02a367d76e32f7fd92ef9c57fa6a1f0c//engine/src/test/java/org/hibernate/search/test/bridge/MultiFieldMapBridge.java

Note that the indexing and query example looks particularly clumsy because the test lives in the engine module, so it lacks the ORM syntactic sugar (and that of Infinispan Query).

Of course it could be made smarter. For example, the DynamicIndexedValueHolder could define multiple Properties: some for simple text, some for numbers, and another one carrying a specific boost as a value.
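For illustration, a holder along those lines might look like the sketch below, assuming one property group per kind of value plus a separate group for per-field boosts. TypedDynamicHolder, FieldSpec and TypedBridge are hypothetical names, not classes from the linked tests, and FieldSpec only describes the field a real bridge would add to the Lucene Document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder: separate property groups tell the bridge how to index each value.
class TypedDynamicHolder {
    final Map<String, String> text = new LinkedHashMap<>();
    final Map<String, Double> numbers = new LinkedHashMap<>();
    // Optional per-field boost; fields without an entry default to 1.0.
    final Map<String, Float> boosts = new LinkedHashMap<>();
}

// Describes one field the bridge would add to the index.
final class FieldSpec {
    final String name;
    final Object value;
    final boolean numeric;
    final float boost;

    FieldSpec(String name, Object value, boolean numeric, float boost) {
        this.name = name;
        this.value = value;
        this.numeric = numeric;
        this.boost = boost;
    }
}

class TypedBridge {
    // Text properties are marked as text fields, numbers as numeric fields,
    // and each field picks up its boost from the separate boost group.
    static List<FieldSpec> fields(TypedDynamicHolder holder) {
        List<FieldSpec> specs = new ArrayList<>();
        holder.text.forEach((name, value) ->
                specs.add(new FieldSpec(name, value, false, holder.boosts.getOrDefault(name, 1.0f))));
        holder.numbers.forEach((name, value) ->
                specs.add(new FieldSpec(name, value, true, holder.boosts.getOrDefault(name, 1.0f))));
        return specs;
    }
}
```

Keeping the kinds apart means the bridge never has to guess whether "1965" is text or a number.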

I don't know exactly how we would like this to evolve in 5.0 and beyond; that needs to be discussed. The point is that CapeDwarf uses it in this more flexible way, as do Infinispan via remote queries (even from clients written in other languages such as C#, Ruby or Python) and ModeShape. We need to inspect their use cases, though some are yet to be defined.

Hardy Ferentschik July 31, 2013 at 3:41 PM

So what does this "rather smart" class bridge look like? And what do queries look like?

Duplicate

Details

Created July 31, 2013 at 3:37 PM
Updated October 30, 2017 at 1:52 PM
Resolved June 9, 2017 at 8:37 AM