Vector search

Description

Implement vector search in Hibernate Search, giving users the ability to index vectors produced using AI (e.g. Large Language Models) and to search for documents whose vector is closest to a given vector (k-nearest-neighbors).

  1. Ability to index vectors of floats, with a @VectorField(?) annotation.

    1. Must handle at least List<Float> and float[]. This will likely require adding default bridges to org.hibernate.search.mapper.pojo.bridge.mapping.impl.BridgeResolver.Builder#addDefaults; registering a default bridge for List<Float> might be challenging, so feel free to ping Yoann about that.

    2. Must accept a (static) parameter to specify the number of dimensions.

    3. Must ignore container extractors by default (we need to pass the full List to the backend, not a sequence of floats).

  2. Ability to include a k-nearest-neighbors (KNN) predicate in search queries, to find the “k” documents whose vector in a given field is most similar to a given vector.

    1. May accept a parameter to specify the similarity function (Lucene's VectorSimilarityFunction offers several, e.g. Euclidean distance, dot product, and cosine).
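
To make the indexing requirements in item 1 more concrete, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: the annotation is declared locally and is not the Hibernate Search API, and the attribute name `dimension`, the `Book` class, and the helper methods are hypothetical. It only shows the intended shape: a static dimension on the annotation, support for both float[] and List<Float>, and the whole List being converted as one value rather than element by element.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

class VectorFieldSketch {

    // Hypothetical annotation declared here for illustration only;
    // the final name and attributes are still open in this ticket.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @interface VectorField {
        // Static number of dimensions, fixed at mapping time.
        int dimension();
    }

    // Hypothetical indexed entity with both supported property types.
    static class Book {
        @VectorField(dimension = 3)
        float[] coverEmbedding;

        @VectorField(dimension = 4)
        List<Float> summaryEmbedding;
    }

    // Reads the declared dimension from the annotation via reflection.
    static int declaredDimension(Class<?> type, String fieldName) {
        try {
            return type.getDeclaredField(fieldName)
                    .getAnnotation(VectorField.class)
                    .dimension();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
    }

    // Converts the whole List into a single vector value (no per-element
    // extraction) and rejects vectors of the wrong size.
    static float[] toArray(List<Float> vector, int expectedDimension) {
        if (vector.size() != expectedDimension) {
            throw new IllegalArgumentException(
                    "Expected " + expectedDimension + " dimensions but got " + vector.size());
        }
        float[] result = new float[expectedDimension];
        for (int i = 0; i < expectedDimension; i++) {
            result[i] = vector.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int dimension = declaredDimension(Book.class, "summaryEmbedding");
        float[] indexed = toArray(List.of(0.1f, 0.2f, 0.3f, 0.4f), dimension);
        System.out.println(dimension + " dimensions, " + indexed.length + " values indexed");
    }
}
```

In this sketch the size check runs at conversion time; whether the real implementation validates the dimension in the bridge or leaves it to the backend is not decided here.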

We can accept the following limitations:

  • The feature is limited to the Lucene backend for a start.

  • The feature is incubating/experimental for a start.

  • We don’t provide any integration with AI libraries: we only deal with vectors and it’s up to users to transform text/images/sound into vectors.

  • During indexing, vectors must be single-valued.

  • During indexing, vectors cannot be in a nested document.

  • KNN search is incompatible with paging.

  • If we must (but preferably let’s avoid this limitation): KNN search is incompatible with other predicates.
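
The semantics expected from the KNN predicate, and what the last limitation above would rule out, can be sketched with an exact brute-force search in plain Java. This is an illustration under stated assumptions, not the planned implementation: Lucene performs approximate nearest-neighbor search over an HNSW graph rather than scanning every document, and the class `KnnSketch`, its methods, and the choice of squared Euclidean distance are all hypothetical. The `filter` argument stands in for "other predicates" and is applied before the k nearest documents are selected, which is only one possible way of combining them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

class KnnSketch {

    // Squared Euclidean distance; smaller means more similar.
    static double squaredEuclidean(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the indices of the (at most) k documents closest to the query,
    // nearest first, considering only documents accepted by the filter.
    static List<Integer> knn(float[][] docs, float[] query, int k, IntPredicate filter) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            if (filter.test(i)) {
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble(
                (Integer i) -> squaredEuclidean(docs[i], query)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        // One single-valued vector per document.
        float[][] docs = { {0f, 0f}, {1f, 0f}, {0f, 3f}, {5f, 5f} };
        float[] query = {1f, 1f};

        // Pure KNN: the two closest documents.
        System.out.println(knn(docs, query, 2, i -> true));   // [1, 0]

        // KNN combined with another predicate that excludes document 1.
        System.out.println(knn(docs, query, 2, i -> i != 1)); // [0, 2]
    }
}
```

The second call shows why the combination matters: filtering after picking the top 2 would return only document 0, while filtering first still returns k results.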

Created September 18, 2023 at 2:05 PM
Updated February 12, 2024 at 12:48 PM