Details
Assignee: Unassigned
Reporter: Yoann Rodière
Fix versions: none listed
Epic Status: To Do
Created September 18, 2023 at 2:05 PM
Updated February 12, 2024 at 12:48 PM
Implement vector search in Hibernate Search, giving users the ability to index vectors produced using AI (e.g. Large Language Models) and to search for documents whose vector is closest to a given vector (k-nearest-neighbors).
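For readers unfamiliar with k-nearest-neighbors, here is a minimal brute-force sketch of what such a search computes. It is only an illustration: the class and method names are hypothetical, Euclidean distance is an assumed choice of metric, and a real backend such as Lucene relies on an approximate index structure rather than scanning every vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Hibernate Search or Lucene.
final class BruteForceKnn {

    // Squared Euclidean distance between two vectors; smaller means more similar.
    static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the positions of the k documents closest to the query, nearest first.
    static List<Integer> nearest(List<float[]> documents, float[] query, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must not be negative: " + k);
        }
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            ids.add(i);
        }
        // Sort every document by its distance to the query vector.
        ids.sort(Comparator.comparingDouble(id -> squaredDistance(documents.get(id), query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

Calling `BruteForceKnn.nearest(documents, queryVector, 5)` would return the positions of the five stored vectors closest to `queryVector`.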
- Ability to index vectors of floats, with a @VectorField (?) annotation.
  - Must handle at least List<Float> and float[] (this will likely require adding default bridges to org.hibernate.search.mapper.pojo.bridge.mapping.impl.BridgeResolver.Builder#addDefaults; registering a default bridge for List<Float> might be challenging, feel free to ping Yoann about that).
  - Must accept a (static) parameter to specify the number of dimensions.
  - Must ignore container extractors by default (we need to pass the full List to the backend, not a sequence of floats).
- Ability to include a k-nearest-neighbors (KNN) predicate in search queries, to find the “k” documents with a given field holding the most similar vectors when compared to a given vector.
  - May accept a parameter to specify the similarity function (Lucene apparently offers several?).
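The mapping requirements above could look roughly like the following sketch. Everything in it is an assumption made for illustration: the annotation is declared locally and only borrows the tentative @VectorField name, the `dimension` attribute, the `Book` entity and the `VectorFieldSketch` helper are hypothetical, and reflection stands in for the real bridge mechanism. It shows a static dimension parameter, support for both float[] and List<Float>, and the whole List being turned into a single vector rather than a sequence of separate float values.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.List;

// Hypothetical annotation declared for this sketch; not a released API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface VectorField {
    int dimension(); // static number of dimensions, fixed at mapping time
}

// Hypothetical entity using both Java types the ticket asks for.
class Book {
    @VectorField(dimension = 3)
    float[] coverEmbedding = { 0.1f, 0.2f, 0.3f };

    @VectorField(dimension = 2)
    List<Float> summaryEmbedding = List.of(0.5f, 0.25f);
}

final class VectorFieldSketch {

    // Converts the whole property value into one float[] and checks its dimension.
    // A List is treated as a single vector, not as a container of separate values.
    static float[] toVector(Object value, int dimension) {
        float[] vector;
        if (value instanceof float[]) {
            vector = ((float[]) value).clone();
        } else if (value instanceof List) {
            List<?> list = (List<?>) value;
            vector = new float[list.size()];
            for (int i = 0; i < vector.length; i++) {
                vector[i] = (Float) list.get(i);
            }
        } else {
            throw new IllegalArgumentException("Unsupported vector type: " + value);
        }
        if (vector.length != dimension) {
            throw new IllegalArgumentException(
                    "Expected " + dimension + " dimensions, got " + vector.length);
        }
        return vector;
    }

    // Reads the annotated field by reflection and returns the vector to index.
    static float[] extract(Object entity, String fieldName) {
        try {
            Field field = entity.getClass().getDeclaredField(fieldName);
            VectorField mapping = field.getAnnotation(VectorField.class);
            if (mapping == null) {
                throw new IllegalArgumentException(fieldName + " is not a vector field");
            }
            field.setAccessible(true);
            return toVector(field.get(entity), mapping.dimension());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }
}
```

Checking the dimension at conversion time is one possible design; the actual implementation may well validate it elsewhere, for example in the backend.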
We can accept the following limitations:
- The feature is limited to the Lucene backend for a start.
- The feature is incubating/experimental for a start.
- We don’t provide any integration with AI libraries: we only deal with vectors and it’s up to users to transform text/images/sound into vectors.
- During indexing, vectors must be single-valued.
- During indexing, vectors cannot be in a nested document.
- KNN search is incompatible with paging.
- If we must (but preferably let’s avoid this limitation): KNN search is incompatible with other predicates.
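To illustrate why combining KNN with other predicates is worth supporting, the sketch below contrasts two ways of doing it on plain in-memory data. All names are hypothetical and nothing here reflects how Lucene or Hibernate Search would implement it. Filtering before the KNN step returns up to k documents that match the other predicate, whereas filtering after it can return fewer than k, which is what users would have to live with if they had to apply the other predicate themselves.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper for illustration; not part of Hibernate Search or Lucene.
final class FilteredKnnSketch {

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Ranks the candidate ids by distance to the query and keeps the k nearest.
    static List<Integer> topK(List<Integer> candidates, List<float[]> vectors,
            float[] query, int k) {
        List<Integer> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(id -> squaredDistance(vectors.get(id), query)));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    static List<Integer> allIds(List<float[]> vectors) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            ids.add(i);
        }
        return ids;
    }

    // Pre-filtering: apply the other predicate first, then pick the k nearest matches.
    static List<Integer> filterThenKnn(List<float[]> vectors, float[] query, int k,
            IntPredicate filter) {
        List<Integer> matching = new ArrayList<>();
        for (int id : allIds(vectors)) {
            if (filter.test(id)) {
                matching.add(id);
            }
        }
        return topK(matching, vectors, query, k);
    }

    // Post-filtering: pick the k nearest overall, then drop those failing the predicate.
    // The result may contain fewer than k documents.
    static List<Integer> knnThenFilter(List<float[]> vectors, float[] query, int k,
            IntPredicate filter) {
        List<Integer> result = new ArrayList<>();
        for (int id : topK(allIds(vectors), vectors, query, k)) {
            if (filter.test(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```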