Implement vector search in Hibernate Search, giving users the ability to index vectors produced using AI (e.g. Large Language Models) and to search for documents whose vector is closest to a given vector (k-nearest-neighbors).
Ability to index vectors of floats, with a @VectorField(?) annotation.
Must handle at least List<Float> and float[]. This will likely require adding default bridges to org.hibernate.search.mapper.pojo.bridge.mapping.impl.BridgeResolver.Builder#addDefaults; registering a default bridge for List<Float> might be challenging, feel free to ping Yoann about that.
Must accept a (static) parameter to specify the number of dimensions.
Must ignore container extractors by default (we need to pass the full List to the backend, not a sequence of floats).
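To make the indexing requirements above concrete, here is a minimal sketch of the conversion a default bridge for List<Float> would have to perform: take the whole list as one value, check it against the statically configured number of dimensions, and hand a single float[] to the backend. The class and method names (FloatVectorConversion, toVector) are hypothetical and not part of Hibernate Search; how nulls and mismatched sizes should really be handled is an assumption.

```java
import java.util.List;

// Hypothetical helper, NOT part of Hibernate Search: it only illustrates the
// List<Float> -> float[] conversion a default bridge would need to perform.
final class FloatVectorConversion {

    private FloatVectorConversion() {
    }

    // Converts the whole list into one float[] (no per-element extraction) and
    // checks it against the statically configured number of dimensions.
    static float[] toVector(List<Float> values, int dimension) {
        if (values == null) {
            return null; // assumption: a missing vector simply means nothing is indexed
        }
        if (values.size() != dimension) {
            throw new IllegalArgumentException(
                    "Expected " + dimension + " dimensions but got " + values.size());
        }
        float[] vector = new float[dimension];
        for (int i = 0; i < dimension; i++) {
            Float value = values.get(i);
            if (value == null) {
                throw new IllegalArgumentException("Null element at index " + i);
            }
            vector[i] = value; // unboxing
        }
        return vector;
    }
}
```

Treating the list as a single value is the reason container extractors have to be ignored by default: with the usual extraction, each Float would reach the bridge separately and the vector would be lost.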
Ability to include a k-nearest-neighbors (KNN) predicate in search queries, to find the “k” documents with a given field holding the most similar vectors when compared to a given vector.
May accept a parameter to specify the similarity function (Lucene offers several, e.g. Euclidean distance, dot product and cosine).
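As an illustration of what the KNN predicate is expected to return, the following sketch does an exhaustive scan over in-memory vectors and keeps the "k" most similar ones under a pluggable similarity function. It is not how the Lucene backend would implement the search (a real backend would typically rely on an approximate index rather than a full scan), and every name in it (BruteForceKnn, Similarity, nearest) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reference implementation of "k nearest neighbors" by full scan.
final class BruteForceKnn {

    private BruteForceKnn() {
    }

    // Pluggable similarity; a higher score always means "more similar".
    enum Similarity {
        EUCLIDEAN {
            @Override
            double score(float[] a, float[] b) {
                double sum = 0;
                for (int i = 0; i < a.length; i++) {
                    double d = (double) a[i] - b[i];
                    sum += d * d;
                }
                return -Math.sqrt(sum); // negated distance: closer vectors score higher
            }
        },
        COSINE {
            @Override
            double score(float[] a, float[] b) {
                double dot = 0, normA = 0, normB = 0;
                for (int i = 0; i < a.length; i++) {
                    dot += (double) a[i] * b[i];
                    normA += (double) a[i] * a[i];
                    normB += (double) b[i] * b[i];
                }
                if (normA == 0 || normB == 0) {
                    throw new IllegalArgumentException("Cosine is undefined for zero vectors");
                }
                return dot / Math.sqrt(normA * normB);
            }
        };

        abstract double score(float[] a, float[] b);
    }

    // Returns the positions (document ids) of the k most similar vectors, best first.
    static List<Integer> nearest(List<float[]> vectors, float[] query, int k, Similarity similarity) {
        double[] scores = new double[vectors.size()];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            if (vectors.get(i).length != query.length) {
                throw new IllegalArgumentException("Dimension mismatch for vector " + i);
            }
            scores[i] = similarity.score(vectors.get(i), query);
            ids.add(i);
        }
        ids.sort((x, y) -> Double.compare(scores[y], scores[x])); // descending by score
        return new ArrayList<>(ids.subList(0, Math.min(Math.max(k, 0), ids.size())));
    }
}
```

The choice of similarity function can change the result: for the query (1, 0) and the stored vectors (10, 0) and (1, 1), Euclidean distance ranks (1, 1) first, while cosine ranks (10, 0) first because it ignores magnitude.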
We can accept the following limitations:
The feature is limited to the Lucene backend for a start.
The feature is incubating/experimental for a start.
We don’t provide any integration with AI libraries: we only deal with vectors and it’s up to users to transform text/images/sound into vectors.
During indexing, vectors must be single-valued.
During indexing, vectors cannot be in a nested document.
KNN search is incompatible with paging.
If we must (but preferably let’s avoid this limitation): KNN search is incompatible with other predicates.