Won't Do
Details
Assignee: Marko Bekhta
Reporter: Yoann Rodière
Priority: Major
Created November 23, 2023 at 9:50 AM
Updated December 3, 2024 at 9:23 AM
Resolved March 7, 2024 at 7:53 AM
VectorSimilarity.INNER_PRODUCT requires that vectors be normalized in order to produce satisfactory results. We currently (as of ) warn about this in the documentation, but wouldn’t it be better to perform the normalization automatically?
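To make "normalization" concrete, here is a minimal sketch of rescaling a float vector to a target Euclidean (L2) norm. The class and method names are hypothetical and are not part of Hibernate Search. Rejecting zero vectors is an assumption too, since the issue does not say how they should be handled.

```java
final class VectorNormalizationSketch {

    // Hypothetical helper: returns a copy of the vector rescaled so that
    // its Euclidean (L2) norm equals targetNorm.
    static float[] normalize(float[] vector, float targetNorm) {
        double sumOfSquares = 0.0;
        for (float v : vector) {
            sumOfSquares += (double) v * v;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            // Assumption: a zero vector has no direction, so we refuse it.
            throw new IllegalArgumentException("Cannot normalize a zero vector");
        }
        double scale = targetNorm / norm;
        float[] result = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            result[i] = (float) (vector[i] * scale);
        }
        return result;
    }

    // Plain inner (dot) product of two vectors of equal length.
    static double innerProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same direction, very different magnitudes.
        float[] a = normalize(new float[] { 3f, 4f }, 1.0f);
        float[] b = normalize(new float[] { 30f, 40f }, 1.0f);
        // After normalization to unit length, the inner product only
        // reflects the angle between the vectors, not their magnitudes.
        System.out.println(innerProduct(a, b));
    }
}
```

Once both vectors have unit norm, their inner product equals their cosine similarity, which is why unnormalized data can skew the ranking.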
We’d expose @Vector(normalization = ...) (type float?), defaulting to -1 (meaning “default for the chosen similarity function”). For VectorSimilarity.INNER_PRODUCT, we’d normalize to 1.0f by default for float vectors, and to some other norm for byte vectors. For other similarities, normalization would be disabled by default.
Users could set @Vector(normalization = 0.0f) to disable normalization (if they already normalize their data), or @Vector(normalization = 232f) to set an arbitrary norm. Explicitly setting a norm that is not 0.0f or 1.0f with VectorSimilarity.INNER_PRODUCT and float vectors would result in an exception on startup: if people want to do something risky, they must disable our normalization and do it themselves.
Alternatively, we could only expose @Vector(normalize = VectorNormalize.YES/NO/DEFAULT) and pick a hardcoded norm for both types: 1.0f for floats, and maybe the maximum possible norm for bytes? I’m not sure how useful it is to set an arbitrary norm.
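The rules of the first proposal (the float-valued normalization attribute, for float vectors) could be sketched roughly as follows. Everything here is illustrative: the class, the enum and the method are hypothetical stand-ins, not Hibernate Search API. The exception type and the rejection of NaN or other negative values are assumptions, since the issue only says "an exception on startup".

```java
final class NormalizationResolutionSketch {

    // Hypothetical stand-in for the similarity setting; only INNER_PRODUCT
    // is named in the issue, OTHER covers every other similarity.
    enum Similarity { INNER_PRODUCT, OTHER }

    // -1 means "default for the chosen similarity function".
    static final float DEFAULT_MARKER = -1.0f;
    // 0.0f means "normalization disabled".
    static final float DISABLED = 0.0f;

    // Returns the norm that float vectors would be normalized to,
    // or DISABLED (0.0f) when no normalization should happen.
    static float resolveNorm(Similarity similarity, float configured) {
        if (configured == DEFAULT_MARKER) {
            // Default: unit norm for INNER_PRODUCT, disabled otherwise.
            return similarity == Similarity.INNER_PRODUCT ? 1.0f : DISABLED;
        }
        if (Float.isNaN(configured) || configured < 0.0f) {
            // Assumption: the issue does not define other negative values.
            throw new IllegalArgumentException("Invalid norm: " + configured);
        }
        if (similarity == Similarity.INNER_PRODUCT
                && configured != DISABLED && configured != 1.0f) {
            // The issue asks for a startup failure in this case; users who
            // want another norm must disable normalization and do it themselves.
            throw new IllegalStateException(
                    "INNER_PRODUCT with float vectors only accepts a norm of 0.0f or 1.0f, got " + configured);
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(resolveNorm(Similarity.INNER_PRODUCT, DEFAULT_MARKER));
        System.out.println(resolveNorm(Similarity.OTHER, 232f));
    }
}
```

Under the alternative proposal, the float parameter would become a three-valued enum and the arbitrary-norm branch would go away, which removes the need for this validation.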