Automatically normalize indexed vectors where necessary

Description

VectorSimilarity.INNER_PRODUCT requires vectors to be normalized in order to produce meaningful results.

We currently (as of ) warn about this in the documentation, but wouldn’t it be better to perform the normalization automatically?

  • We’d expose @Vector(normalization = ...) (type float?), defaulting to -1 (meaning “default for the chosen similarity function”)

  • For VectorSimilarity.INNER_PRODUCT, we’d normalize to 1.0f by default for float vectors, and to some other

  • For other similarities, normalization would be disabled by default.

  • Users could set @Vector(normalization = 0.0f) to disable normalization (if they already normalize their data), or @Vector(normalization = 232f) to set an arbitrary norm.

  • Explicitly setting a norm other than 0.0f or 1.0f with VectorSimilarity.INNER_PRODUCT and float vectors would result in an exception on startup: if people want to do something risky, they must disable our normalization and normalize the data themselves.
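To make the rules above concrete, here is a minimal sketch of how the norm could be resolved and applied to float vectors. It is only an illustration of the proposal, not existing Hibernate Search code: the class name, the nested Similarity enum (a stand-in for VectorSimilarity), and the resolveNorm/normalize helpers are all hypothetical, and byte vectors are left out because the proposal does not settle on a default for them.

```java
import java.util.Arrays;

public class VectorNormalizationSketch {

    // Hypothetical stand-in for VectorSimilarity; not the real Hibernate Search enum.
    public enum Similarity { L2, INNER_PRODUCT, COSINE }

    // Sentinel values taken from the proposal: -1 means "default for the similarity",
    // 0.0f means "normalization disabled".
    public static final float DEFAULT = -1f;
    public static final float DISABLED = 0.0f;

    // Resolves the norm to apply at indexing time; meant to be called once on startup.
    public static float resolveNorm(Similarity similarity, float configured) {
        if (configured == DEFAULT) {
            return similarity == Similarity.INNER_PRODUCT ? 1.0f : DISABLED;
        }
        if (configured < 0f) {
            throw new IllegalArgumentException("Invalid norm: " + configured);
        }
        if (similarity == Similarity.INNER_PRODUCT
                && configured != DISABLED && configured != 1.0f) {
            throw new IllegalArgumentException(
                    "INNER_PRODUCT only accepts a norm of 0.0f (disabled) or 1.0f, got " + configured);
        }
        return configured;
    }

    // Scales the vector so that its Euclidean length equals targetNorm.
    // Returns the input unchanged when normalization is disabled or the vector is all zeros.
    public static float[] normalize(float[] vector, float targetNorm) {
        if (targetNorm == DISABLED) {
            return vector;
        }
        double sumOfSquares = 0;
        for (float component : vector) {
            sumOfSquares += (double) component * component;
        }
        double length = Math.sqrt(sumOfSquares);
        if (length == 0) {
            return vector; // a zero vector has no direction to preserve
        }
        double scale = targetNorm / length;
        float[] result = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            result[i] = (float) (vector[i] * scale);
        }
        return result;
    }

    public static void main(String[] args) {
        float norm = resolveNorm(Similarity.INNER_PRODUCT, DEFAULT);
        System.out.println(Arrays.toString(normalize(new float[] { 3f, 4f }, norm)));
    }
}
```

In this sketch the validation happens in resolveNorm rather than in normalize, which matches the idea of failing on startup instead of at indexing time.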

Alternatively, we could only expose @Vector(normalize = VectorNormalize.YES/NO/DEFAULT) and pick a hardcoded norm for both types: 1.0f for floats, and maybe the maximum possible norm for bytes? I’m not sure how useful it is to set an arbitrary norm.
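The enum-based alternative could look roughly like the sketch below. Everything in it is hypothetical: the nested @Vector annotation only mimics the attribute discussed here and is not the real Hibernate Search annotation, Similarity stands in for VectorSimilarity, and the Book class exists purely for illustration. It covers only the yes/no decision; the hardcoded norm per vector type is left open, as in the text.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class VectorNormalizeSketch {

    public enum VectorNormalize { YES, NO, DEFAULT }

    // Hypothetical stand-in for VectorSimilarity.
    public enum Similarity { L2, INNER_PRODUCT, COSINE }

    // Hypothetical annotation, declared locally for this sketch only.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Vector {
        VectorNormalize normalize() default VectorNormalize.DEFAULT;
    }

    // Example entity: one field opts out, the other relies on the default.
    public static class Book {
        @Vector(normalize = VectorNormalize.NO)
        public float[] preNormalized;

        @Vector
        public float[] raw;
    }

    // Decides whether a field should be normalized, given its annotation and similarity.
    public static boolean shouldNormalize(Class<?> type, String fieldName, Similarity similarity) {
        Vector annotation;
        try {
            annotation = type.getDeclaredField(fieldName).getAnnotation(Vector.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("Unknown field: " + fieldName, e);
        }
        if (annotation == null) {
            throw new IllegalArgumentException("Field is not annotated with @Vector: " + fieldName);
        }
        switch (annotation.normalize()) {
            case YES:
                return true;
            case NO:
                return false;
            default:
                // DEFAULT: normalize only where the similarity requires it.
                return similarity == Similarity.INNER_PRODUCT;
        }
    }
}
```

Compared with a float attribute, this variant leaves no invalid value to reject on startup, at the cost of not supporting arbitrary norms.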

Activity

Marko Bekhta March 7, 2024 at 7:52 AM

We’ll close this improvement for now; see:

Won't Do

Details

Created November 23, 2023 at 9:50 AM
Updated December 3, 2024 at 9:23 AM
Resolved March 7, 2024 at 7:53 AM