Solution for fields derived from large blobs
Description

The content of some indexed fields may be derived from large blobs. In such cases, it could be convenient, as well as an opportunity for optimizations, to have dedicated solutions, i.e. ones that work on an InputStream/URL/…
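For illustration only, here is a minimal sketch of the kind of contract this could involve; BlobSource and BlobTextExtractor are invented names, not existing Hibernate Search API. The idea is that the blob is only opened, through an InputStream, when the field value is actually derived, and released right after.

```java
// Hypothetical sketch, not Hibernate Search API: deriving a field value from a blob
// through an InputStream that is opened lazily and closed as soon as possible.
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Invented abstraction: supplies the blob only when the field value is actually built. */
interface BlobSource {
    InputStream open() throws IOException;
}

final class BlobTextExtractor {
    private BlobTextExtractor() {
    }

    /** Opens the blob, derives the indexed content, and closes the stream immediately. */
    static String extractText(BlobSource source) throws IOException {
        try (InputStream in = source.open()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```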
Activity

Yoann Rodière, March 6, 2024 at 4:08 PM
Relatedly, the processing of large blobs can itself take time, e.g. turning text into a vector through an AI model. We might want to approach this not as a field type, but as a “backend-level conversion” instead; critically, one that could be, depending on the use case (a rough sketch follows after this list):
- batched, to minimize overall latency – for remote data retrieval;
- delayed until the last moment (e.g. via some backend queue), to minimize memory usage – for large blob retrieval in general (local filesystem or remote URL, it doesn’t matter).
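To make that more concrete, here is a minimal, purely hypothetical sketch. Neither BatchedConversion nor EmbeddingConversion exist in Hibernate Search; they only illustrate a backend-level conversion that receives values in whole batches (one remote call per batch rather than per document) and that the backend could invoke as late as it wants, e.g. from an indexing queue.

```java
// Hypothetical sketch, not existing API: a batch-oriented, backend-level conversion.
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

/** Invented contract: converts a whole batch of raw inputs into index field values. */
interface BatchedConversion<I, O> {
    CompletableFuture<List<O>> convertAll(List<I> inputs);
}

/** Example: turning extracted text into embedding vectors through some remote model. */
class EmbeddingConversion implements BatchedConversion<String, float[]> {
    @Override
    public CompletableFuture<List<float[]>> convertAll(List<String> texts) {
        // One call for the whole batch instead of one call per document.
        return CompletableFuture.supplyAsync(() ->
                texts.stream().map(this::fakeEmbedding).collect(Collectors.toList()));
    }

    private float[] fakeEmbedding(String text) {
        // Placeholder: a real implementation would call the model's API here.
        return new float[] { text.length() };
    }
}
```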
We discussed this somewhat here: https://hibernate.zulipchat.com/#narrow/stream/132092-hibernate-search-dev/topic/Batching.20value.20bridges . But I’m not sure this should be something we mix with bridges after all… A new backend-level component would probably make more sense, and would also allow us to address large blobs in general. The concept itself is very similar to value bridges, though; just more focused, and with batching support.
I’m just thinking out loud, but we could imagine binders registering a “batch process” (name subject to change: extractor, loader, processor?) that has access to fields; bridges would just pass values to that batch process, and backends would execute those batch processes later and/or asynchronously to “amend” a document.
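Again thinking out loud, a rough shape of that idea might look like the following. All of the names here (BatchProcess, DocumentAmender, PendingBatch) are invented for illustration and are not existing Hibernate Search API: the binder would register the BatchProcess, the bridge would only add values to a PendingBatch, and the backend would execute the process later and/or asynchronously, amending documents through the callback.

```java
// Purely illustrative shape for "binders register a batch process, bridges only hand
// values to it, and the backend executes it later to amend documents".
import java.util.ArrayList;
import java.util.List;

/** Registered by a binder; executed by the backend once a batch of documents is ready. */
interface BatchProcess<V, R> {
    /** Backend-provided callback used to write the derived value back into each document. */
    interface DocumentAmender<R> {
        void amend(int documentIndex, R derivedValue);
    }

    /** May run later and/or asynchronously, e.g. from a backend indexing queue. */
    void execute(List<V> collectedValues, DocumentAmender<R> amender);
}

/** What the bridge side would do at indexing time: no processing, just collect values. */
final class PendingBatch<V, R> {
    private final List<V> values = new ArrayList<>();

    void add(V value) {                   // called by a bridge, once per document
        values.add(value);
    }

    void flush(BatchProcess<V, R> process, BatchProcess.DocumentAmender<R> amender) {
        process.execute(values, amender); // called by the backend, later and/or async
        values.clear();
    }
}
```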
Details
Assignee: Unassigned
Reporter: Yoann Rodière
Priority: Major