Filter caching causes excessive memory use
Activity
Yoann Rodière September 29, 2022 at 10:05 AM
Unfortunately this ticket has been left alone for so long that much of the caching implementation in Hibernate Search has since been rewritten. I doubt it's still valid, so I'll close this ticket.
Sanne Grinovero April 30, 2010 at 7:16 PM
I changed it to "improvement" as the caching behaves as defined in the configuration properties. I agree it would be nice to evict cached results as soon as they aren't needed, but that looks like quite a change; I'd suggest moving it to 3.3 unless there's going to be another CR: too many changes for a release, IMHO.
Sanne Grinovero April 13, 2010 at 12:10 AM
> I think most of the needed improvements are already in the system but not yet implemented or released

feeling better
Dobes Vandermeer April 12, 2010 at 8:21 PM
Hi Sanne,
It's off-topic, yes. I think most of the needed improvements are already in the system but not yet implemented or released. At the same time, our use of Lucene is simple enough that it's easier to reimplement search using Lucene directly than to patch hibernate-search. Hope that makes sense.

In particular, I am changing the system to store indexes in the database, with one set of indexes per tenant in our multi-tenant system. Each tenant has a relatively small set of records to search, and searching is never done across tenants, so this reduces our memory footprint per search considerably. Storing the indexes in the DB reduces IT complexity for backups, database clones, transactions, and unit tests; although performance may be slightly impacted, it seems worth the price for simplifying a bunch of other things.
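For illustration, a minimal sketch of the per-tenant layout described above. The TenantIndexRegistry class and directoryFor method are hypothetical names, and RAMDirectory merely stands in for whatever database-backed Directory implementation is actually used:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical sketch of one small, isolated index per tenant. Searches
// never cross tenants, so each reader's norms arrays stay proportional
// to one tenant's record count rather than the whole dataset.
public class TenantIndexRegistry {

    private final Map<String, Directory> indexes =
        new ConcurrentHashMap<String, Directory>();

    public Directory directoryFor(String tenantId) {
        Directory dir = indexes.get(tenantId);
        if (dir == null) {
            // Placeholder: a JDBC-backed Directory would be created here.
            dir = new RAMDirectory();
            Directory existing = indexes.putIfAbsent(tenantId, dir);
            if (existing != null) {
                dir = existing;
            }
        }
        return dir;
    }
}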
Sanne Grinovero April 12, 2010 at 7:05 PM
> It would certainly reduce the size of the memory leak and probably avoid a server crash

Why do you say "reduce"? If you set it to zero, no memory would be leaked; and I wouldn't call it a leak, as this value is what the user requested: filters aren't cached by default (see the sketch after this comment).

> However, it would still be wasting a fair amount of memory.

Right, I still think it's important to do; I'm just classifying it not as a bug but as a nice improvement.

> in my quest for a speedy and reliable setup I've gotten rid of the use of filters and, subsequently, hibernate search itself

While I almost understand your position about the filters, what did you find in the rest of Search to work against your desire for speed and reliability? I was in a similar position two years ago, but I think I contributed all the improvements I could spot, and I'm not finding obvious areas that could yield some exciting improvement. Sorry for the off-topic; feel free to use the forum or mailing list if you have good ideas.
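For reference, per-filter caching can be switched off declaratively in Hibernate Search of this era via FilterCacheModeType. The following sketch assumes the 3.1-style annotations; SecurityFilter and Order are hypothetical examples, not code from this project:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;
import org.hibernate.search.annotations.FilterCacheModeType;
import org.hibernate.search.annotations.FullTextFilterDef;
import org.hibernate.search.annotations.Indexed;

// Hypothetical filter, present only to illustrate the cache setting.
class SecurityFilter extends Filter {
    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        return new OpenBitSet(reader.maxDoc()); // empty set; illustration only
    }
}

// With cache = FilterCacheModeType.NONE, Hibernate Search does not wrap the
// filter in a CachingWrapperFilter, so no reader-keyed entries accumulate.
@Indexed
@FullTextFilterDef(name = "security", impl = SecurityFilter.class,
                   cache = FilterCacheModeType.NONE)
public class Order {
    // id and indexed fields omitted for brevity
}

A query would then opt in with fullTextQuery.enableFullTextFilter("security"); with cache = NONE the filter is re-evaluated per query instead of being held in the reader-keyed cache.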
Description

The CachingWrapperFilter uses the reader instance (CacheableMultiReader) as the key for its cache. However, the reader instance keeps pointers to byte arrays in its "normsCache" and in the "normsCache" of its sub-readers; each array holds one byte per document in the index, and in some cases there will be several of these arrays, one per indexed field.

For an index with millions of records this can result in an apparent "leak" of hundreds of megabytes of memory, because those readers are not re-used and the MRU cache used by default keeps up to 128 hard references to them.

The search system must either re-use or release the normsCache, or the cache key for these filters should be tied to something that doesn't keep references to potentially huge data arrays. Otherwise the scalability of the search subsystem is significantly impacted whenever filters are used, since the heap must be large enough to accommodate up to 128 copies of the norms arrays.
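A minimal sketch of the retention pattern described above (not the actual Hibernate Search code): a bounded cache keyed on IndexReader instances, each of which pins its normsCache byte arrays for as long as it stays in the cache:

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;

// Sketch of an MRU-style cache: hard references to the 128 most recently
// used readers are retained, and the least recently used entry is evicted.
public class ReaderKeyedFilterCache {

    private static final int MAX_ENTRIES = 128; // default hard-reference limit

    private final Map<IndexReader, DocIdSet> cache =
        new LinkedHashMap<IndexReader, DocIdSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(
                    Map.Entry<IndexReader, DocIdSet> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    public synchronized DocIdSet get(IndexReader reader) {
        return cache.get(reader);
    }

    public synchronized void put(IndexReader reader, DocIdSet docIdSet) {
        // If readers are never re-used as keys, up to 128 distinct readers
        // stay strongly reachable here. Norms cost roughly 1 byte per
        // document per field: e.g. 5M documents x 3 fields = ~15 MB per
        // reader, so 128 retained readers can pin about 1.9 GB of norms.
        cache.put(reader, docIdSet);
    }
}

Keying on something that does not strongly reference the reader (for example, an index identifier plus reader version), or holding readers via soft references, would avoid pinning the norms arrays.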