The resulting mapping and behavior for an entity containing a @Lob String field is severely wrong with the latest stable Hibernate and PostgreSQL JDBC driver versions.
(Note: I am aware of the various ways to circumvent this using a custom dialect or column definition, but I think the default behavior needs fixing).
I have a JPA entity that contains a String field annotated as @Lob, because I need to store text in it, ideally in a db-agnostic way, for which I don't have (much) control over the length.
With Hibernate 5.4.21, pgjdbc 42.2.16 and Postgres 12.4, this gets turned into a text column., which I would have expected. However, that column isn't filled with the actual text content as I would have expected, but with an oid pointing at the actual content as LO data - i.e. Hibernate and the JDBC driver are trying to handle the LO storage themselves.
There are a number of issues with the current mapping:
Putting an oid into a text column is not terribly efficient. When using oid/LO mapping, the column type should be oid instead
The text mapping means that PostgreSQL's vacuumlo and other cleanup processes won't recognize the oid reference and will collect all referenced LO data
On the other hand, we can't live without vacuumlo and friends, since no triggers are generated to delete LO data together with the referencing entity table rows, so the db accumulates a lot of garbage
I don't think this is the "least surprise" mapping that most users would expect
When looking at db data directly, working with oid/lo data is rather uncomfortable compared to text columns
TOAST is usually much better at handling long text than doing it "manually" through JDBC's CLOB interface (see below)
It is incredibly slow compared to using text (factor > 200 with a naive db setup, see below)
I know that the mapping is as it is because @Lob String maps to JDBC CLOB by default, which results in the LO behavior. However, I would argue that this is a rather technical view from "under the hood" rather than from the user's perspective (point 4). CLOB certainly makes sense for very large data if you want to stream it instead of reading it completely into memory. However, that's clearly not the case here, since we use a String field and already get the whole data at once.
And even if the LO mapping should remain, points 1-3 should probably still be fixed with adaptations to the emitted DDL.
Steps to reproduce
Check out example code from https://github.com/creckord/psql-lob-test
Start a local PostgreSQL database:
Switch into the project root and start it (you can configure your db connection in src/main/resources/application.yml):
Watch it do its thing
After it's done, check out the schema and some sample data (example results below):
Stop/reset the database, change the DB dialect in src/main/resources/application.yml and do it all over.
Resulting schema: (same in both cases)
Test results with current oid/lo mapping
Test results with dialect using actual text mapping
PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit