@Lob String mapping broken

Description

The resulting mapping and behavior for an entity containing a @Lob String field is severely wrong with the latest stable Hibernate and PostgreSQL JDBC driver versions.

(Note: I am aware of the various ways to circumvent this using a custom dialect or column definition, but I think the default behavior needs fixing).

Detailed description

I have a JPA entity that contains a String field annotated as @Lob, because I need to store text in it, ideally in a db-agnostic way, for which I don't have (much) control over the length.

With Hibernate 5.4.21, pgjdbc 42.2.16 and Postgres 12.4, this gets turned into a text column., which I would have expected. However, that column isn't filled with the actual text content as I would have expected, but with an oid pointing at the actual content as LO data - i.e. Hibernate and the JDBC driver are trying to handle the LO storage themselves.

There are a number of issues with the current mapping:

  1. Putting an oid into a text column is not terribly efficient. When using oid/LO mapping, the column type should be oid instead

  2. The text mapping means that PostgreSQL's vacuumlo and other cleanup processes won't recognize the oid reference and will collect all referenced LO data

  3. On the other hand, we can't live without vacuumlo and friends, since no triggers are generated to delete LO data together with the referencing entity table rows, so the db accumulates a lot of garbage

  4. I don't think this is the "least surprise" mapping that most users would expect

  5. When looking at db data directly, working with oid/lo data is rather uncomfortable compared to text columns

  6. TOAST is usually much better at handling long text than doing it "manually" through JDBC's CLOB interface (see below)

  7. It is incredibly slow compared to using text (factor > 200 with a naive db setup, see below)

I know that the mapping is as it is because @Lob String maps to JDBC CLOB by default, which results in the LO behavior. However, I would argue that this is a rather technical view from "under the hood" rather than from the user's perspective (point 4). CLOB certainly makes sense for very large data if you want to stream it instead of reading it completely into memory. However, that's clearly not the case here, since we use a String field and already get the whole data at once.

And even if the LO mapping should remain, points 1-3 should probably still be fixed with adaptations to the emitted DDL.

Steps to reproduce

  • Switch into the project root and start it (you can configure your db connection in src/main/resources/application.yml):

  • Watch it do its thing

  • After it's done, check out the schema and some sample data (example results below):

  • Stop/reset the database, change the DB dialect in src/main/resources/application.yml and do it all over.

Resulting schema: (same in both cases)

Test results with current oid/lo mapping

Test results with dialect using actual text mapping

Environment

org.postgresql:postgresql 42.2.16

PostgreSQL 12.4 (Debian 12.4-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit

Assignee

Unassigned

Reporter

Carsten Reckord

Fix versions

None

Labels

None

backPortable

None

Suitable for new contributors

None

Requires Release Note

None

Pull Request

None

backportDecision

None

Components

Affects versions

Priority

Major
Configure