"insert from select" With hibernate.default_schema Configured Doesn't Work With Hibernate 6
Description
Activity
Christian Beikov September 20, 2022 at 7:18 AM
Hey Kai, thanks for the report. I created a new issue for this.
Kai Zander September 19, 2022 at 5:26 PM
I experimented a bit more and found the following:
If I rewrite the HQL to this (don’t select id, use distinct instead of group by):
insert into TabData (col1, col2, col3)
select distinct col1, col2, col3
from TabDataStaging
Hibernate 6 does the following:
insert into HTE_tab_data(col1, col2, col3, rn_)
(select distinct p2_0.col1,
p2_0.col2,
p2_0.col3,
row_number() over () -- adding this column effectively disables the `distinct`
from tab_data_staging p2_0);
Followed by the updates on the HTE table you were suspecting earlier:
update HTE_tab_data set id=? where rn_=?
update HTE_tab_data set id=? where rn_=?
...
Followed by:
insert into tab_data(col1,col2,col3,id)
select p1_0.col1,p1_0.col2,p1_0.col3,p1_0.id
from HTE_tab_data p1_0
The final insert into tab_data can then fail with a unique constraint violation (I have a composite unique constraint on col1, col2, col3), because even though I’m using distinct, the row_number() column added by Hibernate effectively disables it…
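For context, the composite unique constraint mentioned here is not visible in the entity classes quoted further down. Mapped in JPA it would look roughly like this; a hypothetical sketch only — the constraint name is assumed, and the constraint may just as well exist only in the database DDL:

import jakarta.persistence.Entity;
import jakarta.persistence.Table;
import jakarta.persistence.UniqueConstraint;

// Hypothetical mapping of the composite unique constraint on tab_data; shown only
// to illustrate which columns are involved, not taken from the reporter's code.
@Entity
@Table(name = "tab_data",
       uniqueConstraints = @UniqueConstraint(name = "uk_tab_data_col1_col2_col3",
                                             columnNames = {"col1", "col2", "col3"}))
public final class TabData extends BaseTabData {}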
Christian Beikov September 19, 2022 at 11:29 AM
Ok, the problem is that you are providing an expression for the id attribute in the insert, so we wouldn’t have to use the temporary table acrobatics at all, but we currently fail to detect this early enough. I’ll look into optimizing this.
Kai Zander September 19, 2022 at 11:20 AM
Thanks for the reply! Here are the entity classes:
@Getter
@Setter
@MappedSuperclass
@EqualsAndHashCode(onlyExplicitlyIncluded = true)
public abstract sealed class BaseTabData permits TabData, TabDataStaging {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "tab_data_seq")
    @SequenceGenerator(name = "tab_data_seq", sequenceName = "tab_data_seq", allocationSize = 500)
    private Long id;

    @EqualsAndHashCode.Include
    private String col1;

    @EqualsAndHashCode.Include
    private String col2;

    @EqualsAndHashCode.Include
    private String col3;
}

@Entity
@NamedQuery(name = "TabData.copyFromStaging",
        query = """
                insert into TabData (
                    id,
                    col1,
                    col2,
                    col3
                )
                select max(id),
                       col1,
                       col2,
                       col3
                from TabDataStaging
                group by col1,
                         col2,
                         col3""")
public final class TabData extends BaseTabData {}

@Entity
public final class TabDataStaging extends BaseTabData {}
I have an integration test running against H2 that does the following (a rough sketch follows the list):
1. Insert 500 rows into tab_data_staging.
2. Clear tab_data.
3. Copy all rows from tab_data_staging into tab_data (using the TabData.copyFromStaging named query).
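A minimal sketch of that test, assuming a plain JPA EntityManager and the entities and named query quoted above; the class name, column values, transaction handling and assertions are simplified placeholders:

import jakarta.persistence.EntityManager;

// Rough sketch of the test flow; the real test runs inside a Spring-managed
// test transaction, which is omitted here.
public class CopyFromStagingTestSketch {

    static void copyFromStaging(EntityManager em) {
        // 1. Insert 500 rows into tab_data_staging
        for (int i = 0; i < 500; i++) {
            TabDataStaging row = new TabDataStaging();
            row.setCol1("a" + i);
            row.setCol2("b" + i);
            row.setCol3("c" + i);
            em.persist(row);
        }
        em.flush();

        // 2. Clear tab_data
        em.createQuery("delete from TabData").executeUpdate();

        // 3. Copy everything over using the named query from the TabData entity
        em.createNamedQuery("TabData.copyFromStaging").executeUpdate();
    }
}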
Here’s the log output running with Hibernate 5:
2022-09-16 09:18:48.063Z INFO [Pool-1-worker-1] o.s.t.c.transaction.TransactionContext : Began transaction (1) for test context
2022-09-16 09:18:48.071Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : call next value for tab_data_seq
2022-09-16 09:18:48.072Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : call next value for tab_data_seq
2022-09-16 09:18:48.081Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into tab_data_staging (col2, col1, col3, id) values (?, ?, ?, ?)
... 498 identical insert statements
2022-09-16 09:18:48.096Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into tab_data_staging (col2, col1, col3, id) values (?, ?, ?, ?)
2022-09-16 09:18:48.102Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : delete from tab_data
2022-09-16 09:18:48.103Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into tab_data ( id, col1, col2, col3 ) select max(tabdatasta0_.id) as col_0_0_, tabdatasta0_.col1 as col_1_0_, tabdatasta0_.col2 as col_2_0_, tabdatasta0_.col3 as col_3_0_ from tab_data_staging tabdatasta0_ group by tabdatasta0_.col1 , tabdatasta0_.col2 , tabdatasta0_.col3
2022-09-16 09:18:48.110Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : select tabdatasta0_.id as id1_20_, tabdatasta0_.col2 as product_2_20_, tabdatasta0_.col1 as product_3_20_, tabdatasta0_.col3 as col34_20_ from tab_data tabdatasta0_
2022-09-16 09:18:48.120Z INFO [Pool-1-worker-1] o.s.t.c.transaction.TransactionContext : Rolled back transaction for test
Here’s the log output running with Hibernate 6 (notice there are no update statements against the HTE table):
2022-09-14T17:50:35.275Z INFO [Pool-1-worker-1] o.s.t.c.transaction.TransactionContext : Began transaction (1) for test context
2022-09-14T17:50:35.282Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : select next value for tab_data_seq
2022-09-14T17:50:35.282Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : select next value for tab_data_seq
2022-09-14T17:50:35.287Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into tab_data_staging (col2, col1, col3, id) values (?, ?, ?, ?)
... 498 identical insert statements
2022-09-14T17:50:35.301Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into tab_data_staging (col2, col1, col3, id) values (?, ?, ?, ?)
2022-09-14T17:50:35.305Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : delete from tab_data
2022-09-14T17:50:35.305Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : create local temporary table HTE_tab_data(id bigint, col2 varchar(255), col1 varchar(255), col3 varchar(255), rn_ integer not null, primary key (rn_))
2022-09-14T17:50:35.307Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into HTE_tab_data(id,col1,col2,col3,rn_) (select max(p2_0.id),p2_0.col1,p2_0.col2,p2_0.col3,row_number() over() from tab_data_staging p2_0 group by p2_0.col1,p2_0.col2,p2_0.col3)
2022-09-14T17:50:35.313Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : insert into tab_data(id,col1,col2,col3) select p1_0.id,p1_0.col1,p1_0.col2,p1_0.col3 from HTE_tab_data p1_0
2022-09-14T17:50:35.316Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : delete from HTE_tab_data
2022-09-14T17:50:35.319Z DEBUG [Pool-1-worker-1] org.hibernate.SQL : select p1_0.id,p1_0.col2,p1_0.col1,p1_0.col3 from tab_data p1_0
2022-09-14T17:50:35.327Z INFO [Pool-1-worker-1] o.s.t.c.transaction.TransactionContext : Rolled back transaction for test
I have hibernate.jdbc.batch_size=1000 configured, if that matters.
Christian Beikov September 19, 2022 at 10:44 AM
To answer your first question, I would need to see the entity model used for tab_data. I suspect that the entity is using a sequence generator with an allocation size bigger than 1 though, as that will make some intermediate steps necessary: the sequence gives a value, e.g. 1, but we have to generate the next N-1 values manually, as the next sequence call will give us N+1.
Between the insert into the HTE table and the insert to the target table you should see some update statements happening against the HTE table, which assign proper id values based on the sequence generator.
I don’t know what Hibernate 5 did before, but I suspect you are using @GeneratedValue(AUTO), which might have defaulted to using identity generation or SEQUENCE with an allocation size of 1 before. In Hibernate 6 the default for AUTO is SEQUENCE with an allocation size of 50, though.
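Purely as an illustration of the two behaviours contrasted here (this is not the reporter’s mapping, which is shown in the entity classes earlier in this thread):

import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.SequenceGenerator;

// Two contrasting id mappings: plain AUTO (sequence-backed with a default
// allocation size of 50 in Hibernate 6, so id blocks are handed out client-side)
// versus an explicit allocation size of 1 (one sequence call per row).
class IdMappingIllustration {

    static class AutoId {
        @Id
        @GeneratedValue(strategy = GenerationType.AUTO)
        Long id;
    }

    static class PerRowSequenceId {
        @Id
        @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "per_row_seq")
        @SequenceGenerator(name = "per_row_seq", sequenceName = "per_row_seq", allocationSize = 1)
        Long id;
    }
}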
I’ll look into the schema prefixing issue as part of this Jira issue. I also created an improvement issue, which will allow controlling whether global temporary tables are created/dropped as part of hbm2ddl or as part of the running transaction.
I encountered the following issue (issues really) migrating our application from Hibernate 5 to 6.
Situation:
- We have two tables with identical columns, OWNER.TAB_DATA and OWNER.TAB_DATA_STAGING. These are in the schema OWNER.
- We connect to the DB as CLIENT and have configured hibernate.default_schema=OWNER (a configuration sketch follows this list). DML privileges to the tables in the OWNER schema have been granted to the CLIENT user.
- TAB_DATA_STAGING is populated by a batch job in chunks over the course of multiple transactions. Then, within a new transaction, we move the data over to TAB_DATA with the following HQL:
insert into TabData(id, col1, col2, col3) select max(id), col1, col2, col3 from TabDataStaging group by col1, col2, col3; -- to eliminate possible duplicates in the staging table
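The configuration from the second point, sketched as plain JPA bootstrap code; the hibernate.* and jakarta.persistence.* keys are real setting names, while the persistence unit name and any omitted connection details are placeholders:

import java.util.Map;
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.Persistence;

// Sketch only: "app-unit" is a placeholder persistence unit; the keys mirror the
// configuration described in this report (connect as CLIENT, default schema OWNER).
public class BootstrapSketch {

    public static EntityManagerFactory createEntityManagerFactory() {
        return Persistence.createEntityManagerFactory("app-unit", Map.of(
                "jakarta.persistence.jdbc.user", "CLIENT",
                "hibernate.default_schema", "OWNER",
                "hibernate.jdbc.batch_size", "1000"   // mentioned in the comments above
        ));
    }
}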
Hibernate 5 rendered this HQL into a single DML statement as expected:
insert into OWNER.tab_data (id, col1, col2, col3) select max(tabdatasta0_.id), tabdatasta0_.col1, tabdatasta0_.col2, tabdatasta0_.col3 from OWNER.tab_data_staging tabdatasta0_ group by tabdatasta0_.col1, tabdatasta0_.col2, tabdatasta0_.col3
Hibernate 6, however, performs the following steps:
1. Attempts to create a temporary table HTE_OWNER.tab_data. This fails without logging any errors, because HTE_OWNER is not a known schema. Also, even if the schema were correct, it would still fail because CLIENT is not allowed to create tables in the OWNER schema.
2. Issues our insert statement against the temporary table instead of OWNER.tab_data. This now fails because the temporary table doesn’t exist (oracle.jdbc.OracleDatabaseException: ORA-00942: table or view does not exist). Testing this scenario against H2, where everything happens in the same schema and the temporary table can be created, reveals the next step that would have happened:
3. Copies all rows from the temporary table into the actual target table, OWNER.tab_data.
.Question/Issue
1. Why is an insert into ... select from no longer performed directly? Why the need to copy into a temporary table first and then into the actual target table?
2. When resolving the names of the temporary tables, Hibernate 6 doesn’t consider that a hibernate.default_schema may have been configured. I found the offending code here and here; the basename and name variables can already be prefixed with the custom schema, which isn’t handled there. The HT_ and HTE_ prefixes are just blindly prepended (see the sketch below).
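To make the second point concrete, here is a rough, made-up illustration in plain Java of the kind of handling being asked for. It is not Hibernate’s internal API, just string handling over a possibly schema-qualified table name:

// Made-up helper: apply the temporary-table prefix to the unqualified table name
// instead of blindly prepending it to the schema-qualified one.
public final class TempTableNameSketch {

    static String prefixTemporaryName(String tableName, String prefix) {
        int lastDot = tableName.lastIndexOf('.');
        if (lastDot < 0) {
            return prefix + tableName;                        // "tab_data" -> "HTE_tab_data"
        }
        return tableName.substring(0, lastDot + 1)            // keep "OWNER."
                + prefix
                + tableName.substring(lastDot + 1);           // -> "OWNER.HTE_tab_data"
    }

    public static void main(String[] args) {
        // Blind prepending, as currently observed, targets a non-existent schema:
        System.out.println("HTE_" + "OWNER.tab_data");                      // HTE_OWNER.tab_data
        // Prefixing only the table part keeps the configured default schema:
        System.out.println(prefixTemporaryName("OWNER.tab_data", "HTE_"));  // OWNER.HTE_tab_data
    }
}

Even with the schema kept, the description above notes that CLIENT may still lack the privilege to create tables in OWNER, so this only illustrates the name-resolution half of the problem.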