Use of @NotFound(IGNORE) and @BatchSize when there are unresolved foreign key values results in extra queries

Description

When accessing a unidirectional OneToOne association where there are unresolvable foreign key references when batching is enabled with @NotFound(IGNORE), each association access in a loop re-queries for all the unresolved foreign keys:

In the log capture above, the second query is redundant and unexpected.

Environment

None

Activity

Show:
Gail Badner
September 26, 2017, 10:17 PM

The main problem is that entity keys are not removed from BatchFetchQueue#batchLoadableEntityKeys when entities queried in the batch are not found. As a result, the same entities will likely be included in future batches (as reported above).

I've create a pull request [1] that contains a test case (BatchFetchNotFoundIgnoreDefaultStyleTest#testMostNotFoundFromQuery) with comments thatt spell out what is happening. In this test case, an entity class has an optional one-to-one association that is annotated with @NotFound(action = NotFoundAction.IGNORE).

There are 8 entities persisted with a non-null one-to-one association. The associations for the first 7 are deleted. When the entities are queried ("from Employee e order by e.id"), we expect the associated entities for the first 7 to not be found; only the association for the 7th entity will be found.

As the query results are processed, the entities keys for entities with the following IDs are added to BatchFetchQueue#batchLoadableEntityKeys: [0,1,2,3,4,5,6,7]

Batch query:
select batchfetch0_.id as id1_1_0_
from Task batchfetch0_
where batchfetch0_.id in (?, ?, ?, ?, ?)

When processing associated entity with id == 0:

  • batch query is executed with IDs bound to query: [0,1,2,3,4};

  • none of the entities are found;

  • associated entity with id == 0 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6,7].

When processing associated entity with id == 1:

  • batch query is executed with IDs bound to query: [1,5,2,3,4];

  • none of the entities are found;

  • associated entity with id == 1 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6,7].

When processing associated entity with id == 2:

  • batch query is executed with IDs bound to query: [2,5,6,3,4};

  • none of the entities are found;

  • associated entity with id == 2 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6,7].

When processing associated entity with id == 3:

  • batch query is executed with IDs bound to query: [3,5,6,7,4];

  • associated entity with id == 7 was found, put in the PersistenceContext; none of the others are found;

  • associated entity with id == 3 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6]. The EntityKey for id == 7 was removed from {BatchFetchQueue#batchLoadableEntityKeys}} when that entity was added to the PersistenceContext.

When processing associated entity with id == 4:

  • batch query is executed with IDs bound to query: [4,5,6,2,3];

  • none of the entities are found;

  • associated entity with id == 4 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6].

When processing associated entity with id == 5:

  • batch query is executed with IDs bound to query: [5,4,6,2,3];

  • none of the entities are found;

  • associated entity with id == 5 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6].

When processing associated entity with id == 6:

  • batch query is executed with IDs bound to query: [6,4,5,2,3];

  • none of the entities are found;

  • associated entity with id == 6 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6].

When processing associated entity with id == 7:

  • the associated entity with id == 7 is found in the PersistenceContext, so no extra query executed.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [0,1,2,3,4,5,6], and will continue to contain these until the session is flushed. If any other Task entities are loaded before flushing, the batch queries will continue to include 5 parameters.

[1] https://github.com/hibernate/hibernate-orm/pull/2014

Gail Badner
September 26, 2017, 11:10 PM
Edited

In the same PR, I've added a fix that removes the EntityKey for the entities that are not found by the batch query. Here is what happens now.

As the query results are processed, the entities keys for entities with the following IDs are added to BatchFetchQueue#batchLoadableEntityKeys: [0,1,2,3,4,5,6,7]

When processing associated entity with id == 0:

  • batch query is select ... from Task batchfetch0_ where batchfetch0_.id in (?, ?, ?, ?, ?)

  • batch query is executed with IDs bound to query: [0,1,2,3,4};

  • none of the entities are found;

  • associated entity with id == 0 is set to null.

BatchFetchQueue#batchLoadableEntityKeys still contains EntityKey for entities with IDs [5,6,7].

When processing associated entity with id == 1:

  • batch query (with 4 parameters) is select ... from Task batchfetch0_ where batchfetch0_.id in (?, ?, ?, ?)

  • batch query is executed with IDs bound to query: [1,5,6,7];

  • associated entity with id == 7 was found, put in the PersistenceContext; none of the others are found;

  • associated entity with id == 1 is set to null.

BatchFetchQueue#batchLoadableEntityKeys is empty.

A simple load query is used when processing the rest of the entities.
select ... from Task batchfetch0_ where batchfetch0_.id=?

When processing associated entity with id == 2:

  • load query for Task with id == 2 is executed;

  • associated entity with id == 2 is set to null.

When processing associated entity with id == 3:

  • load query for Task with id == 3 is executed;

  • associated entity with id == 3 is set to null.

When processing associated entity with id == 4:

  • load query for Task with id == 4 is executed;

  • associated entity with id == 4 is set to null.

When processing associated entity with id == 5:

  • load query for Task with id == 5 is executed;

  • associated entity with id == 5 is set to null.

When processing associated entity with id == 6:

  • load query for Task with id == 6 is executed;

  • associated entity with id == 6 is set to null.

When processing associated entity with id == 7:

  • the associated entity with id == 7 is found in the PersistenceContext, so no extra query executed.

This fix should perform much better. Once the batch is empty, it still requires doing an extra load query for those entities that were not found, which is less than ideal.

[1] https://github.com/hibernate/hibernate-orm/pull/2014

Gail Badner
October 14, 2017, 12:00 AM

The only way to improve on the pull request would be for Hibernate to track those entities that are not found when a batch is executed. Unfortunately, that would cause an inconsistency when using batch fetching versus not using it.

Here is an example using Employee#task, where:

  • the corresponding Employee column has task ID set to 1;

  • there is no Task entity with that ID.

A possible improvement would be to only track entities that were not found only while an operation is in progress. Unfortunately, that would be difficult to do until we have OperationContext functionality, but that won't be available until Hibernate 6.0 when is fixed.

Gail Badner
January 10, 2018, 6:01 AM

Fixed in 5.1 branch as well.

Assignee

Gail Badner

Reporter

Stephen Fikes

Fix versions

Labels

None

backPortable

None

Suitable for new contributors

None

Requires Release Note

None

Pull Request

None

backportDecision

None

Components

Affects versions

Priority

Major
Configure