Subselect fetching with setMaxResults()

Description

Any limit that gets applied to a query should also get applied to the subselect of any subsequent collection fetches.

Also, we need to make sure that a newly inserted collection that shows up in the subselect fetch resultset gets ignored by Loader.

Activity

Show:

Ivan Stanic March 24, 2023 at 2:02 PM

Hi ,

Sorry for late reply.

I think that adding some config that would control this behavior is way to go, and by default it would not be applicable. I see that there are following options:

  1. Introduce 2 options, one to apply the limit to subquery, another to check for correspondence of data. With this setup one who uses SNAPSHOT or SERIALIZABLE transaction isolation levels can go with true and false, and the one who uses READ_COMMITED can go with true and true options.

 

SUBQUERY_APPLY_LIMIT

SUBQUERY_CHECK_CORRESPONDENCE

SNAPSHOT or SERIALIZABLE

true

false

READ_COMMITED

true

true

With this parametrization we would avoid additional correspondence check if not necessary.

If SUBQUERY_CHECK_CORRESPONDENCE is true, and some data is found that does not belong here than another query can be executed with no limit option.

2. Introduce only SUBQUERY_APPLY_LIMIT and always check for correspondence. This would introduce some not needed check in case of transactions levels SNAPSHOT or SERIALIZABLE

3. Introduce only SUBQUERY_APPLY_LIMIT and never check for correspondence by default. This should be followed with some documentation that this should be used only if transaction isolation levels are SNAPSHOT or SERIALIZABLE

4. Do not use any options, always apply limit and always check for correspondence.

I am in favor of the option 1 as the one with the most flexibility but other options are fine as well.

Regards,

Ivan

 

Christian Beikov August 24, 2022 at 11:02 AM

I’m not sure it’s a good idea to do this by default as this will only work safely if you use the SNAPSHOT or SERIALIZABLE transaction isolation level. With READ_COMMITTED, new rows might pop into existence between the first query and the subselect fetch (according to the order by clause) which will cause certain collections to be empty.

Imagine the simplest case from Order o order by o.id desc limit 1. When orderItems is SUBSELECT fetched, but between the first and the second query a new order is committed, the subselect fetch query will return results that are not related to the originally fetched orders. Now, we could detect this scenario by checking if the result contains keys that have no correspondence and then execute another query, without order by and limit, to make sure we load all the data, but not sure if that is so great.

Overall, I think we need to at least discuss the scenario before implementing this. Maybe introduce some config option, I know it sucks, which controls whether we want to ignore the possible READ_COMMITTED issue or not, and by default, handle it transparently in the way I described.

Frédéric Donckels October 12, 2021 at 9:36 AM

Christian, have a look at where I added a simple test case for this a long time ago (I can’t figure out which user credentials I used at that time, though, especially with the migration of the ticketing system)

Christian Beikov October 12, 2021 at 9:19 AM

I don’t think so, at least I don’t know of a test for this and haven’t explicitly worked in this. I know how we can “fix” it now though since we now have support for LIMIT/OFFSET in subqueries as well.

Steve Ebersole October 8, 2021 at 7:23 PM

Is this working on 6.0 ?

Details

Assignee

Reporter

Components

Fix versions

Priority

Created April 2, 2005 at 3:02 AM
Updated April 9, 2024 at 10:04 AM