Details

Type: Bug

Status: Closed

Priority: Major

Resolution: Fixed

Affects Version/s: None

Component/s: hibernatecore

Labels:None

Bug Testcase Reminder (view):

Last commented by a user?:true
Description
Line 1186 of hibernateorm\hibernatecore\src\main\java\org\hibernate\engine\spi\ActionQueue.java in the function sort(List<AbstractEntityInsertAction> insertions) is:
long maxIterations = latestBatches.size() * 2;
This variable controls how many passes the sort algorithm is willing to do before it gives up and assumes that the set of batches is unsortable due to a cycle of entity parentrelationships (lines 12141220):
} while ( !sorted && iterations <= maxIterations); if ( iterations > maxIterations ) { LOG.warn( "The batch containing " + latestBatches.size() + " statements could not be sorted after " + maxIterations + " iterations. " + "This might indicate a circular entity relationship." ); }
The formula long maxIterations = latestBatches.size() * 2 assumes that the sorting algorithm in use has O(N) scaling. This is impossible – the optimum possible performance for any general search algorithm is O(N logN) on a fullyordered set (which is a special case of a partiallyordered set). I am uncertain whether the search algorithm in use here has O(N logN) performance (like QuickSort, but it doesn't look like that, since it doesn't include any binary search behavior), or O(N^2), or O(N^2 logN), or even O(N^3) (like Bubble Sort, which the algorithm somewhat resembles), but it definitely cannot be O(N).
I have a (complex and currently unready for submission) test case with latestBatches.size() = 10 which requires maxIterations >= 28 to be sorted correctly – significantly more than the limit of 20 that the code above sets. I am in the process of developing a fuzz test to search for even worse cases over the range latestBatches.size() = 8 to 16 – once I find some I will convert this into a test case or test cases and attach it here. Hopefully the ratio between the required maxIterations values for N=8 and N=16 will cast some light on what the performance scaling of the search algorithm in use is.
(FYI, good algorithms for sorting a partiallyordered set can be found at https://en.wikipedia.org/wiki/Topological_sorting – apparently they are linear in the (number of nodes plus number of links) of the graph for the partially ordered set, which means they are potentially anywhere from O(N) up to O(N^2) in the number of nodes, depending on how dense the links are – for a fullyordered set they would be O(N^2).)
Possibly this should be a separate JIRA, but if the maxIterations limit is reached, then the code's current behavior is to log a warning and then attempt to use whatever notfully sorted batch order its sort algorithm had currently reached. This will almost inevitably result in a constraint violation (whether the entity relationship graph does actually contain a cycle, or the list was simply not yet full sorted). I believe that the correct behavior would be to instead log a warning and then act as if ORDER_INSERTS had not been set, and perform the inserts (not batched) in their original unsorted order (i.e. in the order they were persisted in). This will make the database operation slower, but is the order in which it is most likely to succeed without a constraint violation, and (unlike the current behavior) also gives the user of the library full control to fix the problem if a constraint violation occurs.
An even better behavior (consider this an enhancement suggestion) would be to use something like https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm to identify which entity types can and cannot safely be batched, then batch the ones that can be batched and sort the batches correctly, while performing the inserts for entity types that cannot safely be batched in their original order (interspersed in the appropriate places in the order of batches).
Attachments
Issue links
 cloned from

HHH12405 Order inserts sorting code uses a very inefficient algorithm and can take over 200,000 iterations to sort insertion batches to just 32 tables
 Open
 relates to

HHH12105 Batch order_inserts: flush during transaction causes incorrect insert ordering and subsequent constraint violation
 Closed