Uploaded image for project: 'Hibernate ORM'
  1. HHH-12374

Order inserts sorting code gives up too soon

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2.16, 5.3.0.CR2
    • Component/s: hibernate-core
    • Labels:
      None
    • Bug Testcase Reminder (view):

      Bug reports should generally be accompanied by a test case!

    • Last commented by a user?:
      true
    • Sprint:

      Description

      Line 1186 of hibernate-orm\hibernate-core\src\main\java\org\hibernate\engine\spi\ActionQueue.java in the function sort(List<AbstractEntityInsertAction> insertions) is:

      long maxIterations = latestBatches.size() * 2;
      

      This variable controls how many passes the sort algorithm is willing to do before it gives up and assumes that the set of batches is unsortable due to a cycle of entity parent-relationships (lines 1214-1220):

      }
      while ( !sorted && iterations <= maxIterations);
      
      if ( iterations > maxIterations ) {
      	LOG.warn( "The batch containing " + latestBatches.size() + " statements could not be sorted after " + maxIterations + " iterations. " +
      					"This might indicate a circular entity relationship." );
      }
      

      The formula long maxIterations = latestBatches.size() * 2 assumes that the sorting algorithm in use has O(N) scaling. This is impossible – the optimum possible performance for any general search algorithm is O(N logN) on a fully-ordered set (which is a special case of a partially-ordered set). I am uncertain whether the search algorithm in use here has O(N logN) performance (like QuickSort, but it doesn't look like that, since it doesn't include any binary search behavior), or O(N^2), or O(N^2 logN), or even O(N^3) (like Bubble Sort, which the algorithm somewhat resembles), but it definitely cannot be O(N).

      I have a (complex and currently unready for submission) test case with latestBatches.size() = 10 which requires maxIterations >= 28 to be sorted correctly – significantly more than the limit of 20 that the code above sets. I am in the process of developing a fuzz test to search for even worse cases over the range latestBatches.size() = 8 to 16 – once I find some I will convert this into a test case or test cases and attach it here. Hopefully the ratio between the required maxIterations values for N=8 and N=16 will cast some light on what the performance scaling of the search algorithm in use is.

      (FYI, good algorithms for sorting a partially-ordered set can be found at https://en.wikipedia.org/wiki/Topological_sorting – apparently they are linear in the (number of nodes plus number of links) of the graph for the partially ordered set, which means they are potentially anywhere from O(N) up to O(N^2) in the number of nodes, depending on how dense the links are – for a fully-ordered set they would be O(N^2).)

      Possibly this should be a separate JIRA, but if the maxIterations limit is reached, then the code's current behavior is to log a warning and then attempt to use whatever not-fully sorted batch order its sort algorithm had currently reached. This will almost inevitably result in a constraint violation (whether the entity relationship graph does actually contain a cycle, or the list was simply not yet full sorted). I believe that the correct behavior would be to instead log a warning and then act as if ORDER_INSERTS had not been set, and perform the inserts (not batched) in their original unsorted order (i.e. in the order they were persisted in). This will make the database operation slower, but is the order in which it is most likely to succeed without a constraint violation, and (unlike the current behavior) also gives the user of the library full control to fix the problem if a constraint violation occurs.

      An even better behavior (consider this an enhancement suggestion) would be to use something like https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm to identify which entity types can and cannot safely be batched, then batch the ones that can be batched and sort the batches correctly, while performing the inserts for entity types that cannot safely be batched in their original order (interspersed in the appropriate places in the order of batches).

        Attachments

          Issue links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: