Cannot put more than 1000 elements in a InExpression

Description

The number of elements that we can put in a "in" expression is limited to a certain amount (1000 for Oracle, for instance). When creating a criteria query, the org.hibernate.criterion.InExpression class should split the expression into several smaller ones.

Attached is a patch which splits the expression by slices of 500 elements. For example, if we have 1001 elements to put in the "in" expression, the result would be :
(entity.field in (?, ?, ?...) or entity.field in (?, ?, ?...) or entity.field in )

The surrounding parantheses are useful to avoid problems with other conditions (a "and" condition taking over the one of the "or" conditions).

Attachments

5

Activity

Show:

Vasile Bors January 7, 2016 at 9:18 AM

Oh, I can not upload sources

Vasile Bors January 7, 2016 at 9:13 AM

For HQL query I implemented a solution for creating a helper class for parsing and split the IN expressions, and I call it in AbstractQueryImpl.expandParameterList if the collection is more than 1000 items.

Steve Ebersole September 5, 2012 at 9:39 PM

Noel, these are better because they are concrete. Thanks.

1) Have you looked at subselect fetching? There are some limitations to it, but it is more in-line with what you are looking for there. So instead of @BatchSize you'd use @org.hibernate.annotations.Fetch(org.hibernate.annotations.FetchMode.SUBSELECT)

2) Not against this, but please note just how vastly different that is then this request. So propose an API. Or better yet, develop a proposal and send us a pull request.

As I said, I am not even really against this issue as a whole; its just not doable given the current code. And it might not ever be in terms of splitting. Ok, sure I can see use of temp tables possibly working there, but I would just point out how much trouble we have with our existing use of temp tables due to the extremely differing support for temp tables across databases. So in theory it sounds great, but in practice I expect difficulties.

Noel Trout September 5, 2012 at 9:04 PM

So, I doubt this is going to matter as the issue has now already been resolved. But since you asked Steve, I have two particular use cases.

1. We map our collections often with @BatchSize as we often we are loading a lot of parent entities and when we access the collection of one we are eventually going to need to access the collection of another. Today we mostly annotate these with @BatchSize(size=2000) to obey the SQL Server prepared statement limits as the collection load is eventually translated to a prepared statement of in (?,?,?...) by class JoinWalker. Now, with 3000 entities in session I am going to be performing multiple selects because of my constrained batch size. The simple reality is that if I could instead perform one select with a bigger IN clause, I would get better performance. This is not ridiculous; this is a fact in our environment and with our data sets. So, as we use numeric database IDs, I would like an API where I can execute my lazy collection load with a SQL string via a statement and not a prepared statement to get around this limit. I understand this is very specific to our case, so I do not expect Hibernate to solve this inherently but an API extension point would be nice.
2. We have some queries that we execute in native SQL as we use features / joins / etc. that Hibernate does not support. This is fine, but these are used for filtering to determine the appropriate data set to use and then we want to get the entities mapped with Hibernate to manipulate. Today, we do this by executing the native query in JDBC to collect ids, and then executing a second query via Hibernate to load the entities with an ID in (X,Y,Z) clause. We are not using native mapped queries as we are often fetching extended object graphs for performance and the native query mechanism is quite cumbersome for this as we are using Hibernate behind the JPA API as a provider. So, in this particular case, I would get a lot better performance if I could skip executing two queries and instead create a common table expression (CTE) / temporary table out of the first select and then use that in a Hibernate where clause. There seems to be a number of people looking for this type of temporary object support.

If I am missing a feature of Hibernate that meets these use cases and I am way off base with this particular bug, I would be happy to be enlightened. Thanks, Noel

Steve Ebersole September 5, 2012 at 5:58 PM

Sorry you think it is ridiculous, but even your database thinks so (or one of or a few of them).

As I said in my previous comment, we will be in a better position to address this when we actually have AST-based query handling thoughout. But until that time, it is simply not feasible. If you actually looked at the code (to create a pull request, or something) you'd see that.

Fixed

Details

Assignee

Reporter

Original estimate

Time tracking

0.88h logged1h remaining

Components

Fix versions

Priority

Created November 3, 2005 at 4:36 PM
Updated March 1, 2017 at 12:10 AM
Resolved September 5, 2012 at 6:46 PM