Query translator generates an incorrect query with SELECT distinct + Order by + Implicit join

Description

This issue seems identical to (maybe a regression?)

The following query :
select distinct prod.supplier from products prod order by prod.supplier.supplierName

for the attached database, generates an incorrect SQL query with an extra join:
[Oracle]
select
distinct supplier1_.SUPPLIER_ID as SUPPLIER1_1_,
supplier1_.SUPPLIER_NAME as SUPPLIER2_1_,
supplier1_.CONTACT_NAME as CONTACT3_1_
from
PRODUCTS products0_
inner join
SUPPLIER supplier1_
on products0_.SUPPLIER_ID=supplier1_.SUPPLIER_ID,
SUPPLIER supplier2_
where
products0_.SUPPLIER_ID=supplier2_.SUPPLIER_ID
order by
supplier2_.SUPPLIER_NAME

[H2]:

SELECT DISTINCT supplier1_.SUPPLIER_ID AS SUPPLIER1_1_,
supplier1_.SUPPLIER_NAME AS SUPPLIER2_1_,
supplier1_.CONTACT_NAME AS CONTACT3_1_
FROM PUBLIC.PRODUCTS products0_
INNER JOIN PUBLIC.SUPPLIER supplier1_
ON products0_.SUPPLIER_ID=supplier1_.SUPPLIER_ID
CROSS JOIN PUBLIC.SUPPLIER supplier2_
WHERE products0_.SUPPLIER_ID=supplier2_.SUPPLIER_ID
ORDER BY supplier2_.SUPPLIER_NAME

I attach a test case with an auto-generated h2 database (it's a runnable netbeans project ) , but this is the output using an Oracle database:

INFO: HHH000046: Connection properties: {user=SBUREAU_D, password=****}
04-jun-2013 19:00:24 org.hibernate.dialect.Dialect <init>
INFO: HHH000400: Using dialect: org.hibernate.dialect.Oracle10gDialect
04-jun-2013 19:00:24 org.hibernate.engine.transaction.internal.TransactionFactoryInitiator initiateService
INFO: HHH000399: Using default transaction strategy (direct JDBC transactions)
04-jun-2013 19:00:24 org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory <init>
INFO: HHH000397: Using ASTQueryTranslatorFactory
Hibernate:
select
distinct supplier1_.SUPPLIER_ID as SUPPLIER1_1_,
supplier1_.SUPPLIER_NAME as SUPPLIER2_1_,
supplier1_.CONTACT_NAME as CONTACT3_1_
from
PRODUCTS products0_
inner join
SUPPLIER supplier1_
on products0_.SUPPLIER_ID=supplier1_.SUPPLIER_ID,
SUPPLIER supplier2_
where
products0_.SUPPLIER_ID=supplier2_.SUPPLIER_ID
order by
supplier2_.SUPPLIER_NAME
04-jun-2013 19:00:24 org.hibernate.engine.jdbc.spi.SqlExceptionHelper logExceptions
WARN: SQL Error: 1791, SQLState: 42000
04-jun-2013 19:00:24 org.hibernate.engine.jdbc.spi.SqlExceptionHelper logExceptions
ERROR: ORA-01791: no es una expresión obtenida bajo SELECT

Environment

None

Activity

Show:
Brett Meyer
March 11, 2014, 6:27 PM

The main issue is that the "prod.supplier" path is being used twice in the query, but order-by handling is incorrectly generating a second join (rather than using the first). So, this should look like:

Brett Meyer
March 11, 2014, 8:59 PM

A simple case like this seems like it would be easy to solve, but it won't really be possible until we introduce the new query parser sometime in ORM 5. As mentioned above, the main issue is that "order by" is not currently able to reuse the already joined supplier1_, even though their paths are identical.

The difficulty stems from the current inability to safely know which levels of a path have already been joined and can be reused. It's currently more of an all or nothing shot. The new query parser has a better concept of separated "path parts" and smartly decides to create a new join when it's actually needed.

The bottom line is that this would be extremely difficult or impossible to do in ORM 4. A hacky workaround might accomplish this specific test case, but safely handling all other complex circumstances would be an issue.

Brett Meyer
March 11, 2014, 9:02 PM

As a note to myself, pasting long pieces of a convo with :

(03:27:56 PM) sebersole: some.joined.path.separated.by.dots
(03:28:01 PM) sebersole: thats a implcit join
(03:28:17 PM) sebersole: as opposed to the explicit form
(03:28:31 PM) sebersole: from ... join ... join ...
(03:29:05 PM) sebersole: this is where the parser is not very smart
(03:29:13 PM) sebersole: the new parser is much better at this
(03:29:40 PM) sebersole: really the rule is that it is ok to reuse non-leaf parts of implicit join paths always
(03:30:04 PM) sebersole: for leaf parts it depends on a few things
(03:30:20 PM) sebersole: the old parser does not understand all that
(03:30:59 PM) sebersole: it does not understand that it has seen part of prod.supplier.supplierName before
(03:31:57 PM) sebersole: in general it just elects to be safe when it does not know
(04:09:11 PM) sebersole: select prod.supplier from Products prod order by prod.supplier.supplierName
(04:09:17 PM) sebersole: so thats your query
(04:09:37 PM) sebersole: but imagine...
(04:11:01 PM) sebersole: well first let me back up...
(04:11:17 PM) sebersole: the concern is filtering results that should not be filtered
(04:12:41 PM) sebersole: its hard to make up these models on the fly to explain by way of an example
(04:13:26 PM) sebersole: product -> supplier -> industry lets say
(04:15:06 PM) sebersole: so first look at:
(04:16:25 PM) sebersole: ... from Product p where p.supplier.industry.code = 'R&D' or p.supplier.hq.city = 'Austin'
(04:17:23 PM) sebersole: so the issue is that inner joins have a semantic of removing missing correlations
(04:18:03 PM) sebersole: here, if a product's supplier does not have a hq-address or does not have an industry... should they still be returned?
(04:18:31 PM) sebersole: say for example, a product from the supplier Acme that has no industry, but is in Austin
(04:18:39 PM) sebersole: should they get returned?
(04:19:36 PM) sebersole: as far as the alias part...
(04:19:42 PM) sebersole: there is another forms here..
(04:20:48 PM) sebersole: from Product p join p.supplier.industry i join p.supplier.hq h where i.code = 'R&D' or h.city = 'Austin'
(04:21:17 PM) sebersole: but really where the alias part in my above commments come into play is in cases like:
(04:22:32 PM) sebersole: select p.supplier.supplierName from Product p join p.supplier.industry i1 join p.supplier.industry i2 ...
(04:23:19 PM) sebersole: its complicated, but the idea is that the 'p.supplier' join is probably safe to reuse
(04:23:42 PM) sebersole: any dereference to 'p.supplier.industry; is not
(04:23:51 PM) sebersole: any dereference to 'p.supplier.industry' is likely not
(04:24:11 PM) sebersole: and even if it were safe... which woould you use?
(04:24:27 PM) sebersole: so "safe" is releative to the path
(04:24:44 PM) sebersole: aka, these aliases implicit joins are named
(04:24:52 PM) sebersole: wow
(04:24:56 PM) sebersole: aka, these aliased implicit joins are named
(04:25:18 PM) sebersole: and so should not be reused
(04:25:22 PM) sebersole: thats the safest
(04:25:59 PM) sebersole: actually the new parser looks at each "path part" separate and decides
(04:26:21 PM) sebersole: starting a new Join branch whenever it deems it needs to
(04:26:27 PM) sebersole: so part could be reused
(04:26:38 PM) sebersole: i think the current parser is not that smart
(04:26:43 PM) sebersole: i think its all/none to it
(04:27:12 PM) sebersole: aka, either the full p.supplier.industry path
(04:27:28 PM) sebersole: aka, either the full p.supplier.industry path can be reused, or none of it (no part of it) can

Gail Badner
July 30, 2014, 12:16 AM
Edited

This is fixed by HHH-9305, so resolving this as a duplicate.

Brett Meyer
April 22, 2015, 9:14 PM

Closing rejected issues.

Assignee

Gail Badner

Reporter

David Latorre

Fix versions

None

Labels

None

backPortable

None

Suitable for new contributors

None

Requires Release Note

None

Pull Request

None

backportDecision

None

Components

Affects versions

Priority

Major
Configure