I would like to add Pull-up subqueries - indeed, this is a very necessary thing, but in my ORM I also encounter requests of the following type:
EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)
SELECT * FROM tenk1 A
WHERE exists (SELECT TRUE
from (SELECT TRUE AS f1) f1_
INNER JOIN tenk2 B
ON B.hundred = A.hundred AND B.unique1 = A.odd
INNER JOIN onek C
ON B.unique1 = C.odd
WHERE B.two = 0 and C.two = 0);
In this case we get a subplan. Although if the query is slightly modified, it will be a SEMI JOIN:
EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)
SELECT * FROM tenk1 A
WHERE exists (SELECT TRUE
FROM tenk2 B
INNER JOIN onek C
ON B.unique1 = C.odd
WHERE B.two = 0 and C.two = 0 and B.hundred = A.hundred AND B.unique1 = A.odd);
The essence of the queries is the same, but the optimizer selects different plans and, as in your case, the execution time with a subplan will be significantly longer. Why does the optimizer choose a subplan?
You have identified a corner case that is still not implemented. Here's the situation: PostgreSQL attempts to flatten a subplan in two ways: through the `make_subplan` function and the `convert_EXISTS_sublink_to_join` function. Your example seems suitable for transformation in both cases. However, PostgreSQL currently forbids references to upper-level queries within the `ON` section, as this functionality has not yet been implemented. This limitation is even noted in the code:
/*
* Separate out the WHERE clause. (We could theoretically also remove
* top-level plain JOIN/ON clauses, but it's probably not worth the
It's not clear to me what transformations to pull-up correlated subqueries you have in mind. Perhaps the best way to get that happen is to post some proposal on the hackers mailing list, with a description of the benefits and how the planner might decide to apply the transformation. I'd expect that to be the challenging part - costing the alternative transformations.
That was mostly the problem with the GROUP BY reordering - the statistics we have are not sufficient to make reliable decisions, and it got the patch reverted. The incremental sort has somewhat similar challenges, FWIW.
In my experience, subqueries frequently appear in two places: WHERE clause (x IN smth) and target lists.
We have already discussed [1] one such transformation; the most apparent part of this work was committed [2] in the core. But you're right: the key question is cost estimation: without an index on the column involved in the expression, it sometimes makes little sense.
That's why I believe committing this to the core is not feasible for now. However, using a hook in an extension could alleviate the situation for some use cases. The fact that other DBMSes implement such transformations is a promising sign that the issue can be resolved, providing reassurance for the future of query optimization.
Hello Andrey.
I would like to add Pull-up subqueries - indeed, this is a very necessary thing, but in my ORM I also encounter requests of the following type:
EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)
SELECT * FROM tenk1 A
WHERE exists (SELECT TRUE
from (SELECT TRUE AS f1) f1_
INNER JOIN tenk2 B
ON B.hundred = A.hundred AND B.unique1 = A.odd
INNER JOIN onek C
ON B.unique1 = C.odd
WHERE B.two = 0 and C.two = 0);
In this case we get a subplan. Although if the query is slightly modified, it will be a SEMI JOIN:
EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)
SELECT * FROM tenk1 A
WHERE exists (SELECT TRUE
FROM tenk2 B
INNER JOIN onek C
ON B.unique1 = C.odd
WHERE B.two = 0 and C.two = 0 and B.hundred = A.hundred AND B.unique1 = A.odd);
The essence of the queries is the same, but the optimizer selects different plans and, as in your case, the execution time with a subplan will be significantly longer. Why does the optimizer choose a subplan?
Thanks for feedback!
You have identified a corner case that is still not implemented. Here's the situation: PostgreSQL attempts to flatten a subplan in two ways: through the `make_subplan` function and the `convert_EXISTS_sublink_to_join` function. Your example seems suitable for transformation in both cases. However, PostgreSQL currently forbids references to upper-level queries within the `ON` section, as this functionality has not yet been implemented. This limitation is even noted in the code:
/*
* Separate out the WHERE clause. (We could theoretically also remove
* top-level plain JOIN/ON clauses, but it's probably not worth the
* trouble.)
*/
It's not clear to me what transformations to pull-up correlated subqueries you have in mind. Perhaps the best way to get that happen is to post some proposal on the hackers mailing list, with a description of the benefits and how the planner might decide to apply the transformation. I'd expect that to be the challenging part - costing the alternative transformations.
That was mostly the problem with the GROUP BY reordering - the statistics we have are not sufficient to make reliable decisions, and it got the patch reverted. The incremental sort has somewhat similar challenges, FWIW.
In my experience, subqueries frequently appear in two places: WHERE clause (x IN smth) and target lists.
We have already discussed [1] one such transformation; the most apparent part of this work was committed [2] in the core. But you're right: the key question is cost estimation: without an index on the column involved in the expression, it sometimes makes little sense.
That's why I believe committing this to the core is not feasible for now. However, using a hook in an extension could alleviate the situation for some use cases. The fact that other DBMSes implement such transformations is a promising sign that the issue can be resolved, providing reassurance for the future of query optimization.
[1] https://www.postgresql.org/message-id/flat/CALNJ-vTa5VgvV1NPRHnypdnbx-fhDu7vWp73EkMUbZRpNHTYQQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CAKU4AWoZksNZ4VR-fLTdwmiR91WU8qViDBNQKNwY%3D7iyo%2BuV0w%40mail.gmail.com