4 Comments

Hello Andrey.

I would like to add Pull-up subqueries - indeed, this is a very necessary thing, but in my ORM I also encounter requests of the following type:

EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)

SELECT * FROM tenk1 A

WHERE exists (SELECT TRUE

from (SELECT TRUE AS f1) f1_

INNER JOIN tenk2 B

ON B.hundred = A.hundred AND B.unique1 = A.odd

INNER JOIN onek C

ON B.unique1 = C.odd

WHERE B.two = 0 and C.two = 0);

In this case we get a subplan. Although if the query is slightly modified, it will be a SEMI JOIN:

EXPLAIN (ANALYZE, TIMING OFF, COSTS ON)

SELECT * FROM tenk1 A

WHERE exists (SELECT TRUE

FROM tenk2 B

INNER JOIN onek C

ON B.unique1 = C.odd

WHERE B.two = 0 and C.two = 0 and B.hundred = A.hundred AND B.unique1 = A.odd);

The essence of the queries is the same, but the optimizer selects different plans and, as in your case, the execution time with a subplan will be significantly longer. Why does the optimizer choose a subplan?

Expand full comment

Thanks for feedback!

You have identified a corner case that is still not implemented. Here's the situation: PostgreSQL attempts to flatten a subplan in two ways: through the `make_subplan` function and the `convert_EXISTS_sublink_to_join` function. Your example seems suitable for transformation in both cases. However, PostgreSQL currently forbids references to upper-level queries within the `ON` section, as this functionality has not yet been implemented. This limitation is even noted in the code:

/*

* Separate out the WHERE clause. (We could theoretically also remove

* top-level plain JOIN/ON clauses, but it's probably not worth the

* trouble.)

*/

Expand full comment

It's not clear to me what transformations to pull-up correlated subqueries you have in mind. Perhaps the best way to get that happen is to post some proposal on the hackers mailing list, with a description of the benefits and how the planner might decide to apply the transformation. I'd expect that to be the challenging part - costing the alternative transformations.

That was mostly the problem with the GROUP BY reordering - the statistics we have are not sufficient to make reliable decisions, and it got the patch reverted. The incremental sort has somewhat similar challenges, FWIW.

Expand full comment

In my experience, subqueries frequently appear in two places: WHERE clause (x IN smth) and target lists.

We have already discussed [1] one such transformation; the most apparent part of this work was committed [2] in the core. But you're right: the key question is cost estimation: without an index on the column involved in the expression, it sometimes makes little sense.

That's why I believe committing this to the core is not feasible for now. However, using a hook in an extension could alleviate the situation for some use cases. The fact that other DBMSes implement such transformations is a promising sign that the issue can be resolved, providing reassurance for the future of query optimization.

[1] https://www.postgresql.org/message-id/flat/CALNJ-vTa5VgvV1NPRHnypdnbx-fhDu7vWp73EkMUbZRpNHTYQQ%40mail.gmail.com

[2] https://www.postgresql.org/message-id/flat/CAKU4AWoZksNZ4VR-fLTdwmiR91WU8qViDBNQKNwY%3D7iyo%2BuV0w%40mail.gmail.com

Expand full comment