Discussion about this post

User's avatar
Ross Bradbury's avatar

I think SOME amount of rewrite by the optimizer is required. We especially found that to be true with exactly this situation in the post. It seems to me that "x IN(list of scalars)" and "x = ANY(list of scalers)" and x IN VALUES” is all the same thing until you use query parameters. I agree it doesn't need to get carried away (as it sounds like Oracle did). Everyone with the same issue benefits from optimizer improvements instead of having to change their applications as long as the planning time is capped to some time limit.

When I was on a team working with SQL Server queries in an application, I often advised people that when using any of the relational databases, try to keep these ideas in mind:

1. Write the query as WHAT you want to retrieve, not HOW to execute it. If you need to explain how to execute it whether by writing the query complicated or adding SQL server query hints then something is too complicated.

2. Second, try to keep the query as simple as possible (which does require some understanding of how the execution would happen). Like don't join in the same table again if it is joining on the same key! Maybe prefer to write as a join instead of a subquery etc.

3. EXPECT complicated queries to change execution plan as your data changes.

Expand full comment
James Coleman's avatar

Until PG14 using IN VALUES with a long list (and the resulting semi join) was significantly more performant than an = ANY expression with a large list because that expression was always executed with a linear scan through the list. That patch (to execute such an expression as a hash lookup) came out of my experience with such an = ANY containing 17k values being horrifically slow (but rewriting as an explicit join to VALUES being much faster).

Expand full comment
4 more comments...

No posts