Discussion about this post

User's avatar
Alena Rybakina's avatar

I like the idea, but I think your model lacks consideration for buffering of objects in correlated subqueries (when for each tuple of the outer relation, we look up a value from the subquery) and all their dependent objects. I suspect there’s a significant underestimation of the cost of flushing buffers for objects related to temporary tables - indexes and TOASTs - and TOAST tables can be substantially larger than the main temporary tables. While we can get information about indexes more easily from the RelOptInfo structure, for TOASTs we can only find out if we actually access the relation, and we’re lucky if it happens to be in the cache (but rare I suppose). Otherwise, during cost calculation, we’d have to query the system catalog each time, which makes it expensive for only select-like queries.

In your current model, there’s a lack of knowledge about the actual (or almoust) number of temporary buffers for such dependent temporary relations, including their indexes and TOAST tables. At the moment, PostgreSQL doesn’t provide this information in the buffer manager or in any other statistics. Or am I missing something? By the way, have you run tests on large tables (with toast elements)?

By the way, wouldn’t it be safer to flush temporary pages during memory allocation for the query plan and its subplans in the InitPlan function? This would allow flushing both the main query plan and any subquery plans.

1 more comment...

No posts

Ready for more?