I like the idea, but I think your model overlooks the buffering of objects in correlated subqueries (where, for each tuple of the outer relation, we look up a value in the subquery) and all their dependent objects. I suspect a significant underestimation of the cost of flushing buffers for objects related to temporary tables - indexes and TOAST tables - and a TOAST table can be substantially larger than its main temporary table. While index information is relatively easy to get from the RelOptInfo structure, for TOAST tables we only find out when we actually access the relation, and we are lucky if it happens to be in the cache (which I suppose is rare). Otherwise, during cost calculation we would have to query the system catalog each time, which is expensive for what may be plain SELECT queries.
Your current model also lacks knowledge of the actual (or even approximate) number of temporary buffers used by such dependent temporary relations, including their indexes and TOAST tables. At the moment, PostgreSQL doesn't expose this information in the buffer manager or in any other statistics. Or am I missing something? By the way, have you run tests on large tables (with TOASTed values)?
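To illustrate the TOAST concern, here is a quick sketch of a test case (table and column names are made up) where the TOAST relation ends up far larger than the heap of the temporary table itself:

```sql
-- Illustrative only: a temp table whose TOAST relation dwarfs its heap.
CREATE TEMP TABLE t_big (id int, payload text);
-- Force out-of-line, uncompressed storage so every payload lands in TOAST.
ALTER TABLE t_big ALTER COLUMN payload SET STORAGE EXTERNAL;
INSERT INTO t_big
SELECT g, repeat(md5(g::text), 1000)      -- ~32 kB per row
FROM generate_series(1, 10000) AS g;

-- Compare heap size with TOAST size:
SELECT pg_relation_size(oid)            AS heap_bytes,
       pg_relation_size(reltoastrelid)  AS toast_bytes
FROM pg_class
WHERE relname = 't_big';
```

A cost model that only counts the main relation's buffers would miss almost all of the flushing work in a case like this.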
Also, wouldn't it be safer to flush temporary pages during memory allocation for the query plan and its subplans, in the InitPlan function? That would cover both the main query plan and any subquery plans.
I think you should keep in mind the prototype designed at Postgres Professional a year or two ago. That code was intended for use in test environments, to reveal any extra corner cases we couldn't imagine during development.
Here, though, I follow the code developed in collaboration with Tantor - a community-driven company - and I designed that code without fear of being invasive, following the general Postgres architecture. The two versions have nothing in common.
In short, the code makes zero assumptions about the number of temporary tables involved - a single temporary table anywhere in the subtree means we flush all the local buffers. This may result in fewer parallel plans involving temporary tables, but it is provably safe.
Hence, overestimation is possible, of course, but there is no room for underestimation.