In the previous post, I explored some nuances of Postgres related to indexes and parallel workers. That post sparked a lively discussion on LinkedIn, during which one commenter (thanks, Ants Aasma) proposed an index that was significantly more efficient than those discussed in the article. However, an automated comparison of the EXPLAIN outputs did not clarify the reasons for its superiority, which called for further investigation.
This index:
CREATE INDEX ON order_events ((event_payload ->> 'terminal'::text),
event_type, event_created); -- (1)
At first (purely formal) glance, this index should not be much better than the alternatives:
CREATE INDEX ON order_events (event_created,
(event_payload ->> 'terminal'::text),
event_type); -- (2)
CREATE INDEX ON order_events (event_created, event_type); -- (3)
However, the observed speedup is significant; in fact, the performance of index (1) surpasses index (2) by more than 50 times and exceeds index (3) by almost 25 times!
The advantage of the proposed index becomes evident once we consider the logic of the subject area: it is more selective and far less likely to retrieve rows that do not match the filter. For instance, if we first identify all the rows that correspond to a specific airport, we can then focus on the boundaries of the date range; at that point, every retrieved row already meets the filter criteria. Conversely, if we begin by determining the date range, we encounter numerous rows related to other terminals within that range.
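For reference, the plans below were produced by a query of roughly this shape (a reconstruction from the index conditions shown in the plans; the exact statement and constants are in the previous post):

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM order_events
WHERE event_payload ->> 'terminal' IN ('Berlin', 'Hamburg', 'Munich')
  AND event_type IN ('Created', 'Departed', 'Delivered')
  AND event_created >= '2024-01-01 00:00:00+00'
  AND event_created <  '2024-02-01 00:00:00+00';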
However, when examining the EXPLAIN output, we do not see any distinctive reasons:
-- (1)
-> Index Scan using order_events_expr_event_type_event_created_idx
(cost=0.57..259038.66 rows=64540 width=72)
(actual time=0.095..232.855 rows=204053.00 loops=1)
Index Cond:
event_payload ->> 'terminal' = ANY ('{Berlin,Hamburg,Munich}') AND
event_type = ANY ('{Created,Departed,Delivered}') AND
event_created >= '2024-01-01 00:00:00+00' AND
event_created < '2024-02-01 00:00:00+00'
Index Searches: 9
Buffers: shared hit=204566
-- (2)
-> Index Scan using order_events_event_created_event_type_expr_idx
(cost=0.57..614892.22 rows=64540 width=72)
(actual time=0.499..14303.685 rows=204053.00 loops=1)
Index Cond:
event_created >= '2024-01-01 00:00:00+00' AND
event_created < '2024-02-01 00:00:00+00' AND
event_type = ANY ('{Created,Departed,Delivered}') AND
event_payload ->> 'terminal' = ANY ('{Berlin,Hamburg,Munich}')
Index Searches: 1
Buffers: shared hit=279131
-- (3)
-> Index Scan using idx_3
(cost=0.57..6979008.62 rows=64540 width=72)
(actual time=0.238..8777.846 rows=204053.00 loops=1)
Index Cond:
event_created >= '2024-01-01 00:00:00+00' AND
event_created < '2024-02-01 00:00:00+00' AND
event_type = ANY ('{Created,Departed,Delivered}')
Filter: event_payload ->> 'terminal' = ANY ('{Berlin,Hamburg,Munich}')
Rows Removed by Filter: 4292642
Index Searches: 1
Buffers: shared hit=4509185
One could say that the IndexScan on (3) filters out a lot of tuples and is therefore slow. However, even after discarding more than 4 million rows, the IndexScan on (3) is still almost twice as fast as the IndexScan on (2). At the same time, the only difference between indexes (1) and (2) is the order of the columns.
If we compare scans (1) and (2), the only noticeable difference is about 30% more buffer pages hit, nowhere near a factor of 50! That means EXPLAIN does not show us where the main work was done; only the cost value signals the superiority of index (1).
However, we live in a world of ORMs and ad-hoc queries, where it is difficult to choose the column order of an index by analysing the meaning of the stored data. That means we need to find out precisely what is happening inside the scan and what data is missing for the automated detection of an [un]successful index.
If you look at the optimiser code, the numbers make it clear why index (1) is so pleasing: all other things being equal, it expects to go through only 39 out of 1 million index pages. Compare this with index (2), which also contains 1 million pages, of which we pass through 73 thousand. In terms of index tuples, this is 64.5 thousand versus 14 million. It turns out that the main work is to fetch a tuple, extract the relevant attribute and perform the comparison.
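For comparison, the total size of each index in pages is easy to check, for instance with the pgstattuple extension (note that the per-scan page and tuple counts quoted above are the optimiser's estimates, not these physical totals):

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- overall index size in pages (assuming the default 8 kB block size)
SELECT pg_relation_size('order_events_expr_event_type_event_created_idx') / 8192 AS total_pages;

-- B-tree internals: depth and the split between internal and leaf pages
SELECT tree_level, internal_pages, leaf_pages, avg_leaf_density
FROM pgstatindex('order_events_expr_event_type_event_created_idx');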
This work is not represented in the EXPLAIN output. Additionally, the IndexScan structure of the query plan, which is accessible to the Postgres core and its extensions after the plan has been executed, lacks the information needed to assess the quality of planning and the sources of execution time growth. Consequently, developing a method for automatically identifying ineffective indexes and selecting more optimal alternatives appears to be challenging, if not impossible.
There are numerous parameters that the optimiser calculates during index scan planning. Take a look at the GenericCosts structure, including numIndexTuples, numIndexPages, indexCorrelation, and indexSelectivity. Having all this information available at the end of execution could help detect scanning anomalies and draw the DBA's attention.
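Some of the raw inputs behind those estimates are already observable today; for example, the per-column statistics that feed indexSelectivity and indexCorrelation can be inspected via pg_stats:

SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'order_events'
  AND attname IN ('event_type', 'event_created');

What is missing is the other side of the comparison: the values the optimiser actually derived for the chosen scan, preserved until the end of execution.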
Of course, the number of installations, types of load and use cases is close to infinite. Hence, extending the core code by continually adding more data from the optimisation stage to the plan does not seem flexible. Moreover, sometimes we would like to see alternative paths that lost the battle but may be useful for analysis.
Fortunately, since Postgres 18 the core already has a nicely extensible EXPLAIN, where extensions may add options, per-node information, and overall plan information. So the only missing step is a bridge between the cloud of possible paths and the final plan.
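As a usage sketch, the pg_overexplain contrib module shipped with Postgres 18 already builds on these hooks to add extra planner detail to the output; a similar extension could expose the cost-model inputs discussed above:

LOAD 'pg_overexplain';

-- the DEBUG option is registered by the extension, not by the core
EXPLAIN (DEBUG, COSTS OFF)
SELECT count(*) FROM order_events WHERE event_type = 'Created';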
Having this capability would allow extensions to analyse the predictions against the actual outcomes of query execution. Additionally, it would help in making informed decisions for fine-tuning the query planner and developing effective indexing strategies for a table.
Please feel free to share your feedback, whether you agree or disagree with my viewpoint.
THE END.
7 August 2025, Torrevieja, España.