Finding invisible use-after-free bugs in the PostgreSQL planner
Dangling pointers, revisited
Build PostgreSQL with assertions enabled, load a small debugging extension called pg_pathcheck, and run the standard make check-world suite. The extension reports pointers to freed memory in the planner's path lists, and it will find some: such dangling pointers exist even in core Postgres. They are harmless today, and that word, today, is what makes this worth writing about.
A production story
This story started in July 2021. At the time, I was finishing a sharding solution built on top of postgres_fdw. During testing, our engineers sent me a query that crashed intermittently with a SEGFAULT. One look at the plan told me something was very off.
Append
  ->  Nested Loop
        Output: data_1.b
        Join Filter: (g1.a = g2.a)
        ...
  ->  Materialize
        Output: g2.a, data_2.b
        ->  Hash Join
              Output: g2.a, data_2.b
              Hash Cond: (data_2.b = g2.a)
              ...
The first obvious question: how did a Materialize node end up as a direct child of an Append? The second, more interesting one: how can one Append combine two sources with different tuple widths (one output column on one side, two on the other)? No wonder the query was crashing. To make things worse, the failure was intermittent: sometimes the very same query produced a perfectly reasonable plan.
On paper, the bug shouldn't have been possible: the optimiser doesn't work that way. A few days of debugging pointed the finger at dangling pointers. While building one of the alternative Append paths, the optimiser added a cheaper path to a child RelOptInfo's pathlist and evicted the one that had been there before. But a previously constructed Append still held a pointer to that now-freed chunk. A step or two later, the allocator handed the very same chunk out again for a new Path higher up the tree, say, for an enclosing join. The result was a plan that made no semantic sense at all.
Where dangling pointers come from
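PostgreSQL builds each relation's pathlist incrementally through add_path(). When a newly arrived path dominates an existing one, meaning it is at least as good on every criterion add_path() compares, such as startup and total cost (within a fuzz factor), pathkeys, parameterization, row count and parallel safety, the dominated path is removed from the list and, unless it is an IndexPath, released immediately with pfree(). A newcomer that loses to an existing path is discarded the same way.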
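A minimal sketch of that eviction rule, in plain C and outside PostgreSQL, may help. It assumes a hypothetical ToyPath that carries only startup and total cost, compares them exactly rather than with add_path()'s fuzz factor, and ignores pathkeys, parameterization, rows and parallel safety. What it does keep is the property that matters here: the loser is freed on the spot, so any other pointer to it is left dangling.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced Path: costs only. */
typedef struct ToyPath
{
    double      startup_cost;
    double      total_cost;
} ToyPath;

#define MAX_PATHS 16

typedef struct ToyRel
{
    ToyPath    *pathlist[MAX_PATHS];
    int         npaths;
} ToyRel;

/* Exact comparison; the real code uses a fuzz factor and more criteria. */
static bool
dominates(const ToyPath *a, const ToyPath *b)
{
    return a->startup_cost <= b->startup_cost &&
        a->total_cost <= b->total_cost;
}

/* Returns true if new_path was kept; either way, losers are freed. */
static bool
toy_add_path(ToyRel *rel, ToyPath *new_path)
{
    int         kept = 0;

    /* A newcomer that is no better than an existing path is discarded. */
    for (int i = 0; i < rel->npaths; i++)
    {
        if (dominates(rel->pathlist[i], new_path))
        {
            free(new_path);
            return false;
        }
    }

    /* Evict and free every existing path the newcomer dominates. */
    for (int i = 0; i < rel->npaths; i++)
    {
        ToyPath    *old = rel->pathlist[i];

        if (dominates(new_path, old))
            free(old);          /* any other pointer to old now dangles */
        else
            rel->pathlist[kept++] = old;
    }

    if (kept == MAX_PATHS)
        abort();                /* toy capacity exceeded; no growth */
    rel->pathlist[kept++] = new_path;
    rel->npaths = kept;
    return true;
}
```

ToyRel and toy_add_path() are illustrative names only; the real add_path() works on a List and releases memory with pfree().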
Because the optimiser builds the plan bottom-up, the pathlists of lower operations (scans, for instance) are completed first, and then the pathlists of upper operations (Append and friends) are assembled with references to specific entries in the lower nodes' pathlists.
This works fine right up until the optimiser, while building a Path for an upper operation, decides that the plan can be improved by adding something to a lower pathlist. At that point, a path may be evicted from the lower pathlist — one that is already referenced from higher up the plan tree. When that happens, we have a dangling pointer.
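The whole sequence can be simulated with a hypothetical fixed-size node pool and a LIFO freelist, standing in for a memory context's freelist for one size class. All names here (ToyNode, toy_make_node(), toy_pfree(), demo_dangling_child()) are made up for illustration. An upper Append keeps a pointer to a lower path; the lower path is evicted and freed; the very next allocation, here a Plan-like SeqScan node, receives the same slot, and the Append's child pointer silently starts pointing at it. Because the pool is an ordinary array, reading through the stale pointer is well-defined C, which would not be true of real freed heap memory.

```c
#include <stddef.h>

/* Hypothetical node tags; T_Invalid marks a freed slot. */
typedef enum ToyTag
{
    T_Invalid = 0,
    T_Path,
    T_SortPath,
    T_AppendPath,
    T_SeqScan                   /* a Plan node, not a Path */
} ToyTag;

/* One layout for every node kind, so any slot can hold any node. */
typedef struct ToyNode
{
    ToyTag          type;
    int             parent_relid;   /* owning rel; 0 for Plan nodes */
    struct ToyNode *child;          /* e.g. an Append's sub-path */
    struct ToyNode *next_free;      /* freelist link while the slot is free */
} ToyNode;

#define POOL_SIZE 8

static ToyNode  pool[POOL_SIZE];
static int      pool_used;
static ToyNode *freelist;           /* LIFO: last freed, first reused */

static ToyNode *
toy_make_node(ToyTag type, int parent_relid)
{
    ToyNode    *n;

    if (freelist != NULL)
    {
        n = freelist;               /* reuse the most recently freed slot */
        freelist = n->next_free;
    }
    else if (pool_used < POOL_SIZE)
        n = &pool[pool_used++];
    else
        return NULL;                /* toy pool exhausted */

    n->type = type;
    n->parent_relid = parent_relid;
    n->child = NULL;
    n->next_free = NULL;
    return n;
}

static void
toy_pfree(ToyNode *n)
{
    n->type = T_Invalid;            /* stand-in for the 0x7F clobber */
    n->parent_relid = -1;
    n->next_free = freelist;
    freelist = n;
}

/* Returns the tag that the Append's stale child pointer observes. */
static ToyTag
demo_dangling_child(void)
{
    ToyNode    *scan = toy_make_node(T_Path, 1);         /* path of lower rel 1 */
    ToyNode    *append = toy_make_node(T_AppendPath, 2); /* path of upper rel 2 */

    append->child = scan;           /* upper path references a lower path */
    toy_pfree(scan);                /* evicted from rel 1's pathlist */
    (void) toy_make_node(T_SeqScan, 0);  /* next allocation gets the same slot */

    return append->child->type;     /* T_SeqScan, although a Path was expected */
}
```

Note that the freed slot's clobbered tag (T_Invalid) is never observed through the stale pointer: the reuse overwrites it first.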
Core Postgres itself contains an example, too: an upper rel evicting a path that a lower rel still references. A minimal reproducer looks roughly like this:
DROP TYPE IF EXISTS insenum CASCADE;
CREATE TYPE insenum AS enum ('L1', 'L2');
EXPLAIN (COSTS OFF)
SELECT enumlabel,
       CASE WHEN enumsortorder > 20 THEN NULL ELSE enumsortorder END AS so
FROM pg_enum
WHERE enumtypid = 'insenum'::regtype
ORDER BY enumsortorder;
With pg_pathcheck loaded, you will see:
WARNING: pg_pathcheck: invalid NodeTag T_SeqScan in pathlist, rel {pg_enum}
DETAIL: pathlist contents: [0] T_ProjectionPath; [1] T_SeqScan INVALID
Sort
  Sort Key: enumsortorder
  ->  Seq Scan on pg_enum
        Filter: (enumtypid = '16590'::oid)
What's going on: under certain conditions, the path representing the scan of pg_enum is shared with ordered_rel, the rel that represents the sorted query result. Later, when a new path PP3 arrives in ordered_rel, add_path() evicts and frees the old shared PP2, but input_rel->pathlist still holds a pointer to the freed chunk:
Before PP3 arrives:
input_rel {pg_enum}            ordered_rel
┌──────────────────┐           ┌──────────────────┐
│ pathlist:        │           │ pathlist:        │
│ [0] → PP1        │           │ [0] → PP2 ◄────┐ │
│ [1] → PP2 ───────┼── SHARED ─┼────────────────┘ │
└──────────────────┘           └──────────────────┘
After add_path() evicts PP2 in favour of PP3:
input_rel {pg_enum}            ordered_rel
┌──────────────────┐           ┌──────────────────┐
│ pathlist:        │           │ pathlist:        │
│ [0] → PP1        │           │ [0] → PP3        │
│ [1] → dangling ──┼─► ╳╳╳     │                  │
└──────────────────┘  pfree'd  └──────────────────┘
                      chunk
Why it still works in practice
In vanilla Postgres, this example causes no visible problems, thanks to a subtle invariant: Postgres never walks input_rel's pathlist once ordered_rel is under construction — it uses direct references to the cheapest_* paths instead. The dangling pointer is created, but nobody dereferences it before the per-query memory context is reset at the end of the statement.
Extensions, however, often carry their own logic and may need to traverse the full pathlist, at which point they dereference the stale pointer. Separately, Postgres forks that implement complex plan transformations along the lines of the example above can create dangling pointers of their own, and there is no guarantee that all of them will be found and fixed before shipping to production.
Features that manipulate the query plan at runtime — disable_node, pg_plan_advice, pg_hint_plan and the like — can also accidentally trigger latent optimiser issues and crash the system with a SEGFAULT.
A small walker
pg_pathcheck is around 600 lines of C. It registers two planner hooks: create_upper_paths_hook to remember the top PlannerInfo and planner_shutdown_hook to do the work.
The walker visits every Path reachable from the top PlannerInfo. That means the upper_rels[] arrays, every simple_rel_array entry with its optional subquery subroot, every join rel collected during dynamic programming, the auxiliary RelOptInfos hanging off a rel (unique_rel, grouped_rel, part_rels), and, within compound Path nodes, every embedded sub-path field (outerjoinpath, innerjoinpath, subpath, subpaths, bitmapqual, …). A visited-pointer hash keeps the traversal linear: a path shared by several parents is checked only once.
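A reduced sketch of that traversal follows. It assumes a hypothetical WPath node with at most two embedded sub-paths (think outerjoinpath and innerjoinpath) and a fixed-capacity open-addressing pointer set; the real walker covers many more fields and node kinds. The part illustrated is why the visited set matters: a path shared by two parents is checked once, and the work stays linear in the number of distinct paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical Path stand-in with up to two embedded sub-paths. */
typedef struct WPath
{
    int           id;
    struct WPath *sub[2];
} WPath;

#define VISITED_CAP 256         /* power of two; toy size, no resizing */

typedef struct Visited
{
    const void *slots[VISITED_CAP];
    int         count;
} Visited;

/* Insert p into the set; returns false if it was already present. */
static bool
visited_add(Visited *v, const void *p)
{
    size_t      h = ((uintptr_t) p >> 4) & (VISITED_CAP - 1);

    while (v->slots[h] != NULL)     /* linear probing */
    {
        if (v->slots[h] == p)
            return false;
        h = (h + 1) & (VISITED_CAP - 1);
    }
    if (v->count >= VISITED_CAP - 1)
        abort();                    /* keep one slot empty so probing ends */
    v->slots[h] = p;
    v->count++;
    return true;
}

/* Depth-first walk; returns how many distinct paths were visited. */
static int
walk_paths(const WPath *p, Visited *v)
{
    int         n;

    if (p == NULL || !visited_add(v, p))
        return 0;

    n = 1;                          /* the per-node checks would run here */
    for (int i = 0; i < 2; i++)
        n += walk_paths(p->sub[i], v);
    return n;
}
```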
At every pointer, two checks run. The first is a NodeTag whitelist: if path->type is not one of the known Path-family tags, the memory has either been filled with 0x7F (freed, not yet reused) or re-allocated as some other kind of node. The second, used for base and join rels, is a parent-match check: path->parent on a path in rel->pathlist must equal rel. A mismatch catches same-size-class aliasing — a freed chunk that has been recycled into another valid Path belonging to a different rel entirely. The tag check passes in that case, but the ownership is wrong.
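The two checks can be sketched as a single function. Everything in it is hypothetical: the CT_* tag values, the CPath and CRel structs and the CheckResult codes are made up, since real NodeTag values are generated at build time. Only the logic follows the description above: a tag outside the Path family means clobbered memory or a chunk reused as some other node, and a valid Path tag with the wrong parent means the chunk was recycled into another rel's Path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag values; real NodeTags are generated at build time. */
enum
{
    CT_Path = 1,
    CT_IndexPath,
    CT_SortPath,
    CT_ProjectionPath,
    CT_AppendPath,
    CT_SeqScan                  /* a Plan node: valid node, wrong family */
};

typedef struct CRel
{
    int         relid;
} CRel;

typedef struct CPath
{
    int         type;           /* int, so garbage values stay representable */
    const CRel *parent;
} CPath;

typedef enum CheckResult
{
    CHECK_OK,
    CHECK_BAD_TAG,              /* clobbered, or reused as a non-Path node */
    CHECK_WRONG_PARENT          /* reused as a Path of another rel */
} CheckResult;

/* NodeTag whitelist: only Path-family tags are acceptable. */
static bool
is_path_tag(int tag)
{
    switch (tag)
    {
        case CT_Path:
        case CT_IndexPath:
        case CT_SortPath:
        case CT_ProjectionPath:
        case CT_AppendPath:
            return true;
        default:
            return false;
    }
}

/* check_parent is meant to be true for base and join rels only. */
static CheckResult
check_pathlist_entry(const CPath *p, const CRel *rel, bool check_parent)
{
    if (!is_path_tag(p->type))
        return CHECK_BAD_TAG;
    if (check_parent && p->parent != rel)
        return CHECK_WRONG_PARENT;
    return CHECK_OK;
}
```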
When a check fires, the extension emits a report at a configurable elevel (WARNING / ERROR / PANIC, controlled by pg_pathcheck.elevel). The report names the rel, the slot where the stray pointer sits, the full contents of the containing list (with each element annotated by its node kind), and — via debug_query_string in the hint — the query that triggered it.
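For the annotated list in the report, here is a sketch of how it might be rendered, assuming a hypothetical SlotInfo array that the walker has already filled in. Backend code would more naturally build the string with a StringInfo and appendStringInfo(); plain snprintf() keeps the example standalone. The format imitates the DETAIL line from the pg_enum example above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-slot result produced by the walker. */
typedef struct SlotInfo
{
    const char *tag_name;       /* e.g. "T_ProjectionPath" */
    bool        valid;          /* did the tag and parent checks pass? */
} SlotInfo;

/*
 * Render "[0] T_A; [1] T_B INVALID" into buf.
 * Returns false if buf was too small (the output is then truncated).
 */
static bool
format_pathlist(const SlotInfo *slots, int n, char *buf, size_t buflen)
{
    size_t      off = 0;

    if (buflen == 0)
        return false;
    buf[0] = '\0';

    for (int i = 0; i < n; i++)
    {
        int         w = snprintf(buf + off, buflen - off, "%s[%d] %s%s",
                                 i > 0 ? "; " : "", i, slots[i].tag_name,
                                 slots[i].valid ? "" : " INVALID");

        if (w < 0 || (size_t) w >= buflen - off)
            return false;
        off += (size_t) w;
    }
    return true;
}
```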
The allocator wrinkle
PostgreSQL's aset.c uses power-of-two size classes. A Path is 80 bytes, which lands in the 128-byte class. So do ProjectionPath (96 bytes), SortPath (88), MaterialPath (88), NestPath and AppendPath (112). Among Plan nodes, so do SeqScan (112), BitmapHeapScan (120), NestLoop (128), Hash (128), Result (128), Gather (128), and roughly a dozen more scans.
When a Path is freed, its chunk goes onto the 128-byte freelist of its memory context. That freelist is LIFO, so the next planner allocation of any node in the same size class, a makeNode() of some unrelated type for instance, gets that exact address. An old dangling pointer that had been invisible during planning now references a perfectly live Path or Plan node, with a valid but wrong NodeTag and with fields shaped for an entirely different kind of object. This is also why CLOBBER_FREED_MEMORY on its own is not enough to detect the problem: the clobber pattern is overwritten by the re-allocation before any walker gets a chance to see it. Out of about 4,000 findings in my full-suite run, not a single pointer carried the raw 0x7F7F7F7F fill; apparently every freed chunk had already been reused.
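The size-class arithmetic is easy to check. The sketch assumes the default aset.c behaviour for small requests (round up to a power of two, minimum 8 bytes, dedicated blocks above 8 kB) and ignores chunk headers and per-context chunk limits; toy_size_class() and toy_can_alias() are hypothetical helpers, not aset.c functions. Every size listed above rounds to 128, which is what lets a freed Path chunk be handed to any of those nodes.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MIN_CHUNK   8       /* smallest freelist class (ALLOC_MINBITS = 3) */
#define TOY_CHUNK_LIMIT 8192    /* larger requests get a dedicated block */

/* Chunk size a request is rounded up to, or 0 for a dedicated block. */
static size_t
toy_size_class(size_t request)
{
    size_t      cls = TOY_MIN_CHUNK;

    if (request > TOY_CHUNK_LIMIT)
        return 0;
    while (cls < request)
        cls <<= 1;
    return cls;
}

/*
 * A freed chunk from a request of size a can be handed out for a
 * request of size b only if both round to the same freelist class.
 */
static bool
toy_can_alias(size_t a, size_t b)
{
    size_t      ca = toy_size_class(a);

    return ca != 0 && ca == toy_size_class(b);
}
```

For example, toy_can_alias(80, 112) holds (a Path chunk reused for a SeqScan), while an 80-byte Path and a 136-byte node never share a freelist.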
Valgrind catches a use-after-free at the moment of the dereference, and in this case nothing ever dereferences the stale pointer during normal execution. Byte-level tools see the dereference, not the dangling reference. A structural walker that verifies the pathlist's semantic invariant (‘this pointer must reference a live Path owned by this rel’) is therefore the right tool for a use-after-free that the rest of the program is disciplined enough never to trigger.
Who may need this
Three audiences may find this code useful.
PostgreSQL core developers. Threads on pgsql-hackers have discussed at least three solution shapes (reference-counted paths, a "used" flag, and local memory contexts) and have gone back and forth on whether the unwritten contract is actually worth tightening. The findings collected by pg_pathcheck give that discussion concrete data.
Extension authors — especially those who write custom-scan providers, FDWs, optimisation features, or plan-inspection tooling — benefit from the tool as a sanity check.
Fork maintainers have the largest blast radius. Forks tend to modify the planner more aggressively than extensions can, and they ship on schedules that are not always in lockstep with PostgreSQL master. Running pg_pathcheck against a fork's test suite will show whether your modifications preserve the invariants that core happens to rely on.
Continuous coverage
The repository ships a GitHub Actions workflow that runs make -k check-world against a freshly cloned PostgreSQL master with pg_pathcheck enabled. It runs on every push and pull request, on manual dispatch, and nightly. The artefacts include the full server logs, and a deduplicated summary is rendered into the step-summary panel. The project wiki hosts the raw reports and a written analysis of each run.
If upstream master introduces a new source of dangling pointers, the nightly run will flag it the morning after the commit lands; if a commit closes one, the counts will drop. Either way, the workflow provides a daily reading of the contract's state.
Running it yourself
git clone https://github.com/danolivo/pg_pathcheck
cd /path/to/pg_pathcheck
USE_PGXS=1 PG_CONFIG=/path/to/install/bin/pg_config make install
echo "shared_preload_libraries = 'pg_pathcheck'" > /tmp/ppc.conf
TEMP_CONFIG=/tmp/ppc.conf make check-world
After the run, search every tmp_check/log/*.log, log/postmaster.log, results/*.out and regression.diffs for pg_pathcheck warnings. For interactive debugging, SET pg_pathcheck.elevel = error (or panic): the offending query then fails at the point of detection, which makes it easy to correlate a specific finding with a specific test query.
The extension targets PostgreSQL master specifically: it uses PG_MODULE_MAGIC_EXT and the extension_state slot API, both recent additions. It registers no SQL objects, so CREATE EXTENSION pg_pathcheck does nothing useful; all of its effects come from the planner hooks it installs when the library is loaded.
If you try it and find something, whether on an unmodified master, on a fork, or in your extension's test suite, I would like to read about it in the comments.
THE END.
April 20, 2026, Madrid, Spain.
Disclosure
Most of pg_pathcheck's code and this post were drafted with the help of a large language model (Claude). Every change was reviewed by a human before being committed, but the prose and structure are largely machine-produced.

