Skip to content
Go back

Postgres EXPLAIN ANALYZE Without Crying

By SumGuy 12 min read
Postgres EXPLAIN ANALYZE Without Crying

You Ran EXPLAIN. Now What?

You’ve got a slow query. Someone told you to run EXPLAIN ANALYZE on it. You did. Now you’re staring at 40 lines of indented output that looks like a compiler error had a baby with a tax return, and you have no idea what you’re looking at.

Honestly, you’re not alone. Postgres query plans are one of those things that seem designed to make you feel dumb. But once you know the three or four things that actually matter, the wall of text starts making sense fast.

Let’s walk through it. Real output, real numbers, real fixes.


EXPLAIN vs EXPLAIN ANALYZE vs EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)

First, let’s clear up the alphabet soup.

EXPLAIN — Postgres plans the query but does not run it. You get estimated costs and row counts. Fast, safe on production, but the estimates can lie to you.

EXPLAIN SELECT * FROM orders WHERE user_id = 42;

EXPLAIN ANALYZE — Actually runs the query. You get real execution times, real row counts, and real loop counts. The only way to know if the optimizer’s estimate was anywhere close to reality.

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

Side effect: the query actually runs. If it’s a DELETE or UPDATE, wrap it in a transaction and roll back.

BEGIN;
EXPLAIN ANALYZE DELETE FROM sessions WHERE expires_at < now();
ROLLBACK;

EXPLAIN (ANALYZE, BUFFERS) — The version you should be using most of the time. BUFFERS shows you how many 8KB pages were read from shared memory cache vs. disk. This is where you find out if your slow query is slow because of a bad plan, or just because you cold-cached 10,000 pages.

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE user_id = 42;

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS) — The full monty. VERBOSE adds column-level output, which is mostly noise unless you’re debugging a specific node. SETTINGS shows any non-default planner settings in effect. Useful when you’re tuning work_mem or enable_seqscan and want to confirm the setting actually took effect.


Reading the Four Numbers Per Node

Every plan node has a line that looks like this:

Seq Scan on orders (cost=0.00..18432.00 rows=450000 width=48)
(actual time=0.042..4821.334 rows=448712 loops=1)

Here’s what each number means:

cost=0.00..18432.00 — Two numbers. The first is startup cost (the cost to return the first row). The second is total cost (cost to return all rows). These are unitless — not milliseconds, not bytes, just Postgres’s internal accounting units. They’re relative, not absolute. A cost of 18432 isn’t inherently bad; what matters is how it compares to other nodes.

rows=450000 — The planner’s estimate of how many rows this node will return. This is the optimizer’s best guess, based on table statistics. If this is way off from actual rows, you’ve found a problem.

width=48 — Average row width in bytes. Mostly irrelevant unless you’re debugging memory usage.

actual time=0.042..4821.334 — The real time in milliseconds. First number is time to first row, second is time to last row. This is what you care about.

rows=448712 — Actual rows returned. Compare this to the estimate. If rows=450000 (estimated) vs rows=448712 (actual), the optimizer nailed it. If it’s rows=450000 vs rows=3, something is deeply wrong with your statistics.

loops=1 — How many times this node executed. In nested loops, this goes above 1. The actual time is per loop, not total. More on this in a minute.


Plan Node Types: What They Mean and When They’re Wrong

Seq Scan — Full table scan. Postgres reads every row. This is fine on small tables. It’s a red flag on a 10-million-row table when you’re filtering on a column that should have an index.

Index Scan — Postgres walks the B-tree index to find matching rows, then fetches each row from the heap. Fast for low-selectivity queries (returning a small percentage of rows).

Index Only Scan — All the columns you need are in the index. Postgres never touches the heap. Fastest option if your visibility map is maintained (i.e., you vacuum regularly).

Bitmap Index Scan + Bitmap Heap Scan — For medium-selectivity queries, Postgres first builds a bitmap of heap page locations from the index, then reads those heap pages in order (avoiding random I/O). You’ll see these two nodes paired together. It’s the middle ground between Seq Scan and Index Scan.

Hash Join — Postgres builds a hash table from the smaller relation, then probes it with the larger. Fast for large joins when both sides fit in work_mem. If the hash table spills to disk (Batches > 1 in the output), raise work_mem.

Merge Join — Both inputs must be sorted. Fast when both sides are already sorted (e.g., joining on indexed columns). If you see a Sort node feeding into a Merge Join, that sort is costing you.

Nested Loop — For each row in the outer relation, scan the inner relation. Devastating at scale if the inner scan is a Seq Scan. Perfect when the inner scan is an efficient index lookup and the outer set is small.

Sort — Self-explanatory. When you see Sort Method: external merge Disk: 28672kB, your work_mem is too low and Postgres is spilling to disk.

Hash Aggregate / Group Aggregate — Hash Aggregate builds a hash table of groups in memory. Group Aggregate requires sorted input and streams through it. Hash Aggregate is usually faster, but it needs memory. If you see Group Aggregate where you expected Hash Aggregate, your data might not fit in work_mem.


The Estimate vs. Actual Rows Mismatch Smell

Here’s the thing that trips up most people: Postgres’s planner is only as good as its statistics. If the estimates are wildly wrong, the planner will make bad decisions — wrong join order, wrong join type, choosing a Seq Scan over an Index Scan.

The rule of thumb: if estimated rows are off by 10x or more, you have a statistics problem.

Stale stats — Run ANALYZE orders; or VACUUM ANALYZE orders;. Stats go stale on heavily-written tables faster than the autovacuum keeps up. Check pg_stat_user_tables for last_analyze and n_mod_since_analyze.

SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

Correlated columns — Postgres assumes columns are statistically independent. If you filter on (country = 'US' AND state = 'CA'), Postgres might estimate 5% × 5% = 0.25% selectivity, when actually California-in-US is 12% of your data. The fix:

CREATE STATISTICS orders_country_state (dependencies)
ON country, state FROM orders;
ANALYZE orders;

This teaches the planner about the correlation. It’s underused, and it’s genuinely useful on multi-column filters.


BUFFERS: Where Your Time Actually Went

Add BUFFERS to your explain and look for lines like:

Buffers: shared hit=124 read=10847 written=3

shared read=10847 on what should be a simple lookup is a smoking gun. Either your buffer cache is cold, your query plan is pulling in too much data, or both.

The fix depends on the diagnosis: if it’s a bad plan, fix the plan. If it’s a warm-up issue, that’s expected on the first run. If it’s structural (index missing, table bloat), fix the root cause.


The Loops Trap

This one bites everyone eventually. Nested Loop with a high loop count:

Nested Loop (cost=0.00..243.00 rows=1 width=72)
(actual time=0.021..18431.002 rows=84212 loops=1)
-> Seq Scan on users (cost=0.00..180.00 rows=8400 width=48)
(actual time=0.008..12.143 rows=8400 loops=1)
-> Index Scan on orders (cost=0.00..0.07 rows=1 width=24)
(actual time=0.002..0.002 rows=10 loops=8400)

The Index Scan shows actual time=0.002 — looks trivially fast, right? Wrong. It ran 8400 loops. Total time for that node: 0.002ms × 8400 = 16.8 seconds. The “per loop” time is deceptive until you multiply it out.

Always multiply actual time by loops to get the real contribution of a node to overall query time.


A Real Query: From 8 Seconds to 50ms

Here’s a realistic example. Table: events, 4.5 million rows, tracking user activity. Query: find all events for a user in the last 30 days.

SELECT id, event_type, created_at, payload
FROM events
WHERE user_id = 12345
AND created_at >= now() - interval '30 days'
ORDER BY created_at DESC
LIMIT 100;

Before — no index on (user_id, created_at):

Sort (cost=94821.34..95946.34 rows=450000 width=128)
(actual time=8234.521..8234.788 rows=100 loops=1)
Sort Key: created_at DESC
Sort Method: quicksort Memory: 85kB
Buffers: shared hit=2341 read=14872
-> Seq Scan on events (cost=0.00..62841.00 rows=450000 width=128)
(actual time=0.031..7943.221 rows=87432 loops=1)
Filter: ((user_id = 12345) AND (created_at >= (now() - '30 days'::interval)))
Rows Removed by Filter: 4412568
Buffers: shared hit=2341 read=14872
Planning Time: 0.412 ms
Execution Time: 8234.891 ms

Red flags everywhere: Seq Scan on 4.5M rows, 14,872 pages read from disk (~116MB), rows removed by filter is enormous, Sort is running after the fact.

The fix:

CREATE INDEX CONCURRENTLY idx_events_user_created
ON events (user_id, created_at DESC);

After — with composite index:

Limit (cost=0.56..68.42 rows=100 width=128)
(actual time=0.312..1.843 rows=100 loops=1)
Buffers: shared hit=112
-> Index Scan using idx_events_user_created on events
(cost=0.56..5923.11 rows=8743 width=128)
(actual time=0.308..1.811 rows=100 loops=1)
Index Cond: ((user_id = 12345) AND (created_at >= (now() - '30 days'::interval)))
Buffers: shared hit=112
Planning Time: 0.388 ms
Execution Time: 1.962 ms

8234ms → 1.96ms. shared read=14872 → 0 (all in cache, and only 112 pages total). The index covers both filter columns, the sort is free because the index is already ordered DESC, and the Limit stops execution after 100 rows.


Common Fixes Cheat Sheet

Missing index — The obvious one. If you see a Seq Scan on a large table with a selective filter, check if the column has an index.

Type mismatch killing index use — This one is subtle. If your column is user_id bigint but your query passes user_id = '12345' (a string literal in some ORMs), Postgres might cast the whole column rather than the value, making the index useless. Check column types and make sure they match.

Function-wrapped columns — This query will never use an index on created_at:

WHERE date_trunc('day', created_at) = '2026-07-01'

Rewrite it as a range:

WHERE created_at >= '2026-07-01' AND created_at < '2026-07-02'

Now the index is usable.

Sort spilling to diskSort Method: external merge Disk: 45678kB means work_mem is too low. Bump it for the session:

SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

If that helps, consider bumping it globally in postgresql.conf (carefully — it’s per-sort-operation, not per-connection).

Parallel plans disabled — If you’re not seeing parallel workers on large scans, check:

SHOW max_parallel_workers_per_gather;

If it’s 0, parallel query is off. Set it to 2 or 4 and see if Postgres starts using workers.

SET max_parallel_workers_per_gather = 4;

Hash join spilling to diskBatches: 8 in a Hash Join node means the hash table didn’t fit in work_mem. Same fix: raise work_mem.


Tools That Make This Less Painful

You don’t have to read raw text output forever. These tools take your EXPLAIN output and make it visual:

explain.depesz.com — Paste your plan, get a color-coded table. The oldest and most reliable. Highlights the slowest nodes. Free, no signup.

explain.dalibo.com — Paste SQL + plan, get a tree diagram. Better for visualizing plan structure and finding which branch of a large plan is the bottleneck.

pev2 — Open source, self-hostable version of the Dalibo visualizer. If you’re paranoid about pasting production query plans to third-party sites (reasonable), run this locally via Docker:

Terminal window
docker run -d -p 8080:80 dalibo/pev2

pgMustard — Paid SaaS with automated recommendations. It’ll read your plan and tell you “this sort is spilling to disk, raise work_mem.” Good for teams that want a low-friction first pass.

For self-hosters, pev2 + explain.depesz.com covers 90% of cases for free.


The Bottom Line

EXPLAIN ANALYZE output looks intimidating until you know the four things to look for every time:

  1. Estimate vs. actual rows — if they’re off by 10x, run ANALYZE or add extended statistics.
  2. shared read in BUFFERS — high numbers mean lots of disk I/O; find out why.
  3. Loop multiplication — always multiply actual time by loops before declaring a node fast.
  4. Sort methodexternal merge Disk: means your work_mem is undersized.

Everything else is context. Once you’ve spotted the smoking node — the one with the high actual time or the massive rows removed by filter — fixing it is usually just an index or a config change.

The hardest part isn’t reading the plan. It’s actually running EXPLAIN (ANALYZE, BUFFERS) instead of just staring at the slow query and hoping it gets faster on its own.

It won’t. But now you know what to do about it.


Share this post on:

Send a Webmention

Written about this post on your own site? Send a webmention and it'll show up above once verified.


Next Post
Boundary vs Teleport

Discussion

Powered by Garrul . Sign in with GitHub or Google, or post anonymously.

Related Posts