PostgreSQL Index Usage and Optimization
Indexing is the single biggest lever in SQL performance, and it is also the category where most of the bad advice lives. "Add an index" solves a narrow class of problems. "Add the right index, in the right shape, for the right query, and drop the ones you don't need" is the actual job — and it's more design work than most teams expect.

This is article 2 in a series on PostgreSQL query analysis. The pillar is The Complete Guide to PostgreSQL SQL Query Analysis & Optimization; article 1 covers reading EXPLAIN output. The running dataset is the 500k-row sim_bp_orders table, the 200k-row sim_bp_users table, and the 50k-row sim_bp_products table on Neon Postgres 17.8; every EXPLAIN block is from a real run.

We'll cover: when the planner actually uses an index; the four design choices that matter most (column selection, partial, covering, expression); the less-common index types and when they beat btrees; how to find unused indexes; and four cases where not adding an index is the correct call.

An index is a data structure; "using an index" is a planner decision. PostgreSQL estimates the cost of each candidate plan — sequential scan, index scan, index-only scan, bitmap scan — and picks the cheapest. Three things drive that choice:

Selectivity. The estimated fraction of rows the query will return. If the filter returns 0.1% of rows, an index scan is almost always cheaper. If it returns 30%, it depends on the rest of the query shape. If it returns 70%, the planner will almost always choose a sequential scan, because visiting most of the heap sequentially costs less than reading index pages plus random heap I/O.

Correlation. If the rows matching the filter are physically clustered on disk, the planner's random-access penalty shrinks and an index scan becomes more attractive. If they're scattered, random I/O dominates and a seq scan wins. The pg_stats.correlation column (range -1 to 1) tells you how clustered each column's values are.
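You can read correlation straight out of pg_stats. A minimal sketch against the running orders table (the column names are taken from the examples in this series):

```sql
-- correlation ranges from -1 to 1; values near 1 or -1 mean the column's
-- on-disk order tracks its logical order, so index range scans are cheap.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'sim_bp_orders'
  AND attname IN ('created_at', 'status');
```

Run ANALYZE first if the table is fresh; pg_stats is populated from the statistics collector, not the live data.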
Time-series tables (created_at) often have near-1 correlation because they're append-mostly; status columns usually hover near 0.

Cost parameters. random_page_cost (default 4.0) vs seq_page_cost (default 1.0). On SSD-backed storage those defaults are too conservative; lowering random_page_cost to 1.5 or 2.0 makes the planner reach for indexes more readily. Setting it below seq_page_cost is almost always wrong — it implies random I/O is faster than sequential, which isn't true on any real storage. If you're tempted to go there, you probably want to raise effective_cache_size instead.

If a plan has a Seq Scan, no index-type nodes, and more than two nodes total, you probably have a missing or ignored index. It's a signal, not a verdict — some queries genuinely don't want an index — but it's worth checking.

The cheapest index in any database is the primary-key btree:

```sql
SELECT * FROM sim_bp_users WHERE user_id = 12345;
```

```
Index Scan using sim_bp_users_pkey on sim_bp_users (cost=0.42..8.44 rows=1 width=51) (actual time=8.683..8.686 rows=1 loops=1)
  Index Cond: (sim_bp_users.user_id = 12345)
  Buffers: shared read=4
Execution Time: 9.700 ms
```

Four shared-buffer reads for a 200,000-row table. The 9.7 ms execution time is dominated by cold-cache reads against Neon's networked storage; on a warm-cache benchmark this drops to sub-millisecond. This is the shape every OLTP single-row lookup should have.

A composite index on (user_id, created_at) helps:

- WHERE user_id = ? (uses the leading column alone)
- WHERE user_id = ? AND created_at > ? (uses both)
- WHERE user_id = ? ORDER BY created_at DESC LIMIT n (uses the leading equality plus the sorted trailing column)

It does not help WHERE created_at > ? in isolation. This is the leftmost-prefix rule: a btree composite index can answer queries that use a contiguous prefix of its columns, starting with the leading one. Skip scans aren't efficient on PostgreSQL btrees when the leading column has reasonable cardinality.
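A minimal sketch of the leftmost-prefix rule in practice — the index here is hypothetical (idx_bp_orders_user_created is not one of the dataset's real indexes):

```sql
-- Hypothetical composite index: equality column leading, range/sort column trailing.
CREATE INDEX idx_bp_orders_user_created
    ON sim_bp_orders (user_id, created_at);

-- Can use the index: leading column alone.
SELECT count(*) FROM sim_bp_orders WHERE user_id = 42;

-- Can use the index: equality on the leading column, range on the trailing one.
SELECT count(*) FROM sim_bp_orders
WHERE user_id = 42
  AND created_at > now() - interval '7 days';

-- Can use the index: equality plus the trailing sort key, so no Sort node is needed.
SELECT * FROM sim_bp_orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- Cannot use the index efficiently: no predicate on the leading column.
SELECT count(*) FROM sim_bp_orders
WHERE created_at > now() - interval '7 days';
```

Running EXPLAIN on each confirms the shapes: the first three can seek into the btree, while the last falls back to a seq scan (or a different index) because a btree can't seek without a leading-column value.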
Rule of thumb: leading columns should be equality predicates, trailing columns range predicates or sort keys. (tenant_id, created_at), not (created_at, tenant_id).

Partial indexes restrict the index to rows matching a predicate:

```sql
CREATE INDEX idx_bp_orders_pending_recent
    ON sim_bp_orders (created_at)
    WHERE status = 'pending';
```

The index only contains rows where status = 'pending', so it's roughly one-fifth the size of a full index on created_at. The planner will use it for any query whose WHERE clause implies status = 'pending' — it proves this by theorem-proving over the predicates. So WHERE status = 'pending' AND created_at > now() - interval '1 day' works, but WHERE status IN ('pending', 'shipped') AND ... doesn't (the IN predicate doesn't imply the partial predicate).

Two gotchas: partial indexes are fragile to query rewording (a function, a cast, or a reworded predicate can break the implication proof), and they pay write cost whenever a row moves into or out of the partial predicate.

Covering indexes go a step further: INCLUDE tucks non-key columns into the leaf pages:

```sql
CREATE INDEX idx_bp_orders_pending_by_amount
    ON sim_bp_orders (total_amount_cents DESC)
    INCLUDE (order_id, user_id, created_at)
    WHERE status = 'pending';
```

A query that SELECTs any combination of order_id, user_id, total_amount_cents, and created_at from this index can be served entirely from index pages — provided the visibility map marks the relevant heap pages as all-visible. On a write-heavy table where autovacuum can't keep up, you'll see a non-zero "Heap Fetches:" line in EXPLAIN, which defeats most of the benefit. INCLUDE columns cannot be used in index conditions. Rule: put columns used for filtering, joining, or ordering in the key; put columns you're only retrieving in INCLUDE.

Expression indexes are where most "why isn't my index being used?" problems live. A btree on email can't serve WHERE lower(email) = ? or WHERE lower(email) LIKE 'prefix%'.
Case-insensitive prefix search on a 200k-row table without an expression index:

```
Gather (cost=1000.00..5841.09 rows=1000 width=25) (actual time=0.553..122.758 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  -> Parallel Seq Scan on sim_bp_users
       Filter: (lower((email)::text) ~~ 'user12%'::text)
       Rows Removed by Filter: 94444
Execution Time: 122.833 ms
```

Parallel seq scan, 94k rows filtered per worker, 122 ms. The fix:

```sql
CREATE INDEX idx_bp_users_email_lower
    ON sim_bp_users (lower(email) text_pattern_ops);
```

For equality on lowercased email, a plain CREATE INDEX ... (lower(email)) is enough. For prefix LIKE, text_pattern_ops is needed because PostgreSQL can only rewrite LIKE 'prefix%' into an index range scan when the index orders text by byte value rather than by locale collation.

With the existing idx_sim_bp_users_email_pattern index on email text_pattern_ops:

```
Index Only Scan using idx_sim_bp_users_email_pattern on sim_bp_users (cost=0.42..29.87 rows=20 width=8) (actual time=0.057..24.729 rows=20 loops=1)
  Index Cond: ((email ~>=~ 'user12'::text) AND (email ~<~ 'user13'::text))
```

The byte-ordered index lets the planner rewrite the prefix LIKE into a range condition.

GIN indexes cover JSONB containment — the @> operator, as in @> '{"type": "purchase"}'. jsonb_path_ops indexes only the @> operator but produces a significantly smaller and faster index than the default jsonb_ops. Use it unless you need the other JSONB operators. GIN with pg_trgm turns substring LIKE queries (LIKE '%needle%') into index-backed scans.

```sql
CREATE INDEX idx_bp_orders_created_at_brin
    ON sim_bp_orders USING brin (created_at);
```

For our 500,000-row orders table, a BRIN index is ~24 kB; a btree on the same column is ~5 MB. BRIN loses effectiveness immediately if the data isn't correlated — on a shuffled table, the min/max of every page range overlaps the whole value domain and the planner can't skip anything. BRIN is effectively useless on uncorrelated columns and brilliant on time-series data.

Geometric types, ranges, and fuzzy matching use GiST or SP-GiST.
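A sketch of the pg_trgm setup for substring search. The extension name and operator class are standard; the product_name column is a guess about the products table, not confirmed by the dataset:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index: makes LIKE '%needle%' index-backed instead of a seq scan.
CREATE INDEX idx_bp_products_name_trgm
    ON sim_bp_products USING gin (product_name gin_trgm_ops);

EXPLAIN (ANALYZE, BUFFERS)
SELECT product_id, product_name
FROM sim_bp_products
WHERE product_name LIKE '%widget%';
```

On a match-sparse column this typically plans as a Bitmap Index Scan on the trigram index. Patterns shorter than three characters contain no complete trigram, so they fall back to a seq scan regardless.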
Hash indexes only support equality and are usually beaten by btrees even for point lookups — use them only when you've measured a specific case where they win.

When not to index:

- Write-heavy, read-light tables. Every index is write cost.
- Low selectivity. A btree on a boolean is_active where 90% of rows are active will never be used. A partial index is better.
- Queries that need most of the table. Reports over large windows are best served by a parallel seq scan.
- Redundant indexes. (a, b, c) subsumes (a, b) and (a). Drop the prefixes, keep the longest.

Finding unused indexes:

```sql
SELECT s.indexrelname AS index_name,
       s.relname AS table_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS size,
       s.idx_scan
FROM pg_stat_user_indexes s
WHERE s.schemaname = 'public'
  AND s.idx_scan = 0
  AND NOT EXISTS (
        SELECT 1 FROM pg_constraint c
        WHERE c.conindid = s.indexrelid
          AND c.contype IN ('p', 'u', 'x')
  )
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

Real result from the running database:

```
index_name                        | size    | idx_scan
----------------------------------+---------+---------
idx_sim_bp_users_username_pattern | 6184 kB | 0
idx_sim_bp_users_email_pattern    | 7960 kB | 1
```

One 6 MB index with zero scans is a straightforward drop. The NOT EXISTS clause skips indexes backing primary-key, unique, and exclusion constraints — those enforce integrity and are used internally even if no user query hits them. Two caveats: pg_stat_reset() zeros the counter (check the stats timestamp before acting), and a replica's stats only count scans on that replica (don't drop an index from the primary based on replica stats alone).

A worked example:

```sql
SELECT order_id, user_id, total_amount_cents, created_at
FROM sim_bp_orders
WHERE status = 'pending'
ORDER BY total_amount_cents DESC
LIMIT 50;
```

This runs as a 51 ms sequential scan over 500k rows with a top-n heapsort. Three plausible candidates:

1. (status) — cheapest, most general, but the planner still needs a sort step.
2. (status, total_amount_cents DESC) — solves filter and sort. The sort is free because the index is already ordered on the trailing column within each status group.
3. (total_amount_cents DESC) WHERE status = 'pending' — only pending rows indexed. Smaller and faster to maintain, but it only helps pending queries.

Option 3 plus INCLUDE (order_id, user_id, created_at) gives an Index Only Scan and is the right call for this specific query. If the dashboard later adds status IN ('pending', 'processing'), you'd want option 2 instead. Design indexes for the query you have, and re-read the plans every six months.

Originally published at mydba.dev/blog/postgres-index-usage-optimization.
