Postgres Indexing Strategies

Intermediate 11 min read

What you'll learn

✓How Postgres chooses (or ignores) an index
✓btree, hash, GIN, GiST, BRIN: when each wins
✓Partial, expression, and covering indexes
✓Reading EXPLAIN ANALYZE without guessing
✓Maintenance: bloat, REINDEX, and pg_stat_user_indexes

Prerequisites

•Basic SQL

What and why

An index is a separate on-disk structure that lets Postgres find rows without scanning the whole table. The cost is space and write amplification: every insert or update has to maintain every applicable index. The goal of indexing is to add just enough structures to make hot reads fast without crippling writes.

Most query performance work in Postgres lives in the gap between “no index at all” and “an index on every column the team could think of.” Both are wrong.

Mental model

Postgres has a query planner that estimates the cost of each possible plan and picks the cheapest. An index helps only if the planner believes using it costs less than a sequential scan, given the table’s statistics and the query’s selectivity.

SELECT * FROM orders WHERE customer_id = 42;

     Planner estimates rows that match:
          high selectivity (few rows)
                  |
                  v
          Index Scan / Index Only Scan
                |
                v
          traverse btree, fetch rows from heap

          low selectivity (many rows)
                  |
                  v
          Sequential Scan
                |
                v
          read all blocks, filter in memory

Stats live in pg_statistic, refreshed by ANALYZE.
Stale stats -> wrong plan -> slow query.

Planner choosing between seq scan and index

Hands-on example

A btree is the default and covers equality, range, ordering, and prefix LIKE.

CREATE INDEX orders_customer_idx ON orders (customer_id);

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

Look for Index Scan using orders_customer_idx. If you see Seq Scan despite the index, check selectivity (SELECT COUNT(*) FROM orders WHERE customer_id = 42;) and run ANALYZE orders.

For multi-column lookups, a composite btree wins when the leading column matches. Order matters.

CREATE INDEX orders_customer_created_idx
ON orders (customer_id, created_at DESC);

This index serves WHERE customer_id = 42 ORDER BY created_at DESC LIMIT 10 perfectly: no sort step, no extra heap fetches beyond the limit.

Partial indexes shrink the index to a relevant subset.

CREATE INDEX orders_open_idx
ON orders (created_at)
WHERE status = 'open';

Status-based dashboards hit WHERE status = 'open' AND created_at > now() - interval '1 day' and use a tiny index even if most rows are closed.

Expression indexes match function calls.

CREATE INDEX users_lower_email_idx ON users (lower(email));

SELECT * FROM users WHERE lower(email) = 'ada@example.com';

Without the expression index, lower(email) = ... defeats a plain email index.

GIN indexes are for many-values-per-row data: arrays, jsonb, full-text. A GIN on a tags text[] column makes WHERE tags @> ARRAY['urgent'] fast.

CREATE INDEX docs_tags_gin ON docs USING gin (tags);
CREATE INDEX docs_search_gin ON docs USING gin (to_tsvector('english', body));

BRIN is for very large append-only tables where rows correlate with physical order, like time-series logs.

CREATE INDEX events_time_brin ON events USING brin (occurred_at);

It is tiny (kilobytes for billions of rows) and great for range scans, useless for point lookups.

Covering indexes let an Index Only Scan answer a query without touching the heap.

CREATE INDEX orders_customer_inc_idx
ON orders (customer_id) INCLUDE (total, status);

Now SELECT total, status FROM orders WHERE customer_id = 42 skips the heap entirely if the visibility map is current.

Common pitfalls

Indexing every column is a tax on writes and a memory hit. Each index needs to fit in cache for it to help; too many and none of them do.

A leading wildcard (LIKE '%foo') defeats a btree. Use a trigram index (pg_trgm extension) plus a GIN for substring searches.

CREATE EXTENSION pg_trgm;
CREATE INDEX users_name_trgm ON users USING gin (name gin_trgm_ops);

Type mismatches stop index use. Comparing a text column to an integer parameter casts both sides and skips the index. Match types in the query.

Functions in WHERE clauses bypass indexes unless you have an expression index. WHERE date_trunc('day', created_at) = '2026-06-01' will sequential scan; rewrite as WHERE created_at >= '2026-06-01' AND created_at < '2026-06-02'.

IS NULL filters need a partial index or a special btree to be served quickly. Plan around the question you actually ask.

Production tips

Use EXPLAIN (ANALYZE, BUFFERS) instead of EXPLAIN. The buffer hits tell you whether the data is cached, and the actual row counts tell you whether the planner’s estimates are off.

Watch pg_stat_user_indexes. An index with idx_scan = 0 after weeks of traffic is dead weight. Drop it.

SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

Build new indexes CONCURRENTLY to avoid blocking writes. It is slower but does not lock the table.

CREATE INDEX CONCURRENTLY idx_name ON table (col);

Reindex when bloat accumulates. pgstattuple measures bloat; REINDEX CONCURRENTLY rebuilds without long locks.

Tune autovacuum. Stale stats are responsible for more bad plans than missing indexes. Lower autovacuum_analyze_scale_factor on hot tables.

Wrap-up

Default to btree on the columns your WHERE and ORDER BY clauses touch. Reach for partial and expression indexes when selectivity demands it. Use GIN for jsonb, arrays, and full-text; BRIN for huge time-ordered tables. Always check EXPLAIN ANALYZE, drop unused indexes, build concurrently, and keep statistics fresh. With those habits, the planner becomes a collaborator instead of an adversary.