SQL Indexes: B-Tree vs Hash vs GIN

Intermediate 8 min read

What you'll learn

✓How B-Tree, Hash, and GIN indexes work internally
✓Which queries each index type accelerates
✓When equality vs range vs containment matter
✓How to choose an index for JSONB, arrays, and full-text
✓Tradeoffs around write cost and storage

Prerequisites

•Basic SQL familiarity

What and Why

An index is a secondary data structure that helps the database find rows quickly without scanning the entire table. PostgreSQL ships with several index types, and choosing the wrong one is a common reason queries stay slow even after “adding an index.” The three most important types for application developers are B-Tree, Hash, and GIN. Each is optimized for a different access pattern, and each has different write and storage costs.

Picking correctly can drop query latency from seconds to milliseconds. Picking incorrectly can produce indexes the planner never even uses.

Mental Model

A B-Tree is a sorted, balanced tree. Because the keys are ordered, it can answer equality lookups, range scans, prefix matches, and ORDER BY queries efficiently. It is the default for good reason.

A Hash index stores a hash of the key in buckets. It only answers = queries, but for very large unique keys it can be smaller and faster than B-Tree on equality alone. It cannot help with ranges, sorts, or partial matches.

A GIN (Generalized Inverted Index) flips the layout around. Instead of one entry per row, it stores one entry per token, each pointing to the rows that contain that token. This is exactly what you want for arrays, JSONB containment, and full-text search, where a single row contributes many searchable elements.

Hands-on Example

-- B-Tree: range and ordering
CREATE INDEX idx_orders_created ON orders (created_at);
SELECT * FROM orders
 WHERE created_at >= now() - interval '7 days'
 ORDER BY created_at DESC;

-- Hash: equality on a high-cardinality key
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);
SELECT * FROM sessions WHERE token = 'abc123';

-- GIN: containment in JSONB
CREATE INDEX idx_events_payload ON events USING GIN (payload jsonb_path_ops);
SELECT * FROM events WHERE payload @> '{"type":"signup"}';


B-Tree (sorted)
     [ 50 ]
    /      \
 [20,30] [70,90]   -> good for =, <, >, BETWEEN, ORDER BY

Hash (bucketed by h(key))
 bucket 0: token_x -> row 7
 bucket 1: token_y -> row 3        -> good ONLY for =
 bucket 2: token_z -> row 9

GIN (inverted)
 "signup" -> [r1, r4, r9]
 "login"  -> [r2, r4]              -> good for @>, arrays, FTS
 "admin"  -> [r4]

Index structures: B-Tree sorts keys, Hash buckets them, GIN inverts terms to rows.

Notice how GIN naturally answers “which rows contain this token” because that mapping is exactly what it stores.

Common Pitfalls

A frequent mistake is creating a Hash index on a column that is also used in range queries or sorting. The planner cannot use it, and the index just adds write overhead.

Another is reaching for GIN on a plain text column when you actually want B-Tree. GIN shines when each row has multiple tokens (arrays, JSONB, tsvector). For a simple email column, B-Tree is faster and smaller.

Multicolumn B-Tree indexes care about column order. An index on (a, b) accelerates filters on a alone and on a plus b, but typically not on b alone.

Finally, indexes are not free. Each one slows down INSERT, UPDATE, and DELETE, and consumes storage. Indexes that are never used should be dropped.

Practical Tips

Start by reading the query plan with EXPLAIN (ANALYZE, BUFFERS). If you see a Seq Scan on a large table with a selective filter, you probably need an index. If you see an Index Scan but the plan is still slow, the index might be wrong for the access pattern.

For JSONB, prefer jsonb_path_ops when you only need @> containment; it produces a smaller, faster GIN index than the default operator class.

For full-text search, store a tsvector column (generated or maintained by trigger) and index it with GIN. Avoid building the vector inside the WHERE clause.

Use partial indexes when only a slice of the table is queried, for example WHERE deleted_at IS NULL. They are smaller and faster.

Wrap-up

B-Tree handles the broadest range of queries and should be your default. Hash is narrow but useful for equality-only lookups on large keys. GIN is the right tool when a single row carries many searchable tokens such as arrays, JSONB, or full-text. Match the index to the query shape, measure with EXPLAIN, and drop indexes that are not earning their keep.