pgvector: Add Vector Search to Postgres

Intermediate 9 min read

What you'll learn

✓Install pgvector and create a vector column
✓Insert and query embeddings from Python
✓Choose between IVFFlat and HNSW indexes
✓Combine SQL filters with vector search
✓Run hybrid keyword plus vector ranking

Prerequisites

•Read [RAG Embeddings Explained](/blog/rag-embeddings-explained)
•Skim [RAG Vector Databases Overview](/blog/rag-vector-databases-overview)
•A Postgres 14+ instance you can connect to

pgvector is a Postgres extension that adds a vector type plus distance operators and indexes. The win is keeping your application data and your vectors in the same database. No second system to operate, no eventual consistency between primary store and search store.

Enable the extension

CREATE EXTENSION IF NOT EXISTS vector;

If you are on managed Postgres (RDS, Supabase, Neon), the extension is usually one toggle away.

Schema

The dimension must match your embedding model. text-embedding-3-small is 1536.

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    team TEXT,
    embedding VECTOR(1536)
);

Put your real metadata in real columns. You get type safety and proper SQL filtering for free.

Insert from Python

pip install psycopg pgvector openai

import os, psycopg
from pgvector.psycopg import register_vector
from openai import OpenAI

oai = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])
register_vector(conn)

def embed(text: str) -> list[float]:
    r = oai.embeddings.create(model="text-embedding-3-small", input=text)
    return r.data[0].embedding

docs = [
    ("pgvector overview", "pgvector adds vector types to Postgres.", "platform"),
    ("Pinecone overview", "Pinecone is a managed vector database.", "platform"),
]

with conn.cursor() as cur:
    for title, body, team in docs:
        cur.execute(
            "INSERT INTO documents (title, body, team, embedding) VALUES (%s, %s, %s, %s)",
            (title, body, team, embed(body)),
        )
conn.commit()

Query by similarity

pgvector exposes three operators: <-> Euclidean, <#> negative inner product, <=> cosine distance. For OpenAI embeddings, use cosine.

q = "Which vector store lives inside Postgres?"
qvec = embed(q)

with conn.cursor() as cur:
    cur.execute(
        "SELECT title, body, 1 - (embedding <=> %s) AS score "
        "FROM documents ORDER BY embedding <=> %s LIMIT 3",
        (qvec, qvec),
    )
    for row in cur.fetchall():
        print(round(row[2], 3), row[0])

<=> returns distance, so 1 - distance gives a similarity score that lines up with intuition.

Filter by metadata

This is where pgvector shines. Filters are first-class SQL.

SELECT title, body
FROM documents
WHERE team = 'platform' AND created_at >= now() - interval '30 days'
ORDER BY embedding <=> $1
LIMIT 5;

You can join other tables, enforce row-level security, and use materialized views. Try that in a black-box vector store.

Indexing for speed

A sequential scan works for thousands of rows. Past that you need an ANN index. Two choices:

IVFFlat: cheap to build, fast queries with lists tuning. Good for batch loading.
HNSW: more memory, much better recall at high QPS. Good for live search.

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
ANALYZE documents;

Match the index opclass to the operator you query with. vector_cosine_ops pairs with <=>, vector_l2_ops with <->.

Probing IVFFlat

For IVFFlat, you set the number of lists at create time and the number probed per query at runtime.

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
SET ivfflat.probes = 10;

More probes means better recall and slower queries. Tune from 1 upward until recall plateaus.

Hybrid search

Combine full-text search and vector similarity with a weighted score. Postgres ranks both natively.

SELECT title,
       ts_rank(to_tsvector('english', body), plainto_tsquery('english', $1)) AS kw,
       1 - (embedding <=> $2) AS sim
FROM documents
ORDER BY (0.5 * kw + 0.5 * sim) DESC
LIMIT 5;

Use this when queries contain rare tokens that embeddings dilute, like SKUs or function names.

Wire it into a RAG endpoint

def retrieve(q: str, k: int = 4) -> list[str]:
    v = embed(q)
    with conn.cursor() as cur:
        cur.execute(
            "SELECT body FROM documents ORDER BY embedding <=> %s LIMIT %s",
            (v, k),
        )
        return [r[0] for r in cur.fetchall()]

Front the function with a FastAPI route; see What is FastAPI. Or a Node service via What is Node.js.

Operational notes

Re-embedding is a column update. Use a separate embedding_version column so you can run rolling migrations. Vacuum after large deletes; ANN indexes degrade with dead tuples. Monitor index build time, query latency, and recall, not just QPS.

When to use it

Pick pgvector when you already run Postgres and your corpus fits comfortably in RAM with the index. Pick a dedicated vector DB when you need cross-region replication, billions of vectors, or specialized hybrid search out of the box.

Wrap up

pgvector turns Postgres into a competent retrieval engine without leaving SQL. Pick HNSW for live workloads, keep filters in real columns, and your RAG stack collapses into one operable system.