Vector Databases: A Practical Overview

Intermediate 11 min read

What you'll learn

✓ Why a normal SQL database struggles with vector search
✓ How approximate nearest neighbour (ANN) indexes work
✓ HNSW vs IVF at a high level
✓ How pgvector, Qdrant, Pinecone, Chroma, and Weaviate compare
✓ A practical decision framework for picking one

Prerequisites

• You know what an embedding is (see Text Embeddings)
• You are comfortable with the idea of a database index

Once you have embeddings, you need somewhere to put them. A vector database is the storage and indexing layer for high-dimensional vectors and the queries that hunt for nearest neighbours among them.

A normal database can technically do this. It just gets slow fast. This post explains why, what specialised systems do differently, and how to choose between the popular options without getting lost in marketing.

Why a normal database struggles

A typical RAG corpus holds between one million and one hundred million vectors, each with 768 to 1536 dimensions. Search means: given a query vector, find the 10 documents with the highest cosine similarity.

The brute-force approach is a full scan — compute similarity against every vector, return the top 10. This works at 10,000 documents. At 10 million it is too slow for interactive queries.
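To make the full scan concrete, here is a minimal sketch in plain Python. The function names and the tiny random corpus are made up for illustration; a real system would use vectorised math, but the shape of the work is the same.

```python
import heapq
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, corpus, k=10):
    """Score every vector in the corpus and keep the k most similar.

    corpus maps a document id to its vector. The work grows with
    len(corpus) * len(query), which is why this stops scaling.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored)


# Hypothetical toy corpus: 1,000 random 8-dimensional vectors.
random.seed(0)
corpus = {i: [random.gauss(0, 1) for _ in range(8)] for i in range(1_000)}
query = corpus[42]

top = brute_force_top_k(query, corpus, k=3)
print(top[0])  # the query's own document ranks first, similarity close to 1.0
```

The arithmetic explains the slowdown. At 10 million vectors of 1536 dimensions, a single query costs roughly 15 billion multiply-adds, and the raw vectors alone take about 61 GB if stored as 4-byte floats (an assumption; some stores use smaller types).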

Postgres can do this directly with pgvector and its cosine-distance operator <=> (the <-> operator is Euclidean distance), but without a vector index it still scans every row. SQL B-tree indexes do not help: they organise scalars along a line, and “nearest in 1536 dimensions” is not a range query.

What you need is an index designed for high-dimensional space. That is the core thing a vector database provides.

Approximate nearest neighbours

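Exact nearest-neighbour search in high dimensions is expensive: the tree-style indexes that work well in two or three dimensions degrade toward a full scan as the number of dimensions grows. The industry compromise is approximate nearest neighbours (ANN), which accepts a small accuracy loss in exchange for an orders-of-magnitude speedup.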

ANN indexes typically return 95–99% of the true top-K. For RAG, that is almost always fine. The user does not notice if document 9 of 10 was a slightly weaker match.
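That percentage is usually called recall@K. A small sketch of how it could be computed, assuming you have the exact top-K from a full scan to compare against (the ids below are invented):

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-K that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


exact = [3, 17, 8, 42, 5, 91, 12, 60, 7, 33]    # from a full scan
approx = [3, 17, 8, 42, 5, 91, 12, 60, 7, 28]   # from an ANN index (made-up ids)
print(recall_at_k(exact, approx))  # 0.9
```

Order is ignored on purpose: the measure only asks whether the right documents made it into the result set.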

Two ANN index families dominate.

HNSW

Hierarchical Navigable Small World — a layered graph where each node connects to a handful of neighbours. Search descends from a sparse top layer to a dense bottom layer, following the closest neighbour at each step.

Properties:

Very fast queries, even at hundreds of millions of vectors
Memory-hungry — the whole graph lives in RAM
Slow inserts — building edges is non-trivial
Tunable: M (connections per node), ef (search breadth) trade speed for recall

The default modern choice. Used by Qdrant, Weaviate, pgvector (recent versions), Pinecone (under the hood), and most others.
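The "follow the closest neighbour" step can be sketched on a single layer. This is not HNSW itself: there are no layers and no ef candidate list, the graph is built by brute force, and all names are made up for illustration. It only shows why a graph walk touches far fewer vectors than a scan.

```python
import math
import random


def build_knn_graph(points, m):
    """Link every point to its m nearest neighbours (brute force, toy scale only)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(graph, points, query, entry):
    """Hop to whichever neighbour is closest to the query until none improves."""
    current = entry
    current_dist = math.dist(points[current], query)
    hops = 0
    while True:
        best = min(graph[current], key=lambda j: math.dist(points[j], query))
        best_dist = math.dist(points[best], query)
        if best_dist >= current_dist:
            # No neighbour is closer: a local minimum, not always the true nearest.
            return current, current_dist, hops
        current, current_dist = best, best_dist
        hops += 1


random.seed(1)
points = [(random.random(), random.random()) for _ in range(500)]
graph = build_knn_graph(points, m=8)
query = (0.5, 0.5)

found, found_dist, hops = greedy_search(graph, points, query, entry=0)
print(found, round(found_dist, 4), hops)
```

The walk can stop at a local minimum, which is where the approximation comes from. Roughly speaking, a wider candidate list (the ef parameter) and the sparse upper layers are what HNSW adds to make that rarer.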

IVF

Inverted File Index — cluster vectors into k centroids. At query time, find the nearest centroid(s), search only within those clusters.

Properties:

Smaller memory footprint than HNSW
Faster builds
Lower recall for the same speed, especially at cluster boundaries
Often paired with PQ (product quantization) to compress vectors further

Good when memory is the bottleneck. Used by FAISS-based systems and some pgvector setups.
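A toy version of the idea, under loud assumptions: centroids here are just a random sample of the data (real systems typically run k-means), there is no quantization, and the function names are invented. The nprobe parameter is the number of clusters scanned per query.

```python
import math
import random


def nearest(centroids, vec, n=1):
    """Indexes of the n centroids closest to vec."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(centroids[c], vec))
    return order[:n]


def build_ivf(vectors, num_clusters, seed=0):
    """Assign each vector to its nearest centroid.

    Assumption: centroids are a random sample of the data, not k-means output.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, num_clusters)
    lists = {c: [] for c in range(num_clusters)}
    for idx, vec in enumerate(vectors):
        lists[nearest(centroids, vec)[0]].append(idx)
    return centroids, lists


def ivf_search(query, vectors, centroids, lists, k=5, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    candidates = [i for c in nearest(centroids, query, nprobe) for i in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(vectors[i], query))[:k]


random.seed(2)
vectors = [(random.random(), random.random()) for _ in range(2_000)]
centroids, lists = build_ivf(vectors, num_clusters=20)
query = (0.3, 0.7)

exact = ivf_search(query, vectors, centroids, lists, k=5, nprobe=20)  # probes every cluster
approx = ivf_search(query, vectors, centroids, lists, k=5, nprobe=2)
print(len(set(exact) & set(approx)) / 5)  # recall of the 2-probe search
```

Probing every cluster is just a full scan, so it gives the exact answer. Lowering nprobe is the speed-for-recall dial, and the misses tend to be neighbours that sit just across a cluster boundary.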

You will also see DiskANN (a graph index like HNSW, designed to serve mostly from SSD rather than RAM) and ScaNN (from Google) in some products.

What a vector database does on top

Indexing is the core. Production systems add:

Metadata filtering: “find similar docs where tenant_id = 42 and date > 2025-01-01”. Trickier than it sounds: if you filter after the ANN search, a selective filter can leave you with fewer than K results, and if you filter before it, you may end up scanning every matching row without the index.
Hybrid search: combine vector similarity with keyword (BM25) scoring. Improves recall on rare terms like product codes.
Multi-tenancy: namespaces or collections so one cluster serves many customers.
Persistence and replication: vectors are costly to compute, so you do not want to regenerate them after a failure.
Re-ranking: a second pass with a more expensive model over the top candidates (for example, the top 50).
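The filtering problem is easy to reproduce. In this sketch an exact search stands in for the ANN index, and the data is hypothetical: tenant 42 owns only every tenth document.

```python
import math


def top_k(query, items, k):
    """Exact top-k by Euclidean distance; stands in for an ANN index here."""
    return sorted(items, key=lambda it: math.dist(it["vec"], query))[:k]


def post_filter(query, items, k, predicate):
    """Search first, filter afterwards: may return fewer than k results."""
    return [it for it in top_k(query, items, k) if predicate(it)]


def pre_filter(query, items, k, predicate):
    """Filter first, then search the survivors: fills k if enough rows match."""
    return top_k(query, [it for it in items if predicate(it)], k)


def is_tenant_42(item):
    return item["tenant_id"] == 42


# Hypothetical data: documents laid out along a line, tenant 42 owns ids 0, 10, 20, ...
items = [
    {"id": i, "tenant_id": 42 if i % 10 == 0 else 7, "vec": (float(i), 0.0)}
    for i in range(100)
]
query = (55.0, 0.0)

print(len(post_filter(query, items, 5, is_tenant_42)))  # 0
print(len(pre_filter(query, items, 5, is_tenant_42)))   # 5
```

The five nearest documents all belong to another tenant, so filtering afterwards returns nothing. Filtering first gives correct results but, in a real store, may mean the index cannot be used as built. How a given database resolves this is worth testing with your own data.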

These features, more than raw search speed, are what separate “I built it on FAISS” from “I run it in production.”
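For hybrid search, one common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion. This is a sketch of that technique, not a description of what any particular product does internally; the document ids are made up, and k = 60 is a conventional constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists; each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical vector-search order
keyword_hits = ["doc_d", "doc_a", "doc_b"]   # hypothetical BM25 order

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```

Because it uses ranks instead of raw scores, the two searches do not need comparable score scales. Documents that appear in both lists rise to the top.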

The major players

A high-level take. All of these are credible choices; the question is fit.

pgvector

A Postgres extension. Vectors are a column type; you query with SQL.

-- Find 5 most similar documents to a query vector
SELECT id, content, embedding <=> $1 AS distance
FROM documents
WHERE tenant_id = 42
ORDER BY embedding <=> $1
LIMIT 5;

Best when you already run Postgres and want one database, not two
Strength — transactional consistency, joins, mature ops story
Weakness — performance ceiling lower than specialised stores at huge scale
Free, open source

For most small-to-mid RAG projects, pgvector is the boring, correct answer.

Qdrant

Open-source vector DB written in Rust. HNSW under the hood, strong filtering, gRPC and REST APIs.

Best when you want a dedicated vector store you can self-host
Strength — fast, good filtering, generous free tier on Qdrant Cloud
Weakness — another piece of infrastructure to operate
Open source plus managed cloud

Pinecone

The original managed vector database. Closed source, hosted only.

Best when you want zero ops and have budget
Strength — fully managed, scales well, mature SDK
Weakness — vendor lock-in, cost at scale, opinionated data model
Paid (with a small free tier)

Chroma

Lightweight, developer-friendly, easy to embed in a Python app.

Best when prototyping or running small workloads
Strength — minimal setup, runs in-process or as a server
Weakness — newer; production stories at large scale are fewer
Open source plus optional cloud

Weaviate

Open-source with a richer feature set — built-in vectorization modules, GraphQL queries, hybrid search.

Best when you want an opinionated stack that ingests raw text and produces searchable vectors
Strength — feature-rich, good hybrid search
Weakness — more concepts to learn, heavier deployment
Open source plus managed cloud

Others worth knowing: Milvus (high scale, more complex), LanceDB (file-based, embedded), Vespa (heavyweight search platform), Elasticsearch / OpenSearch (mature search with vector support bolted on).

A simple decision framework

Honest questions, in order:

Do you already run Postgres? If yes, start with pgvector. Migrate only when you hit a wall.
Do you want to operate it yourself? No → Pinecone or a managed Qdrant/Weaviate. Yes → Qdrant or Weaviate.
Are you prototyping or in production? Prototyping → Chroma or pgvector. Production at scale → Qdrant, Weaviate, Pinecone, or Milvus.
Do you need heavy metadata filtering? Some stores handle this better than others. Test with realistic data.
Is hybrid search important? Weaviate and Qdrant have first-class support; pgvector requires a bit more glue.

A simple rule that has aged well: choose the simplest option that clears your current bar, and budget engineering time for migration only if you actually outgrow it.

Try it. Spin up pgvector locally with Docker and load 10,000 chunks from a dataset you care about. Measure query latency. Now add an HNSW index and measure again. The numbers will give you a much better intuition for what indexes buy you than any benchmark page.
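A small timing harness for that experiment might look like this. It assumes you supply a run_query function that executes one search against your database; the fake_query below is only a stand-in so the sketch runs on its own.

```python
import statistics
import time


def measure_latency_ms(run_query, queries, warmup=3):
    """Time run_query(q) for each query and summarise in milliseconds."""
    for q in queries[:warmup]:
        run_query(q)  # warm caches and connections before timing
    samples = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[min(len(samples) - 1, int(0.95 * len(samples)))],
        "max": samples[-1],
    }


def fake_query(q):
    """Stand-in for a real database call; replace with your pgvector query."""
    return sum(x * x for x in q)


stats = measure_latency_ms(fake_query, [[0.1] * 1536 for _ in range(200)])
print(stats)
```

Run it once before creating the index and once after, with the same queries. Comparing p95 as well as the median tends to show the difference more honestly than a single average.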

A small operational checklist

Whichever you pick, the same operational questions matter.

Backups. How do you snapshot the index, not just the source documents?
Re-embedding. How do you swap embedding models without downtime?
Index parameters. HNSW M and ef_construction affect quality and build time. Set them intentionally.
Filtering performance. Does adding a WHERE tenant_id = X slow searches by 10x? Test before launch.
Monitoring. Track recall on a fixed eval set (see LLM Evaluation Basics). Recall can slip quietly as vectors are added and deleted, so check it on a schedule.
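For the index-parameters item, here is a sketch of setting the values explicitly instead of relying on defaults. It only builds SQL strings in pgvector's syntax; the table and column names are hypothetical, and the numbers are starting points rather than recommendations.

```python
def hnsw_index_sql(table, column, m=16, ef_construction=64, opclass="vector_cosine_ops"):
    """Build a pgvector CREATE INDEX statement with explicit HNSW parameters.

    table and column are interpolated directly, so pass trusted identifiers only.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {int(m)}, ef_construction = {int(ef_construction)});"
    )


def ef_search_sql(ef_search=40):
    """Build the session setting that controls query-time search breadth."""
    return f"SET hnsw.ef_search = {int(ef_search)};"


print(hnsw_index_sql("documents", "embedding", m=16, ef_construction=64))
print(ef_search_sql(100))
```

Higher m and ef_construction generally buy recall at the cost of build time and memory, while ef_search trades query speed for recall without a rebuild. Keeping the values in code makes it obvious when someone changes them.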

Sanity check. For your current RAG project (or one you are planning), write down the corpus size, expected QPS, filter dimensions you need, and your ops appetite. Map those onto the framework above. Often the answer is obvious as soon as the numbers are on paper.

Recap

A vector database stores high-dimensional vectors and answers nearest-neighbour queries fast
ANN indexes trade tiny accuracy for huge speedups; HNSW is the modern default
pgvector for “already use Postgres”, Qdrant/Weaviate for self-hosted specialised stores, Pinecone for fully managed, Chroma for prototyping
Metadata filtering, hybrid search, and ops maturity matter as much as raw speed
Start simple, migrate only when measurements demand it

Next steps

Now that you can store and retrieve, the last lever is shaping how the model uses what it gets back — reusable prompt patterns.

→ Next: Prompt Templates and Reusable Patterns

Questions or feedback? Email codeloomdevv@gmail.com.