Vector Databases Compared
A grounded comparison of vector databases for RAG and semantic search: pgvector, Pinecone, Weaviate, Qdrant, Milvus, and Chroma, with guidance on when each shines.
What you'll learn
- What a vector database actually does
- Tradeoffs between managed and self-hosted options
- When pgvector is enough
- How to choose by scale and team profile
- A minimal pgvector example in Python
Prerequisites
- Basic Python familiarity
A vector database stores embeddings and finds nearest neighbors quickly. That sounds simple, and at small scale it is, but the choices you make early shape your latency, cost, and operational pain for a long time. This post compares the main contenders and gives you a framework for picking one.
What you are actually buying
Every vector database does three things: ingest embeddings with metadata, build an approximate nearest neighbor (ANN) index, and serve top-k queries with optional filters. The differentiators are how well they scale, how rich the filtering is, how the index is tuned, and how much operational work they demand.
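To make those three jobs concrete, here is a minimal sketch of the exact (brute-force) version of what a vector database approximates: store embeddings with metadata, then return the top-k rows by cosine similarity among those that pass a filter. The names `ingest` and `search` and the `lang` metadata field are made up for illustration; a real database replaces the linear scan with an ANN index.

```python
import math

# Hypothetical in-memory "store": a list of (id, embedding, metadata) rows.
rows = []

def ingest(doc_id, embedding, metadata):
    """Step 1: store an embedding together with its metadata."""
    rows.append((doc_id, embedding, metadata))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, k, where=None):
    """Exact top-k: score every row that passes the filter, keep the best k."""
    where = where or {}
    scored = [
        (cosine_similarity(query, emb), doc_id)
        for doc_id, emb, meta in rows
        if all(meta.get(key) == value for key, value in where.items())
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

ingest("a", [1.0, 0.0], {"lang": "en"})
ingest("b", [0.9, 0.1], {"lang": "de"})
ingest("c", [0.0, 1.0], {"lang": "en"})

print(search([1.0, 0.0], k=2))                        # ['a', 'b']
print(search([1.0, 0.0], k=2, where={"lang": "en"}))  # ['a', 'c']
```

The scan costs one similarity computation per stored vector, which is exactly the work an ANN index exists to avoid.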
ANN algorithms come up in every product. HNSW (hierarchical navigable small world) is the default for most modern systems because it balances recall and latency well. IVF and product quantization variants matter at very large scale where memory becomes the bottleneck. You usually do not need to pick the algorithm yourself; the database picks defaults that work.
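Recall here is usually measured against an exact search: of the true top-k neighbors, what fraction did the ANN index return? A minimal sketch, with made-up ID lists standing in for real query results:

```python
def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true top-k neighbors that the approximate search found."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

exact = ["d1", "d2", "d3", "d4", "d5"]   # from a brute-force scan
approx = ["d1", "d2", "d4", "d5", "d9"]  # from an ANN index (hypothetical result)
print(recall_at_k(exact, approx))  # 0.8
```

Index settings that widen the search generally raise recall and latency together, so it is worth measuring both on your own data before trusting the defaults.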
pgvector: start here if you already use Postgres
The pgvector extension turns Postgres into a competent vector store. You get ACID transactions, joins, real SQL, and a single database for both your relational data and your embeddings. With HNSW indexes, available since pgvector 0.5.0, performance is good for millions of vectors per table.
The catch is that very large indexes can be slow to build and memory-hungry, and Postgres scaling tricks like read replicas apply but require care. If you are under 10 million vectors and your team already runs Postgres, this is almost always the right starting point.
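import numpy as np
import psycopg
from pgvector.psycopg import register_vector

# autocommit so the DDL and the insert persist without an explicit commit
conn = psycopg.connect("dbname=app", autocommit=True)
# The extension must exist before register_vector can look up the vector type
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

# Placeholder embedding; real values come from your embedding model.
# A NumPy array is adapted to the vector type, a plain list is not.
embedding = np.array([0.1] * 1536)

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id bigserial PRIMARY KEY,
            content text,
            embedding vector(1536)
        )
    """)
    # HNSW index for cosine distance; named so reruns do not create duplicates
    cur.execute(
        "CREATE INDEX IF NOT EXISTS docs_embedding_hnsw "
        "ON docs USING hnsw (embedding vector_cosine_ops)"
    )
    cur.execute(
        "INSERT INTO docs (content, embedding) VALUES (%s, %s)",
        ("hello", embedding),
    )
    # <=> is cosine distance, matching vector_cosine_ops above
    cur.execute(
        "SELECT content FROM docs ORDER BY embedding <=> %s LIMIT 5",
        (embedding,),
    )
    print(cur.fetchall())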
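For a sense of why large indexes get memory-hungry, a back-of-the-envelope estimate helps. The sketch below assumes 4-byte floats plus 8 bytes of per-vector overhead, which is what pgvector documents for its `vector` type; the function name is made up, and HNSW graph links and ordinary table overhead come on top of this figure.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4, per_vector_overhead=8):
    """Approximate storage for the vectors alone, excluding any index structures."""
    return num_vectors * (dimensions * bytes_per_value + per_vector_overhead)

# 10 million 1536-dimensional embeddings, the scale discussed above
size = raw_vector_bytes(10_000_000, 1536)
print(f"{size / 1e9:.1f} GB")  # 61.5 GB
```

At that size the vectors alone exceed the RAM of many database hosts, which is where index build times and cache misses start to hurt.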
Pinecone: managed simplicity
Pinecone is a fully managed vector database. You create an index, push vectors, and query. There are no servers to maintain, scaling is mostly automatic, and the SDKs are clean. It is a strong choice if you want to focus on the application and pay someone else to handle the infrastructure.
The downside is cost at scale and the usual managed-service tradeoffs: you give up control over the index internals, and your data lives with the vendor. If you are okay with that, Pinecone is the path of least resistance.
Weaviate: schema-aware and feature-rich
Weaviate emphasizes a schema-first model with classes, properties, and built-in modules for embedding generation, hybrid search, and rerankers. It can run self-hosted or in their cloud. The GraphQL API is divisive; some teams love it, some find it more friction than help.
Weaviate shines when you want hybrid search (vector plus keyword) and rich filters as first-class features, not bolted on. If your search needs go beyond pure semantic similarity, it is a strong candidate.
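Hybrid search has to merge two ranked lists into one. One common, engine-agnostic way to do that is reciprocal rank fusion (RRF), sketched below. This illustrates the general idea and is not a description of Weaviate's internals; the document IDs are made up, and `k=60` is a conventional constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic (embedding) ranking
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # keyword ranking
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that appear in both lists rise to the top, which is the behavior you want when neither signal is reliable alone.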
Qdrant: fast, focused, and Rust-powered
Qdrant is a self-hosted (or managed) vector database written in Rust. It has a simple HTTP and gRPC API, strong filtering, and very good performance per dollar. The defaults are sensible, the docs are clean, and the operational footprint is small.
Teams that want a dedicated vector database without the bigger surface area of Milvus often land on Qdrant. It feels like Redis for vectors: small, fast, easy to reason about.
Milvus: built for huge scale
Milvus is designed for billions of vectors. It separates storage and compute, supports multiple ANN algorithms, and integrates with cloud-native infrastructure. The tradeoff is operational complexity: it is a distributed system with several components, and running it well takes real effort.
Pick Milvus when you genuinely have hundreds of millions of vectors or strict latency targets that smaller systems cannot hit. Below that scale you are paying in complexity for capacity you do not need.
Chroma: for prototypes and notebooks
Chroma started as the easiest possible local vector store: install, embed, query, done. It is excellent for prototypes, demos, and tutorials. Production deployments exist but are not its strength; treat it as a development tool unless you have a specific reason to do otherwise.
A decision framework
Start by counting vectors. Under 10 million and on Postgres already? Use pgvector. Want a managed service and willing to pay for it? Pinecone. Need hybrid search and a schema? Weaviate. Want a fast self-hosted dedicated store with low operational burden? Qdrant. Truly massive scale? Milvus. Prototyping in a notebook? Chroma.
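The same checklist can be written down as a small function. This is only a sketch of the rules of thumb above: the function name, the order in which the checks run, and the 100 million cutoff for "truly massive" are assumptions, not hard limits.

```python
def suggest_store(vector_count, uses_postgres=False, prototyping=False,
                  wants_managed=False, needs_hybrid_search=False):
    """Return a first candidate to evaluate, not a final answer."""
    if prototyping:
        return "Chroma"
    if vector_count >= 100_000_000:  # assumed cutoff for "truly massive"
        return "Milvus"
    if vector_count < 10_000_000 and uses_postgres:
        return "pgvector"
    if wants_managed:
        return "Pinecone"
    if needs_hybrid_search:
        return "Weaviate"
    return "Qdrant"  # fast self-hosted default when nothing else applies

print(suggest_store(2_000_000, uses_postgres=True))           # pgvector
print(suggest_store(50_000_000, needs_hybrid_search=True))    # Weaviate
print(suggest_store(500_000_000))                             # Milvus
```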
Then ask three questions. How important is metadata filtering? How much hybrid search do I need? How comfortable is my team running stateful services? The answers narrow the list quickly.
The boring advice
Whatever you pick, keep your embedding generation, indexing, and querying behind a small interface in your code. Vector databases are improving fast, and you will probably switch at least once. If your application code talks to a thin VectorStore class with upsert and search methods, that switch is a weekend, not a quarter. Lock-in is a choice, and the right abstractions keep you out of it.
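A minimal sketch of such an interface, assuming Python 3.9+. Only the `VectorStore` name and its `upsert` and `search` methods come from the advice above; the method signatures, the `InMemoryStore` backend, and the `answer_context` helper are hypothetical and exist only to show the shape.

```python
import math
from typing import Protocol, Sequence

class VectorStore(Protocol):
    """The only surface the application code is allowed to depend on."""
    def upsert(self, doc_id: str, embedding: Sequence[float], content: str) -> None: ...
    def search(self, query: Sequence[float], k: int = 5) -> list[str]: ...

class InMemoryStore:
    """Toy backend for tests; a pgvector or Qdrant adapter would expose the same two methods."""
    def __init__(self) -> None:
        self._docs: dict[str, tuple[list[float], str]] = {}

    def upsert(self, doc_id: str, embedding: Sequence[float], content: str) -> None:
        self._docs[doc_id] = (list(embedding), content)  # same id overwrites

    def search(self, query: Sequence[float], k: int = 5) -> list[str]:
        # Exact scan by Euclidean distance; fine for a test double
        ranked = sorted(self._docs.values(), key=lambda doc: math.dist(query, doc[0]))
        return [content for _, content in ranked[:k]]

def answer_context(store: VectorStore, query_embedding: Sequence[float], k: int = 2) -> list[str]:
    """Application code sees only the interface, never the concrete backend."""
    return store.search(query_embedding, k=k)

store = InMemoryStore()
store.upsert("1", [0.0, 0.0], "origin")
store.upsert("2", [1.0, 1.0], "far corner")
store.upsert("1", [0.1, 0.0], "near origin")  # replaces the first row
print(answer_context(store, [0.0, 0.0], k=1))  # ['near origin']
```

Swapping databases then means writing one new adapter class and leaving every caller of `answer_context` untouched.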