Pinecone Vector Database: Index, Upsert, Query
Build a working vector search pipeline with Pinecone in Python. Indexes, upserts, metadata filters, hybrid search, and patterns for production RAG.
What you'll learn
- Create a serverless Pinecone index from Python
- Upsert vectors with metadata in batches
- Run nearest-neighbor queries with filters
- Use namespaces to isolate tenants
- Tune top_k, dimensions, and metric for quality
Prerequisites
- Read [RAG Embeddings Explained](/blog/rag-embeddings-explained)
- Skim [RAG Vector Databases Overview](/blog/rag-vector-databases-overview)
- Python 3.10+, a Pinecone API key, and an OpenAI API key (the examples use OpenAI for both embeddings and chat)
Pinecone is a managed vector database. You hand it embeddings plus metadata, ask for nearest neighbors, and let someone else run the index. This guide walks the full loop end to end.
Install and authenticate
```bash
pip install pinecone openai
export PINECONE_API_KEY=pcsk-...
export OPENAI_API_KEY=sk-...
```

```python
from pinecone import Pinecone, ServerlessSpec

# Pinecone() reads PINECONE_API_KEY from the environment.
pc = Pinecone()
```
Create an index
An index is a logical store of vectors. Its dimension must match the output size of your embedding model. text-embedding-3-small produces 1536-dimensional vectors.
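A mismatch between the index dimension and the model's output only shows up as an error at upsert time. It can therefore help to derive the dimension from the model name. This is a minimal sketch. EMBED_DIMS and index_dimension are hypothetical names, not part of either SDK, and the table covers only three OpenAI models. The text-embedding-3 models also accept a dimensions argument that shortens their output, so the helper takes an optional override.

```python
from __future__ import annotations

# Hypothetical lookup: default output sizes of a few OpenAI embedding models.
EMBED_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def index_dimension(model: str, dimensions: int | None = None) -> int:
    """Return the vector size an index must use for `model`.

    `dimensions` mirrors the optional argument the text-embedding-3 models
    accept to shorten their output; it cannot exceed the model's default.
    """
    default = EMBED_DIMS[model]
    if dimensions is None:
        return default
    if model == "text-embedding-ada-002":
        raise ValueError("ada-002 does not support shortened embeddings")
    if not 0 < dimensions <= default:
        raise ValueError(f"dimensions must be in 1..{default}")
    return dimensions


print(index_dimension("text-embedding-3-small"))        # 1536
print(index_dimension("text-embedding-3-large", 1024))  # 1024
```

You would then pass dimension=index_dimension(model) to create_index. If you shorten the embeddings, pass the same dimensions value to every embeddings call.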
```python
INDEX = "docs"
if not pc.has_index(INDEX):
    pc.create_index(
        name=INDEX,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
index = pc.Index(INDEX)
```
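Cosine is the usual metric for text embeddings. Pinecone also offers dotproduct and euclidean. OpenAI embeddings are normalized to length 1, so all three rank their results the same way. On unnormalized vectors, dotproduct lets vector magnitude influence the score, so choose it deliberately. The exception is hybrid search: Pinecone only accepts sparse-dense queries on a dotproduct index.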
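The claim that the metric rarely changes rankings for normalized vectors can be checked with plain arithmetic. For unit vectors, cosine similarity equals the dot product, and squared Euclidean distance equals 2 minus twice that value. This pure-Python sketch makes no Pinecone calls.

```python
from __future__ import annotations

import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: list[float]) -> list[float]:
    n = norm(a)
    return [x / n for x in a]


a = normalize([3.0, 4.0])  # [0.6, 0.8]
b = normalize([1.0, 0.0])  # [1.0, 0.0]

# For unit vectors, cosine and dot product coincide...
print(round(cosine(a, b), 6), round(dot(a, b), 6))  # 0.6 0.6
# ...and squared Euclidean distance is 2 - 2 * cosine, so ranking by
# smallest distance gives the same order as ranking by largest cosine.
sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
print(round(sq_dist, 6), round(2 - 2 * cosine(a, b), 6))  # 0.8 0.8
```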
Upsert vectors
Each record needs an id and a vector, plus optional metadata. Batch your upserts: one vector per request is fine for a demo and miserable in production.
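This is a minimal batching sketch. batched and upsert_in_batches are hypothetical helpers. The default batch size of 100 is an assumption meant to stay well under Pinecone's per-request limits, so tune it for your vector size and metadata. The helper accepts anything with an upsert(vectors=..., namespace=...) method, such as the index object.

```python
from __future__ import annotations

from itertools import islice
from typing import Any, Iterable, Iterator


def batched(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def upsert_in_batches(index: Any, records: Iterable[dict], namespace: str,
                      batch_size: int = 100) -> int:
    """Send `records` to index.upsert in chunks; return how many were sent."""
    sent = 0
    for batch in batched(records, batch_size):
        index.upsert(vectors=batch, namespace=namespace)
        sent += len(batch)
    return sent
```

Usage would be upsert_in_batches(index, vectors, namespace="default"). On Python 3.12+, itertools.batched does the same job but yields tuples instead of lists.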
```python
from openai import OpenAI

oai = OpenAI()
docs = [
    {"id": "d1", "text": "Pinecone is a managed vector database."},
    {"id": "d2", "text": "FAISS runs vector search in process."},
    {"id": "d3", "text": "pgvector adds vector types to Postgres."},
]
emb = oai.embeddings.create(
    model="text-embedding-3-small",
    input=[d["text"] for d in docs],
)
vectors = [
    {"id": d["id"], "values": e.embedding, "metadata": {"text": d["text"]}}
    for d, e in zip(docs, emb.data)
]
index.upsert(vectors=vectors, namespace="default")
```
Keep metadata small. Store full documents elsewhere and reference them by id.
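Here is one way to follow that advice, sketched under assumptions. The full text lives in some other store (here a plain dict stands in for a database or object store). Each Pinecone record carries only a document id and a short snippet. compact_metadata and the 1,000-byte snippet budget are hypothetical. Pinecone documents its own per-record metadata limit (40 KB at the time of writing), and this stays far below it. The snippet keeps the "text" key so code that reads metadata["text"] still works.

```python
from __future__ import annotations

import json


def compact_metadata(doc_id: str, text: str, max_snippet_bytes: int = 1000) -> dict:
    """Keep a UTF-8-safe snippet plus a pointer back to the full document."""
    raw = text.encode("utf-8")[:max_snippet_bytes]
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    snippet = raw.decode("utf-8", errors="ignore")
    return {"doc_id": doc_id, "text": snippet}


doc_store = {"d1": "Pinecone is a managed vector database. " * 200}  # stand-in store
meta = compact_metadata("d1", doc_store["d1"])
print(len(json.dumps(meta).encode("utf-8")) < 40 * 1024)  # True
full_text = doc_store[meta["doc_id"]]  # resolve the reference after a query
```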
Querying
Embed the query with the same model, then ask the index for top-k.
```python
q = "Which database can I run inside Postgres?"
qvec = oai.embeddings.create(model="text-embedding-3-small", input=q).data[0].embedding
res = index.query(
    vector=qvec,
    top_k=3,
    namespace="default",
    include_metadata=True,
)
for match in res.matches:
    print(round(match.score, 3), match.metadata["text"])
```
A higher score means closer. Cosine similarity ranges from -1 to 1, but with OpenAI text embeddings the scores in practice usually fall between about 0 and 1.
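top_k always returns up to k matches, even weak ones. A RAG pipeline therefore often drops matches below a similarity floor, so the model can say it does not know. This is a sketch. MIN_SCORE = 0.3 is an arbitrary assumption to calibrate on your own queries. Match is a stand-in for the objects in res.matches, which expose id, score, and metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MIN_SCORE = 0.3  # hypothetical floor; calibrate against real queries


@dataclass
class Match:  # stand-in for a Pinecone query match
    id: str
    score: float
    metadata: dict = field(default_factory=dict)


def confident(matches: list, min_score: float = MIN_SCORE) -> list:
    """Keep matches at or above the floor, preserving the index's ranking."""
    return [m for m in matches if m.score >= min_score]


hits = [Match("d3", 0.62), Match("d1", 0.41), Match("d2", 0.18)]
print([m.id for m in confident(hits)])  # ['d3', 'd1']
```

With a real query you would call confident(res.matches).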
Metadata filters
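You can filter by metadata at query time. Pinecone applies the filter during the search itself rather than trimming the top_k results afterward. Filtered queries therefore stay fast and still return up to top_k matching records.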
```python
index.upsert(vectors=[
    {"id": "p1", "values": qvec, "metadata": {"team": "platform", "year": 2025}},
    {"id": "p2", "values": qvec, "metadata": {"team": "growth", "year": 2024}},
], namespace="default")
res = index.query(
    vector=qvec,
    top_k=5,
    filter={"team": {"$eq": "platform"}, "year": {"$gte": 2025}},
    include_metadata=True,
    namespace="default",
)
```
Supported operators include $eq, $ne, $gt, $gte, $lt, $lte, $in, and $nin. Top-level conditions are combined with AND, and $and and $or let you nest them explicitly.
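To make the operator semantics concrete, here is a small local evaluator for the same filter syntax. It is only a sketch of the semantics, not Pinecone's implementation. It handles scalar metadata and the operators listed above plus $and and $or, and it treats a missing key as None. It is handy for unit-testing filter dictionaries before sending them to index.query.

```python
from __future__ import annotations

from typing import Any

# Comparison operators over scalar metadata (missing keys arrive as None).
OPS = {
    "$eq": lambda v, x: v == x,
    "$ne": lambda v, x: v != x,
    "$gt": lambda v, x: v is not None and v > x,
    "$gte": lambda v, x: v is not None and v >= x,
    "$lt": lambda v, x: v is not None and v < x,
    "$lte": lambda v, x: v is not None and v <= x,
    "$in": lambda v, x: v in x,
    "$nin": lambda v, x: v not in x,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Evaluate a Pinecone-style filter against one record's metadata."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            value = metadata.get(key)
            # A bare value is shorthand for $eq, e.g. {"team": "platform"}.
            ops = cond if isinstance(cond, dict) else {"$eq": cond}
            if not all(OPS[op](value, arg) for op, arg in ops.items()):
                return False
    return True


p1 = {"team": "platform", "year": 2025}
flt = {"team": {"$eq": "platform"}, "year": {"$gte": 2025}}
print(matches(p1, flt))  # True
```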
Namespaces for multi-tenant apps
A namespace is a partition inside the index. Queries are scoped to a single namespace, which is the cleanest way to isolate per-tenant data.
```python
index.upsert(vectors=vectors, namespace="tenant_acme")
res = index.query(vector=qvec, top_k=3, namespace="tenant_acme")
```
One index, many namespaces. Cheaper than spinning up an index per customer.
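Every call takes a namespace argument, so a forgotten or mistyped one is an easy way to leak data across tenants. One defensive pattern is a thin wrapper that pins the namespace so call sites cannot choose it. This sketch uses hypothetical names: TenantIndex, the tenant_ prefix, and the tenant-id naming rule are all assumptions. It wraps anything with upsert and query methods, such as the index object.

```python
from __future__ import annotations

import re
from typing import Any

_TENANT_ID = re.compile(r"[a-z0-9_-]{1,64}")  # assumed naming rule


class TenantIndex:
    """Pins every read and write to one tenant's namespace."""

    def __init__(self, index: Any, tenant_id: str) -> None:
        if not _TENANT_ID.fullmatch(tenant_id):
            raise ValueError(f"invalid tenant id: {tenant_id!r}")
        self._index = index
        self.namespace = f"tenant_{tenant_id}"

    def upsert(self, vectors: list[dict]) -> Any:
        return self._index.upsert(vectors=vectors, namespace=self.namespace)

    def query(self, vector: list[float], top_k: int = 3, **kwargs: Any) -> Any:
        kwargs.pop("namespace", None)  # never let callers override the tenant
        return self._index.query(vector=vector, top_k=top_k,
                                 namespace=self.namespace, **kwargs)
```

Usage would look like acme = TenantIndex(index, "acme") followed by acme.query(qvec, top_k=3, include_metadata=True).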
Hybrid search
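For keyword-heavy queries, combine sparse and dense vectors. Sparse vectors hold weights for a handful of token ids (term-frequency or BM25 style), while dense vectors are embeddings. Pinecone stores both on the same record and scores a query as the dense dot product plus the sparse dot product. You control the fusion by weighting the two query vectors. This requires an index created with metric="dotproduct"; a cosine index like the one above does not accept sparse values.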
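Pinecone expects a sparse vector as parallel indices and values lists. Real pipelines usually build these with BM25 or a learned sparse model. The sketch below is only a toy term-frequency encoder that hashes tokens into a fixed id space, to show the shape of the data. The tokenizer regex, the 2**31 id space, and the use of CRC32 as a stable hash are all assumptions.

```python
from __future__ import annotations

import re
import zlib
from collections import Counter

TOKEN = re.compile(r"[a-z0-9_\-]+")
ID_SPACE = 2**31  # hypothetical size of the sparse id space


def sparse_tf(text: str) -> dict:
    """Toy encoder: hashed token ids -> normalized term frequencies."""
    counts = Counter(TOKEN.findall(text.lower()))
    weights: dict[int, float] = {}
    for tok, n in counts.items():
        # crc32 is stable across processes, unlike Python's salted hash() for str.
        idx = zlib.crc32(tok.encode("utf-8")) % ID_SPACE
        weights[idx] = weights.get(idx, 0.0) + n  # merge rare hash collisions
    total = sum(weights.values())
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] / total for i in indices]}


sv = sparse_tf("SKU-4471 battery SKU-4471")  # two ids; the SKU gets weight 2/3
```

The same encoder has to produce the sparse_values you upsert and the sparse_vector you query with. Otherwise the ids will not line up.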
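```python
# Requires a dotproduct index whose records were upserted with sparse_values.
# The sparse indices and weights here are illustrative only.
res = index.query(
    vector=qvec,
    sparse_vector={"indices": [10, 45, 102], "values": [0.5, 0.3, 0.2]},
    top_k=5,
    namespace="default",
    include_metadata=True,
)
```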
Hybrid search usually helps most on queries with rare proper nouns, product SKUs, or code identifiers, which embeddings tend to smear together.
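The index adds the dense and sparse dot products into one score. The usual way to fuse them is therefore to scale the two query vectors before sending them, using a weight alpha between 0 (sparse only) and 1 (dense only). This sketch follows the convex-combination approach described in Pinecone's hybrid search guide. hybrid_scale is a hypothetical name.

```python
from __future__ import annotations


def hybrid_scale(dense: list[float], sparse: dict, alpha: float):
    """Weight the dense query by alpha and the sparse query by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return [v * alpha for v in dense], scaled_sparse


dense_q, sparse_q = hybrid_scale(
    [0.2, 0.4], {"indices": [10, 45], "values": [0.5, 0.5]}, alpha=0.75
)
```

You would pass the two results as vector= and sparse_vector= to index.query. Sweep alpha on a few labeled queries rather than trusting any single default.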
Updates and deletes
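```python
# Merge new fields into an existing record's metadata; the vector is unchanged.
index.update(id="d1", set_metadata={"reviewed": True}, namespace="default")
# Delete specific records by id.
index.delete(ids=["d2"], namespace="default")
# Delete every record whose metadata matches the filter.
# Older serverless releases only supported deleting by id; check your index type and API version.
index.delete(filter={"team": {"$eq": "growth"}}, namespace="default")
```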
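To re-embed a record, upsert it again with the same id and the new vector; there is no migration step. An upsert replaces the whole record, though, so send the metadata again or it is lost. This also only works while the new model has the same dimension. Switching to a model with a different dimension means creating a new index and upserting everything into it.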
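Because an upsert replaces the whole record, re-embedding has to carry the existing metadata along. This sketch makes three assumptions. old_metadata is whatever you read back for those ids (for example via index.fetch). embed is any callable that returns one vector per input text. reembed_records is a hypothetical helper. It re-embeds the stored "text" field, so it only fits setups where that field holds the full chunk.

```python
from __future__ import annotations

from typing import Callable


def reembed_records(
    old_metadata: dict[str, dict],
    embed: Callable[[list[str]], list[list[float]]],
) -> list[dict]:
    """Build upsert records with fresh vectors and the original metadata."""
    ids = list(old_metadata)
    vectors = embed([old_metadata[i]["text"] for i in ids])
    if len(vectors) != len(ids):
        raise ValueError("embed() must return one vector per input")
    return [
        {"id": i, "values": v, "metadata": old_metadata[i]}
        for i, v in zip(ids, vectors)
    ]


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Toy embedder for illustration only: a 2-d vector from text length."""
    return [[float(len(t)), 1.0] for t in texts]


records = reembed_records({"d1": {"text": "abc", "reviewed": True}}, fake_embed)
print(records)  # [{'id': 'd1', 'values': [3.0, 1.0], 'metadata': {'text': 'abc', 'reviewed': True}}]
```

In real use, embed would wrap oai.embeddings.create(model=..., input=texts) and return the embedding of each item in the response's data list. The resulting records then go to index.upsert.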
Wiring it into a RAG endpoint
```python
def retrieve(q: str, k: int = 4) -> list[str]:
    """Embed the question and return the text of the k nearest chunks."""
    v = oai.embeddings.create(model="text-embedding-3-small", input=q).data[0].embedding
    r = index.query(vector=v, top_k=k, namespace="default", include_metadata=True)
    # Skip records without a "text" field, such as the p1/p2 filter demo records.
    return [m.metadata["text"] for m in r.matches if m.metadata and "text" in m.metadata]


def answer(q: str) -> str:
    """Retrieve context, then ask the chat model to answer from it."""
    ctx = "\n".join(retrieve(q))
    msg = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using the context. If the answer is not in the context, say so."},
            {"role": "user", "content": f"Context:\n{ctx}\n\nQ: {q}"},
        ],
    )
    return msg.choices[0].message.content


print(answer("Which vector DB lives inside Postgres?"))
```
Front this with FastAPI; see What is FastAPI.
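retrieve joins every match it gets, so a large top_k can overflow the prompt. This sketch caps the context, using a character budget as a rough stand-in for tokens (a tokenizer would be more precise). pack_context and the 6,000-character default are hypothetical. Matches arrive best first, so it keeps the highest-ranked chunks and stops at the first one that does not fit.

```python
from __future__ import annotations


def pack_context(chunks: list[str], max_chars: int = 6000, sep: str = "\n") -> str:
    """Join ranked chunks until adding the next one would exceed max_chars."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        extra = len(chunk) + (len(sep) if kept else 0)
        if used + extra > max_chars:
            break
        kept.append(chunk)
        used += extra
    return sep.join(kept)


print(repr(pack_context(["aaaa", "bbbb", "cccc"], max_chars=9)))  # 'aaaa\nbbbb'
```

Inside answer, ctx = pack_context(retrieve(q, k=8)) would replace the plain join.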
Tuning that matters
- Use a smaller embedding model when latency rules and a larger one when quality rules.
- Set top_k based on how much context your prompt budget allows.
- Chunk source documents to roughly 200 to 500 tokens.
- If precision matters, retrieve the top 20 and re-rank them with a stronger model (for example, a cross-encoder) before building the prompt.
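This is a minimal chunking sketch for the 200-to-500-token guideline. It counts whitespace-separated words as a crude proxy for tokens; English text runs to roughly one and a third tokens per word. The 300-word window with 50 words of overlap is an assumption, not a Pinecone recommendation. Overlap keeps a sentence that straddles a boundary retrievable from either side.

```python
from __future__ import annotations


def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(700))
parts = chunk_words(doc)
print(len(parts), parts[1].split()[0])  # 3 w250
```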
Wrap up
Pinecone is boring in the best way. Pick a dimension, upsert in batches, query with filters, isolate tenants with namespaces, and let the managed plane handle the rest.