RAG Metadata Filtering Strategies
How to use metadata filters in RAG to improve precision, scope retrieval, and enforce permissions without sacrificing recall.
What you'll learn
- Why metadata filtering matters in RAG
- Common metadata schemas
- Pre-filter versus post-filter strategies
- How to combine filters with vector search
- Pitfalls and how to avoid empty results
Prerequisites
- Familiarity with APIs
- Basic RAG concepts
What and Why
Vector similarity is great at finding semantically related text but blind to attributes like date, owner, language, or document type. Metadata filtering layers structured constraints on top of semantic search so you can say things like “find chunks similar to this query, but only from the 2025 policy docs in English that this user can access.”
This matters for three reasons: precision, scope, and security. Precision improves because irrelevant matches get pruned. Scope lets users narrow into a project or product. Security ensures a user cannot retrieve documents they should not see, which is a hard requirement in enterprise RAG.
Mental Model
Imagine a library where every book has both a topic and a colored sticker on the spine indicating its language, year, and department. Semantic search picks books by topic. Filters check the sticker. You can either pull a giant pile by topic and then sort by sticker (post-filter) or first walk to the right shelf and only search topics there (pre-filter).
Pre-filtering is faster and more accurate when your filter is selective. Post-filtering is simpler but wastes work and can return too few results if the filter is strict.
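The difference between the two orders can be shown with a toy, in-memory sketch. Everything here is illustrative: the four chunks, their two-dimensional embeddings, and the helper names pre_filter and post_filter are invented for the example, and a real vector database would use an approximate index rather than brute force.

```python
import math

# Toy corpus: (chunk id, embedding, metadata). All values are invented.
CHUNKS = [
    ("a", [1.0, 0.0], {"language": "en"}),
    ("b", [0.9, 0.1], {"language": "de"}),
    ("c", [0.8, 0.2], {"language": "de"}),
    ("d", [0.1, 0.9], {"language": "en"}),
]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def top_k(query, chunks, k):
    """Brute-force nearest neighbours by cosine similarity, best first."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c[1]), reverse=True)
    return ranked[:k]

def pre_filter(query, k, language):
    # Walk to the right shelf first, then search only there.
    shelf = [c for c in CHUNKS if c[2]["language"] == language]
    return [c[0] for c in top_k(query, shelf, k)]

def post_filter(query, k, language, fetch):
    # Pull a pile of `fetch` candidates by similarity, then check the sticker.
    pile = top_k(query, CHUNKS, fetch)
    return [c[0] for c in pile if c[2]["language"] == language][:k]

query = [1.0, 0.0]
pre = pre_filter(query, k=2, language="en")             # ['a', 'd']
post = post_filter(query, k=2, language="en", fetch=3)  # ['a']
```

With a pile of three candidates, two of the three nearest chunks are German, so the post-filter returns a single English result even though two were requested. The pre-filter returns both English chunks because it never looks at the German ones.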
Hands-on Example
Most modern vector databases (Pinecone, Weaviate, Qdrant, pgvector with jsonb) support filters in the query. Here is a Qdrant example:
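from qdrant_client import QdrantClient
from qdrant_client.http import models as qm

client = QdrantClient("localhost", port=6333)

# embed() is assumed to be your own function that returns the query embedding
# as a list of floats with the same dimension as the collection.
results = client.search(
    collection_name="docs",
    query_vector=embed("How do I rotate API keys?"),
    # Both conditions under `must` have to hold for a point to be returned.
    query_filter=qm.Filter(
        must=[
            qm.FieldCondition(key="product", match=qm.MatchValue(value="auth")),
            qm.FieldCondition(key="year", range=qm.Range(gte=2024)),
        ]
    ),
    limit=5,
)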
Here product and year are metadata fields (Qdrant calls them payload) stored alongside each vector. When those fields are indexed, the database can use the index to restrict the search space as part of the nearest-neighbor lookup instead of discarding results afterwards.
Pre-filter:
query ---> [filter by product=auth] ---> [vector search] ---> top-k
Post-filter:
query ---> [vector search top-N] ---> [filter by product=auth] ---> top-k
Hybrid:
query ---> [filter] ---> [vector search] ---> [re-rank] ---> top-k
A common schema is to attach: source, doc_id, chunk_id, created_at, updated_at, author, team, product, language, acl (list of allowed user or group IDs), and maybe section_path.
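One way to keep such a schema consistent is to define it once in code and normalize values before they are written. The sketch below is only an assumption about how that could look: the ChunkMetadata class, the to_epoch helper, and the sample values are hypothetical, and dates are stored as epoch seconds so that range filters compare numbers.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

def to_epoch(value):
    """Normalize an ISO 8601 string or a datetime to integer epoch seconds."""
    if isinstance(value, str):
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # Assumption: timestamps without a timezone are treated as UTC.
        value = value.replace(tzinfo=timezone.utc)
    return int(value.timestamp())

@dataclass
class ChunkMetadata:
    source: str
    doc_id: str
    chunk_id: str
    created_at: int   # epoch seconds
    updated_at: int   # epoch seconds
    author: str
    team: str
    product: str
    language: str
    acl: list = field(default_factory=list)  # allowed user or group IDs
    section_path: str = ""

meta = ChunkMetadata(
    source="wiki",
    doc_id="doc-42",
    chunk_id="doc-42#3",
    created_at=to_epoch("2024-01-01T00:00:00"),
    updated_at=to_epoch("2024-01-01T00:00:00+00:00"),
    author="alice",
    team="platform",
    product="auth",
    language="en",
    acl=["group:eng"],
)

# A plain dict that could be stored as the metadata for one vector.
payload = asdict(meta)
```

Both timestamps end up as the same integer even though one input carried a timezone offset and the other did not, which is the kind of consistency range filters depend on.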
Trade-offs
Pre-filter is fast when the filter selects a small slice and the database supports indexed filtering. It can hurt recall if you over-constrain. For example, filtering to a single quarter may exclude an evergreen FAQ chunk that is still the best answer.
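Post-filtering is easy to reason about, because every returned chunk has passed the filter after an unrestricted search. It wastes compute, though, since you fetch many vectors only to throw most of them away. It also risks returning too few results if the over-fetch factor (how many candidates you fetch per result you need) is too small for a strict filter.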
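A common mitigation for the too-few-results problem is to grow the over-fetch window until enough candidates survive the filter. The following is a minimal sketch under stated assumptions: post_filter_search, its factor and max_fetch parameters, and the search callable are hypothetical names, and the toy index stands in for a real vector search.

```python
def post_filter_search(search, keep, k, factor=4, max_fetch=1000):
    """Post-filter with a growing over-fetch window.

    search(n) -> the n nearest candidates, best first (hypothetical callable).
    keep(candidate) -> True if the candidate passes the metadata filter.
    """
    fetch = k * factor
    while True:
        candidates = search(fetch)
        hits = [c for c in candidates if keep(c)]
        # Stop when there are enough hits, the index is exhausted,
        # or the fetch cap has been reached.
        exhausted = len(candidates) < fetch
        if len(hits) >= k or exhausted or fetch >= max_fetch:
            return hits[:k]
        fetch = min(fetch * 2, max_fetch)

# Toy index: 100 candidates already ranked by similarity;
# only every 10th one belongs to the "auth" product.
RANKED = [
    {"id": i, "product": "auth" if i % 10 == 0 else "billing"}
    for i in range(100)
]
calls = []

def search(n):
    calls.append(n)  # record how many candidates each round requested
    return RANKED[:n]

hits = post_filter_search(search, lambda c: c["product"] == "auth", k=3, factor=2)
```

In this toy run the window grows from 6 to 12 to 24 candidates before three matching chunks are found, which makes the wasted work visible: 24 candidates are scored to return 3.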
Storing too many metadata fields slows writes and bloats memory. Storing too few limits future filters. A practical middle path is to add only the fields you can imagine filtering on within six months, and keep the rest as raw text.
Permissions are the trickiest area. ACL lists work but explode for documents shared with many groups. Group-based ACLs and role expansion at query time scale better, but require a sync between your identity system and the vector store.
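Role expansion at query time can be sketched as follows. The identity data, the "user:" and "group:" prefixes, and the function names are all hypothetical; in a real system the group memberships would come from your identity provider, and the expanded set would typically be passed to the vector store as a match-any condition on the acl field rather than checked in Python.

```python
# Hypothetical identity data: user -> direct groups, group -> parent groups.
USER_GROUPS = {"alice": ["eng-auth"], "bob": ["sales"]}
PARENT_GROUPS = {
    "eng-auth": ["eng"],
    "eng": ["all-staff"],
    "sales": ["all-staff"],
}

def expand_principals(user_id):
    """Return the user plus every group reachable through the parent links."""
    principals = {f"user:{user_id}"}
    stack = list(USER_GROUPS.get(user_id, []))
    while stack:
        group = stack.pop()
        key = f"group:{group}"
        if key in principals:
            continue  # already visited; also guards against cycles
        principals.add(key)
        stack.extend(PARENT_GROUPS.get(group, []))
    return principals

def can_read(chunk_acl, principals):
    # A chunk is visible if its ACL shares at least one principal with the user.
    return not principals.isdisjoint(chunk_acl)

alice = expand_principals("alice")
bob = expand_principals("bob")
```

Because each chunk stores only group IDs, sharing a document with the whole engineering organization means one ACL entry ("group:eng") instead of one entry per engineer. The cost is that the group data must stay in sync with the identity system.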
Practical Tips
- Always include source and a stable doc_id in metadata. You will need them for de-duplication and citations.
- For dates, store epoch seconds or ISO strings consistently. Mixed formats break range filters silently.
- Test what happens when filters return zero hits. The pipeline should fall back gracefully rather than handing the LLM nothing.
- Combine filters with hybrid search (BM25 plus vectors) for queries that contain exact identifiers like ticket numbers.
- Log filter usage. You will discover which fields users actually rely on, and which ones can be dropped.
- Push permissions filters as close to the database as possible. Never filter ACLs in application code only.
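The zero-hit fallback mentioned in the tips can be sketched as a loop that drops optional filters one at a time. This is an illustration under stated assumptions, not a prescribed design: search_with_fallback, relax_order, and the toy search function are hypothetical. The permission filter is deliberately left out of relax_order so it is never relaxed.

```python
def search_with_fallback(search, filters, relax_order, k=5):
    """Drop optional filters one at a time until the search returns hits.

    search(filters, k) -> list of hits (hypothetical callable).
    relax_order lists the keys that may be dropped, least important first.
    Keys not listed (for example the ACL filter) are never relaxed.
    """
    active = dict(filters)
    dropped = []
    hits = search(active, k)
    for key in relax_order:
        if hits:
            break
        if key in active:
            del active[key]
            dropped.append(key)
            hits = search(active, k)
    return hits, dropped

# Toy data and an exact-match search standing in for a real filtered query.
DOCS = [
    {"id": 1, "product": "auth", "year": 2023, "acl": "eng"},
    {"id": 2, "product": "billing", "year": 2025, "acl": "eng"},
    {"id": 3, "product": "auth", "year": 2025, "acl": "finance"},
]

def toy_search(filters, k):
    matches = [d for d in DOCS if all(d[key] == v for key, v in filters.items())]
    return matches[:k]

hits, dropped = search_with_fallback(
    toy_search,
    {"product": "auth", "year": 2025, "acl": "eng"},
    relax_order=["year", "product"],
)
```

Here the strict query finds nothing, so the year filter is dropped and the 2023 auth document is returned. Returning the list of dropped filters lets the caller log the relaxation or tell the user that the results are broader than requested.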
Wrap-up
Metadata filters turn a generic semantic search into a focused, secure retriever. Decide your schema early, prefer pre-filtering when the filter is selective and post-filtering when it is not, and watch out for over-constrained queries that return nothing. Done well, metadata filtering can noticeably improve the perceived quality of your RAG system without changing the model or the embeddings.