FastAPI Rate Limiting: A Practical Tutorial

Intermediate 11 min read

What you'll learn

✓ Why rate limiting belongs in your API
✓ Fixed window vs sliding window vs token bucket
✓ Wiring slowapi into a FastAPI app
✓ Per-IP and per-user limits with Redis storage
✓ Returning Retry-After and standard headers
✓ Pitfalls behind proxies and load balancers

Prerequisites

• Comfort writing FastAPI routes

Rate limiting is one of those features that feels boring until your API gets hammered. Adding it early is much cheaper than retrofitting it after an incident.

What and Why

Rate limiting caps how many requests a single client can make in a window of time. It protects you from abusive clients, runaway loops in your own front-end, and surprise costs from downstream services. It also gives you a clear contract: “100 requests per minute per API key” is easier to support than “be reasonable.”

Mental Model

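Three algorithms cover almost every use case. A fixed window counts requests in discrete intervals, such as each calendar minute. It is the cheap default, but a client can spend a full quota at the end of one window and another full quota at the start of the next. A sliding window counts over the last N seconds ending at each request, which removes that double burst at the boundary. A token bucket refills at a steady rate up to a fixed capacity and lets clients burst up to that capacity. Most libraries pick one and expose it as a decorator or middleware.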
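To make the three algorithms concrete, here is a minimal in-memory sketch of each in plain Python. The class names and the `allow(now)` method are illustrative, not any library's API, and time is passed in explicitly so the behaviour is deterministic. The hypothetical `burst_at_boundary` helper fires ten requests just before and ten just after a minute boundary, which is exactly where a fixed window lets twice the limit through.

```python
from collections import deque


class FixedWindow:
    """Count requests per discrete window of `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        win = int(now // self.window)  # index of the window `now` falls into
        if win != self.current_window:
            # A new window started: the counter resets to zero.
            self.current_window = win
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLog:
    """Keep timestamps of accepted requests from the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the look-back window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = None

    def allow(self, now: float) -> bool:
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def burst_at_boundary(limiter) -> int:
    """Fire 10 requests just before t=60 and 10 just after; count how many pass."""
    times = [59.9] * 10 + [60.1] * 10
    return sum(limiter.allow(t) for t in times)


print(burst_at_boundary(FixedWindow(10, 60)))       # 20: two windows' worth in 0.2 s
print(burst_at_boundary(SlidingWindowLog(10, 60)))  # 10

bucket = TokenBucket(capacity=5, rate=1.0)
print(sum(bucket.allow(0.0) for _ in range(10)))  # 5: burst up to capacity
print(sum(bucket.allow(3.0) for _ in range(10)))  # 3: three tokens refilled in 3 s
```

A sliding log is exact but stores one timestamp per accepted request, which is why implementations often approximate it with a weighted pair of fixed-window counters instead.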

Storage matters too. In-memory counters work on a single process; Redis (or another shared store) is required as soon as you run more than one worker.
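The difference between per-process and shared storage comes down to where the counter lives. Below is a rough sketch of a fixed-window check against a store that exposes `incr(key)` and `expire(key, seconds)`. redis-py's client has methods with those names, so a `redis.Redis` instance could be passed in its place. `InMemoryStore` and `hit` are hypothetical names used only for illustration, and the in-memory stand-in is only shared within a single process.

```python
import time
from typing import Dict, Optional


class InMemoryStore:
    """Single-process stand-in for a shared store with redis-py-like incr/expire."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.expiry: Dict[str, float] = {}  # key -> monotonic deadline

    def incr(self, key: str) -> int:
        # Evict the key first if its TTL has passed, as Redis would.
        deadline = self.expiry.get(key)
        if deadline is not None and deadline <= time.monotonic():
            self.values.pop(key, None)
            self.expiry.pop(key, None)
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key: str, seconds: int) -> bool:
        self.expiry[key] = time.monotonic() + seconds
        return True


def hit(store, client_key: str, limit: int, window: int, now: Optional[float] = None) -> bool:
    """Fixed-window check: one counter per (client, window index)."""
    now = time.time() if now is None else now
    bucket = f"rl:{client_key}:{int(now // window)}"
    count = store.incr(bucket)
    if count == 1:
        # First hit in this window: let the counter disappear once the window is over.
        store.expire(bucket, window)
    return count <= limit


store = InMemoryStore()
results = [hit(store, "203.0.113.7", limit=10, window=60, now=30.0) for _ in range(15)]
print(results.count(True))  # 10
print(hit(store, "203.0.113.7", limit=10, window=60, now=61.0))  # True: new window
```

Against real Redis, `INCR` and `EXPIRE` are two separate commands; a production version would usually send them together (for example in a pipeline or a Lua script) so a crash between them cannot leave a counter without a TTL.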

Hands-on Example

slowapi, a rate-limiting library for Starlette and FastAPI adapted from flask-limiter, is a common choice. Here it is configured with Redis as the storage backend:

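from fastapi import FastAPI, Request, Response
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.middleware import SlowAPIMiddleware
from slowapi.util import get_remote_address

limiter = Limiter(
    key_func=get_remote_address,           # default key: the client IP
    storage_uri="redis://localhost:6379",  # counters shared by all workers
    default_limits=["100/minute"],         # applied to undecorated routes by the middleware
    headers_enabled=True,                  # send X-RateLimit-* and Retry-After headers
)

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
app.add_middleware(SlowAPIMiddleware)

# The route decorator must sit above @limiter.limit, and the endpoint needs a
# `request` argument. `response` lets slowapi attach headers to successful replies.
@app.get("/public")
@limiter.limit("10/minute")
async def public(request: Request, response: Response):
    return {"ok": True}

def user_key(request: Request) -> str:
    # Prefer the API key and fall back to the IP; prefixes keep the two apart.
    # Validate the key before trusting it, or clients can dodge the limit by
    # sending a fresh made-up key on every request.
    api_key = request.headers.get("X-API-Key")
    return f"key:{api_key}" if api_key else f"ip:{get_remote_address(request)}"

@app.post("/expensive")
@limiter.limit("5/minute", key_func=user_key)
async def expensive(request: Request, response: Response):
    return {"queued": True}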

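For each request, slowapi derives a key (the IP, or the API key on /expensive), increments that key's counter in Redis, and either lets the request through or raises RateLimitExceeded. The registered handler turns that into a 429 Too Many Requests response, which carries Retry-After and the X-RateLimit-* headers only when the Limiter is created with headers_enabled=True. Unless you choose another strategy, slowapi (through the underlying limits library) counts in fixed windows.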

Common Pitfalls

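The first pitfall is using get_remote_address behind a proxy. It reads request.client.host, the address that opened the TCP connection, and behind a load balancer that is the load balancer itself. Every client then shares one counter, so a single busy client can get everyone throttled. Have your reverse proxy set X-Forwarded-For, and start Uvicorn with --forwarded-allow-ips set to the proxy's address so its proxy-header handling (--proxy-headers) rewrites request.client.host to the real client. Never trust X-Forwarded-For from arbitrary sources: clients can put anything in it.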
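The safe way to read X-Forwarded-For is to believe it only when the connection came from one of your own proxies, and then to take the rightmost address that is not one of them. The helper below is a hypothetical sketch of that rule (the name `client_ip` and the example `10.0.0.0/8` proxy range are assumptions). In practice you would usually let Uvicorn's `--forwarded-allow-ips` handle this and keep using `get_remote_address`.

```python
import ipaddress
from typing import Optional, Sequence


def client_ip(peer: str, x_forwarded_for: Optional[str], trusted_proxies: Sequence[str]) -> str:
    """Resolve the client address, trusting X-Forwarded-For only via known proxies."""
    networks = [ipaddress.ip_network(p) for p in trusted_proxies]

    def is_trusted(addr: str) -> bool:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # garbage in the header is never a trusted proxy
        return any(ip in net for net in networks)

    # No header, or a connection that did not come from our proxy: use the TCP peer.
    if not x_forwarded_for or not is_trusted(peer):
        return peer
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Our proxies append on the right, so walk right to left and skip them.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop was one of ours: the leftmost is the original (internal) caller.
    return hops[0] if hops else peer


proxies = ["10.0.0.0/8"]
print(client_ip("10.0.0.5", "198.51.100.23, 10.0.0.9", proxies))  # 198.51.100.23
print(client_ip("10.0.0.5", "6.6.6.6, 198.51.100.23", proxies))   # 198.51.100.23
print(client_ip("203.0.113.9", "6.6.6.6", proxies))               # 203.0.113.9
```

The second call shows why the scan runs right to left: a client can prepend any address it likes, but it cannot control the entry your own proxy appends.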

The second is in-memory storage with multiple workers. Each worker maintains its own counter, so a “10/minute” limit becomes “10 per worker per minute.” Use Redis.
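A small simulation shows how the limit multiplies. It assumes four workers behind a round-robin load balancer and 100 requests from one client, all inside the same window; `LocalCounter` and `simulate` are made-up names for this sketch.

```python
from itertools import cycle


class LocalCounter:
    """A per-process counter for one window, like in-memory storage in one worker."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0

    def allow(self) -> bool:
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def simulate(workers: list, requests: int) -> int:
    """Round-robin requests across workers and count how many are allowed."""
    allowed = 0
    for _, worker in zip(range(requests), cycle(workers)):
        allowed += worker.allow()
    return allowed


# Four workers, each with its own in-memory counter for "10/minute".
separate = [LocalCounter(10) for _ in range(4)]
print(simulate(separate, 100))  # 40

# Four workers sharing one counter, which is what a shared store gives you.
shared = LocalCounter(10)
print(simulate([shared] * 4, 100))  # 10
```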

The third is limiting the wrong key. Limiting by IP punishes shared networks (offices, mobile carriers). Prefer API keys or authenticated user IDs where you have them, with IP as a fallback.

The fourth is forgetting to expose limit info. Clients cannot back off intelligently without X-RateLimit-Limit, X-RateLimit-Remaining, and Retry-After headers. In slowapi these headers are off by default; pass headers_enabled=True when creating the Limiter.
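For illustration, this sketch computes those headers for a fixed window. The function name is hypothetical, and header conventions vary between APIs: here `X-RateLimit-Reset` is the Unix timestamp when the window ends, and `Retry-After` is sent in delay-seconds form only on a rejected request (`used` counts the current request, so `used > limit` means it was refused).

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, used: int, window: int, now: float) -> Dict[str, str]:
    """Rate-limit headers for a fixed window; `used` includes the current request."""
    window_end = (math.floor(now / window) + 1) * window  # Unix time the window resets
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(window_end),
    }
    if used > limit:
        # Rejected: tell the client how many whole seconds to wait.
        headers["Retry-After"] = str(math.ceil(window_end - now))
    return headers


rejected = rate_limit_headers(limit=100, used=101, window=60, now=1_700_000_030.5)
print(rejected["Retry-After"], rejected["X-RateLimit-Reset"])  # 10 1700000040
ok = rate_limit_headers(limit=100, used=5, window=60, now=1_700_000_030.5)
print(ok["X-RateLimit-Remaining"], "Retry-After" in ok)  # 95 False
```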

Practical Tips

Set generous defaults globally and tighter limits on expensive routes individually. Whitelist internal services by checking a header or source IP before the limiter. Log 429s so you can spot abusive patterns and legitimate clients who need a higher tier.
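The allowlist check and 429 logging can be plain helpers. This sketch assumes your internal services live in the hypothetical ranges `10.0.0.0/8` and `192.168.0.0/16` and that the IP passed in has already been resolved through your trusted proxies. `is_internal` and `log_rejection` are illustrative names, and where you call them (a custom middleware, or your 429 handler) depends on your setup.

```python
import ipaddress
import logging

logger = logging.getLogger("ratelimit")

# Hypothetical internal ranges: replace with your own network layout.
INTERNAL_NETWORKS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.0.0/16")]


def is_internal(client_ip: str) -> bool:
    """True if the (already proxy-resolved) client IP is in an internal range."""
    try:
        ip = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable addresses never get the bypass
    return any(ip in net for net in INTERNAL_NETWORKS)


def log_rejection(key: str, path: str, limit: str) -> None:
    """Record each 429 so abusive patterns and under-provisioned clients stand out."""
    logger.warning("rate limit exceeded key=%s path=%s limit=%s", key, path, limit)


print(is_internal("10.1.2.3"))  # True
print(is_internal("8.8.8.8"))   # False
log_rejection("ip:8.8.8.8", "/expensive", "5/minute")
```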

For very high traffic, consider doing rate limiting at the edge (NGINX, Cloudflare, or your API gateway) and using FastAPI’s limiter only as a safety net.

Wrap-up

Rate limiting in FastAPI is mostly about choosing a sensible algorithm, picking a stable key per client, and using a shared store from day one. Get the basics right and you have a calmer API, happier users, and a much quieter on-call rotation.