FastAPI Deployment with Uvicorn and Gunicorn
Deploy FastAPI to production with Gunicorn managing Uvicorn workers. Cover process counts, timeouts, and health checks.
What you'll learn
- ASGI vs WSGI
- Why combine Gunicorn and Uvicorn
- Choosing worker counts
- Graceful shutdown
- Health checks and probes
Prerequisites
- Comfortable running Python apps on Linux
What and Why
In development you run uvicorn app.main:app --reload and life is good. In production you need something that supervises workers, restarts crashed ones, handles signals correctly, and rolls out new code without dropping requests. The most common production setup for FastAPI is Gunicorn as a process manager with Uvicorn workers handling the ASGI traffic.
Uvicorn is the ASGI server that actually speaks HTTP. Gunicorn is a battle-tested process manager. Putting them together gives you Gunicorn’s operational maturity with Uvicorn’s async performance.
Mental Model
Picture three layers. The load balancer terminates TLS and routes traffic to your app servers. Gunicorn runs as the master process on each app server, supervising N Uvicorn worker processes. Each worker is a Python process running your FastAPI app inside an event loop, handling many concurrent requests on a single thread.
The master process does no request work. Its job is to fork workers, restart them if they die, and forward signals during deploys. Workers do not share memory after fork, so any in-process cache only exists per worker.
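The per-worker cache behaviour can be reproduced with a small standard-library sketch. It does not involve Gunicorn at all; it assumes a Unix host where the fork start method is available, and the names CACHE, worker, and run_workers are illustrative only.

```python
import multiprocessing as mp

# Module-level dict standing in for an in-process cache (hypothetical name).
CACHE = {}


def worker(worker_id, results):
    """Runs in a child process: write to the cache and report what this process sees."""
    CACHE["owner"] = worker_id
    results.put((worker_id, dict(CACHE)))


def run_workers(count=3):
    """Fork `count` children, then return the parent's cache and each child's view."""
    ctx = mp.get_context("fork")  # assumption: Unix host; fork is unavailable on Windows
    results = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(i, results)) for i in range(1, count + 1)]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    views = sorted(results.get() for _ in procs)
    for proc in procs:
        proc.join()
    return dict(CACHE), views
```

Each child sees only its own write and the parent's dict stays empty, which is the same isolation that makes a per-worker cache diverge across Gunicorn workers.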
Hands-on Example
A typical production command.
gunicorn app.main:app \
    --worker-class uvicorn.workers.UvicornWorker \
    --workers 4 \
    --bind 0.0.0.0:8000 \
    --timeout 30 \
    --graceful-timeout 20 \
    --keep-alive 5 \
    --access-logfile - \
    --error-logfile -
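The same flags can also live in a Gunicorn configuration file, which is plain Python. The sketch below mirrors the command above one setting at a time; the file name gunicorn.conf.py is an assumption, and the wsgi_app setting assumes Gunicorn 20.1 or newer.

```python
# gunicorn.conf.py: a sketch mirroring the command-line flags above.
wsgi_app = "app.main:app"  # application path; assumes Gunicorn 20.1+
worker_class = "uvicorn.workers.UvicornWorker"
workers = 4
bind = "0.0.0.0:8000"
timeout = 30           # seconds before a silent worker is killed and restarted
graceful_timeout = 20  # seconds workers get to finish after a restart signal
keepalive = 5          # seconds to hold idle keep-alive connections open
accesslog = "-"        # "-" sends the access log to stdout
errorlog = "-"         # "-" sends the error log to stderr
```

With a file like this in place, the start command shrinks to something like gunicorn -c gunicorn.conf.py, and the settings can be reviewed in version control.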
Drop a Dockerfile in place.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["gunicorn", "app.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000", \
     "--timeout", "30"]
Add a health endpoint that proves the worker can serve traffic.
from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
def healthz():
    # Liveness: the process is up and can answer HTTP
    return {"status": "ok"}

@app.get("/readyz")
async def readyz():
    # Confirm critical dependencies before reporting ready.
    # `db` stands in for your async database client; it is not defined here.
    await db.execute("SELECT 1")
    return {"status": "ready"}
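A readiness check that hangs is as unhelpful as one that lies, so it may be worth bounding the dependency probe with a timeout. The following is a minimal sketch using only asyncio; dependency_ready, fast_probe, and slow_probe are hypothetical names, and the probes stand in for a real call such as the SELECT 1 above.

```python
import asyncio


async def dependency_ready(probe, timeout=1.0):
    """Return True if the coroutine function `probe` completes within `timeout` seconds."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
    except Exception:
        # Covers both the timeout and any error raised by the probe itself.
        return False
    return True


async def fast_probe():
    """Stand-in for a dependency that answers immediately."""
    await asyncio.sleep(0)


async def slow_probe():
    """Stand-in for a dependency that is too slow to count as ready."""
    await asyncio.sleep(5)
```

A /readyz handler could await dependency_ready(...) and respond with a 503 status when it returns False, so the platform stops routing traffic to that instance.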
Load balancer -> Host
                  |
                  v
           Gunicorn master
            |   |   |   |
            v   v   v   v
            W1  W2  W3  W4   (Uvicorn workers, async event loops)
                  |
                  v
   Postgres / Redis / external APIs
A reasonable starting point for the worker count is 2 * cores + 1 for typical IO-bound APIs; measure under load and adjust from there. CPU-bound workloads benefit from fewer workers; long-running tasks belong in a task queue such as Celery, not the request path.
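The worker-count rule of thumb is simple enough to write down as a helper. This is a sketch, not a tuning recommendation: suggested_workers is a hypothetical name, and treating "fewer workers" for CPU-bound apps as one per core is an assumption made for illustration.

```python
import os


def suggested_workers(cores=None, io_bound=True):
    """Starting-point worker count: 2 * cores + 1 if IO-bound, else one per core (assumed)."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 2 * cores + 1 if io_bound else cores
```

For example, suggested_workers(4) gives 9 and suggested_workers(4, io_bound=False) gives 4. Inside a container, os.cpu_count() may report the host's cores rather than the container's CPU limit, so passing cores explicitly can be safer.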
Common Pitfalls
- Setting --reload in production. It is a development feature: it watches the filesystem and restarts the server on every change, and Uvicorn does not allow it to be combined with multiple workers.
- Misreading timeouts. The Gunicorn default --timeout is 30s, but with async workers it is a heartbeat check on the worker, not a per-request limit. A slow awaited call is not cut off by it, and a blocking call stalls every request on that worker. Pair it with explicit client timeouts, proper async clients, and circuit breakers.
- In-process caches without invalidation. With four workers, each gets its own cache. Use Redis for shared state.
- Ignoring graceful shutdown. SIGTERM should drain in-flight requests. --graceful-timeout controls how long Gunicorn waits before killing workers that are still busy.
- Forgetting to set --forwarded-allow-ips. Behind a proxy, the real client IP comes through X-Forwarded-For. By default only a proxy on localhost is trusted, so headers from a load balancer on another host are ignored.
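To make the trusted-proxy pitfall concrete, here is a simplified sketch of the decision a server makes when it sees X-Forwarded-For. It is not Uvicorn's implementation; resolve_client_ip and its parameters are hypothetical, and addresses are compared as plain strings rather than parsed networks.

```python
def resolve_client_ip(peer_ip, x_forwarded_for, trusted_proxies):
    """Pick the client IP, honouring X-Forwarded-For only when the direct peer is trusted."""
    if peer_ip not in trusted_proxies or not x_forwarded_for:
        # Untrusted peer (or no header): anyone could have forged the header, so ignore it.
        return peer_ip
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Walk from the hop nearest to us backwards, skipping proxies we trust.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    # Every hop was a trusted proxy: fall back to the left-most entry.
    return hops[0] if hops else peer_ip
```

With a load balancer at 10.0.0.5 in the trusted set, a header of "203.0.113.7" resolves to that client address; with an empty trusted set, the same request is attributed to 10.0.0.5, which is what shows up in logs and rate limiters when the setting is forgotten.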
Practical Tips
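- Liveness vs readiness. /healthz is liveness (process is alive). /readyz is readiness (dependencies are reachable). Kubernetes uses both differently: a failing liveness probe restarts the container, while a failing readiness probe only takes the pod out of rotation.
- Log to stdout and let the platform aggregate. Files on container disks vanish when pods die.
- Use a WORKERS=$(nproc) style env var to keep your image portable across instance sizes.
- For long polls or WebSockets, raise --keep-alive and remember to use sticky sessions or a backplane.
- Enable structured logging and emit JSON for easier aggregation. Uvicorn's --log-config applies when you run uvicorn directly; under Gunicorn, configure logging through Gunicorn's own --log-config option or its logconfig_dict setting.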
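For the JSON logging tip, a formatter built on the standard logging module is often enough to get started. The sketch below is one possible shape, assuming you attach it to the root logger yourself; JsonFormatter and configure_json_logging are hypothetical names, and the field names are a matter of taste.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Include the traceback as a string when an exception is attached.
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_json_logging(level=logging.INFO):
    """Replace the root logger's handlers with one JSON handler writing to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Calling configure_json_logging() once at startup makes logging.getLogger("app").info("ready") print one JSON line, which log aggregators can parse without custom patterns.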
Wrap-up
Production-grade FastAPI is mostly about choosing the right wrapper. Gunicorn supervises, Uvicorn handles ASGI, and your app focuses on business logic. Tune workers to your CPU count, set conservative timeouts, expose proper health endpoints, and lean on the platform for restarts and rollouts. With those pieces in place, you have a deployment story that scales horizontally and recovers automatically when individual workers misbehave.