FastAPI Deployment with Uvicorn and Gunicorn

Intermediate 10 min read

What you'll learn

✓ ASGI vs WSGI
✓ Why combine Gunicorn and Uvicorn
✓ Choosing worker counts
✓ Graceful shutdown
✓ Health checks and probes

Prerequisites

• Comfortable running Python apps on Linux

What and Why

In development you run uvicorn app.main:app --reload and life is good. In production you need something that supervises workers, restarts crashed ones, handles signals correctly, and rolls out new code without dropping requests. The most common production setup for FastAPI is Gunicorn as a process manager with Uvicorn workers handling the ASGI traffic.

Uvicorn is the ASGI server that actually speaks HTTP. Gunicorn is a battle-tested process manager. Putting them together gives you Gunicorn’s operational maturity with Uvicorn’s async performance.

Mental Model

Picture three layers. The load balancer terminates TLS and routes traffic to your app servers. Gunicorn runs as the master process on each app server, supervising N Uvicorn worker processes. Each worker is a Python process running your FastAPI app inside an event loop, handling many concurrent requests on a single thread. Plain def endpoints are the exception: FastAPI runs them in a thread pool so they do not block the loop.

The master process does no request work. Its job is to fork workers, restart them if they die, and forward signals during deploys. Workers do not share memory after fork, so any in-process cache only exists per worker.
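To make the "no shared memory after fork" point concrete, here is a small standard-library sketch. It is not Gunicorn itself: it forks a few child processes the way a pre-fork server does and lets each one write to a module-level dictionary. The names CACHE, worker, and run_demo are made up for illustration, and the fork start method assumes a POSIX system.

```python
import multiprocessing as mp

# Module-level "in-process cache", as an app might keep one.
CACHE: dict = {}


def worker(name: str, results) -> None:
    # Each forked child mutates only its own copy of CACHE.
    CACHE[name] = f"value cached by {name}"
    results.put((name, sorted(CACHE)))


def run_demo(n_workers: int = 3) -> dict:
    # "fork" mirrors how a pre-fork server creates workers (POSIX only).
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(f"w{i}", results))
        for i in range(n_workers)
    ]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    seen = dict(results.get() for _ in procs)
    for proc in procs:
        proc.join()
    return seen
```

Calling run_demo() returns a mapping in which every worker sees only its own key, and CACHE in the parent stays empty. The same thing happens to any per-worker cache in the app.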

Hands-on Example

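A typical production command. Recent Uvicorn releases deprecate the uvicorn.workers module in favor of the separate uvicorn-worker package; if you install that package, the worker class becomes uvicorn_worker.UvicornWorker.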

gunicorn app.main:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000 \
  --timeout 30 \
  --graceful-timeout 20 \
  --keep-alive 5 \
  --access-logfile - \
  --error-logfile -
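The same settings can live in a Gunicorn config file, which is plain Python. The sketch below mirrors the flags above one for one. Reading the worker count from WEB_CONCURRENCY with a fallback of 4 is an assumption added here for illustration, not part of the original command.

```python
# gunicorn.conf.py
import os

bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"
# Assumption: let the environment override the worker count, default 4.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))
timeout = 30            # seconds of worker silence before the master kills it
graceful_timeout = 20   # seconds a worker gets to finish after a shutdown signal
keepalive = 5           # seconds to hold an idle keep-alive connection
accesslog = "-"         # "-" sends access logs to stdout
errorlog = "-"          # "-" sends error logs to stderr
```

Start the server with gunicorn -c gunicorn.conf.py app.main:app. Keeping the settings in a file makes them easier to review and to share between the Dockerfile and local runs.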

Drop a Dockerfile in place.

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["gunicorn", "app.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000", \
     "--timeout", "30"]

Add a health endpoint that proves the worker can serve traffic.

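from fastapi import FastAPI, Response, status

app = FastAPI()

async def check_database() -> None:
    # Hypothetical placeholder: replace the body with a real query through
    # your async database client, e.g. await db.execute("SELECT 1").
    ...

@app.get("/healthz")
def healthz():
    # Liveness: the process is up and can answer HTTP.
    return {"status": "ok"}

@app.get("/readyz")
async def readyz(response: Response):
    # Readiness: confirm critical dependencies before reporting ready.
    try:
        await check_database()
    except Exception:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return {"status": "unavailable"}
    return {"status": "ready"}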
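Something has to call these endpoints. On platforms without built-in HTTP probes, a tiny script can do it. The following is a minimal sketch using only the standard library; the function name probe and the two-second default timeout are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False
```

A container health check could call probe("http://127.0.0.1:8000/healthz") and exit non-zero when it returns False. The short timeout matters: a probe that hangs is as unhelpful as one that never runs.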

Load balancer -> Host
                |
                v
          Gunicorn master
            |    |    |    |
            v    v    v    v
         W1   W2   W3   W4  (Uvicorn workers, async event loops)
            |
            v
      Postgres / Redis / external APIs

Production process layout

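A common starting point is 2 * cores + 1 workers, the rule of thumb from the Gunicorn documentation. Treat it as an upper bound for IO-bound APIs: each Uvicorn worker already multiplexes many requests on one event loop, so roughly one worker per core is often enough, and load testing should settle the final number. CPU-bound workloads benefit from fewer workers; long-running tasks belong in a task queue such as Celery, not the request path.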
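The 2 * cores + 1 rule of thumb is simple arithmetic, sketched below as a helper. The function name and the optional max_workers cap are illustrative assumptions. Note that os.cpu_count() may report the host's cores rather than a container's CPU limit, so passing cores explicitly can be safer in containers.

```python
import os
from typing import Optional


def suggested_workers(
    cores: Optional[int] = None, max_workers: Optional[int] = None
) -> int:
    """Apply the 2 * cores + 1 rule of thumb, optionally capped."""
    if cores is None:
        # May reflect host cores, not the container's CPU quota.
        cores = os.cpu_count() or 1
    count = 2 * cores + 1
    if max_workers is not None:
        count = min(count, max_workers)
    return max(1, count)
```

For a 4-core machine this gives 9 workers, or 6 with max_workers=6. A cap is useful when memory, not CPU, is the limiting resource, since every worker loads its own copy of the app.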

Common Pitfalls

Setting --reload in production. It is a development feature: it watches the filesystem and restarts the server whenever a file changes, which wastes CPU and can restart workers in the middle of a deploy.
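Misreading --timeout. The Gunicorn default is 30s, but with async workers such as UvicornWorker it is a heartbeat limit on the worker process, not a per-request deadline. A request awaiting a slow third-party call can outlive it and keep holding connections, while a blocking call that stalls the event loop can get the whole worker killed along with every request it was serving. Set explicit timeouts on outbound calls, use proper async clients, and add circuit breakers.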
In-process caches without invalidation. With four workers, each gets its own cache. Use Redis for shared state.
Ignoring graceful shutdown. SIGTERM should drain in-flight requests. --graceful-timeout controls how long Gunicorn waits before killing stuck workers.
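Forgetting to set --forwarded-allow-ips. Behind a proxy, the real client IP arrives in X-Forwarded-For. By default only the loopback address is trusted, so forwarded headers from a proxy on another host are ignored. Set the flag (or the FORWARDED_ALLOW_IPS environment variable) to your proxy's address.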
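For the timeout pitfall, the usual remedy is a deadline on every outbound call, so a slow dependency fails fast instead of holding a request open. Here is a minimal sketch with asyncio.wait_for. slow_third_party_call is a hypothetical stand-in for a real HTTP request, and returning None on timeout is just one possible fallback.

```python
import asyncio
from typing import Optional


async def slow_third_party_call(delay: float) -> str:
    # Hypothetical stand-in for an outbound request to a slow service.
    await asyncio.sleep(delay)
    return "payload"


async def fetch_with_deadline(delay: float, deadline: float) -> Optional[str]:
    """Return the payload, or None if the call exceeds the deadline."""
    try:
        return await asyncio.wait_for(
            slow_third_party_call(delay), timeout=deadline
        )
    except asyncio.TimeoutError:
        # The awaited call is cancelled; the event loop stays free.
        return None
```

In a real handler you would more likely set the timeout on the async HTTP client itself and map the failure to a 502 or 504 response. The point is the same: the deadline lives in your code, not in Gunicorn's --timeout.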

Practical Tips

Liveness vs readiness. /healthz is liveness (process is alive). /readyz is readiness (dependencies are reachable). Kubernetes uses both differently.
Log to stdout and let the platform aggregate. Files on container disks vanish when pods die.
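Drive the worker count from the environment, for example a WORKERS=$(nproc) style variable or WEB_CONCURRENCY, which Gunicorn reads as its default for --workers, to keep your image portable across instance sizes.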
For long polls or WebSockets, raise --keep-alive and remember to use sticky sessions or a backplane.
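Emit structured JSON logs for easier aggregation. Uvicorn's --log-config flag applies only when you launch the uvicorn CLI directly; under Gunicorn, use Gunicorn's own --log-config option or configure logging inside the app.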
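One way to get JSON logs on stdout without extra dependencies is a custom formatter from the standard logging module. The sketch below is an assumption about how you might wire it, not a Uvicorn or Gunicorn feature. JsonFormatter, configure_json_logging, and the field names are all illustrative.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_json_logging(stream=None, level: int = logging.INFO) -> logging.Handler:
    # Default to stdout so the platform's log collector picks it up.
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(level)
    return handler
```

Call configure_json_logging() once at app startup. After that, logging.getLogger("app").info("started") emits a single JSON line with time, level, logger, and message fields.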

Wrap-up

Production-grade FastAPI is mostly about choosing the right wrapper. Gunicorn supervises, Uvicorn handles ASGI, and your app focuses on business logic. Tune workers to your CPU count, set conservative timeouts, expose proper health endpoints, and lean on the platform for restarts and rollouts. With those pieces in place, you have a deployment story that scales horizontally and recovers automatically when individual workers misbehave.