FastAPI Streaming Responses Tutorial

Intermediate 9 min read

What you'll learn

✓ When to stream instead of buffer
✓ StreamingResponse vs FileResponse
✓ Implementing Server-Sent Events
✓ Streaming LLM-like token output
✓ Backpressure and timeouts

Prerequisites

• Comfortable with FastAPI routes and Python generators

What and Why

Some responses are too big to keep in memory or take too long to compute up front. Large CSV exports, video transcoding, AI token streams, and live event feeds all benefit from streaming. The client gets bytes as soon as they are available, and your server avoids ballooning memory usage.

Mental Model

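A streaming response wraps a generator (sync or async) that yields chunks. FastAPI keeps the connection open and writes each chunk to the wire as it is produced. Because the total length is not known up front, no Content-Length header is sent; over HTTP/1.1 the server uses chunked transfer encoding instead.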

Buffered:
[ Build full body ] -> [ Send all at once ] -> Client

Streaming:
[ chunk1 ] -> Client
[ chunk2 ] -> Client
[ chunk3 ] -> Client
[ done ]   -> Client closes

Streaming vs buffered responses
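
To make the chunked transfer encoding step concrete, here is a minimal sketch of how an HTTP/1.1 server frames each chunk on the wire. The helper name frame_chunked is hypothetical and exists only for illustration; in a real deployment the ASGI server does this for you, and HTTP/2 uses its own framing instead.

```python
from typing import Iterable, Iterator


def frame_chunked(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Frame chunks the way HTTP/1.1 chunked transfer encoding does.

    Illustration only: you never call this yourself in a FastAPI app.
    """
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would be read as the end of the body.
            continue
        # Each chunk is: size in hex, CRLF, the payload, CRLF.
        yield f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    # The terminating chunk tells the client that the body is complete.
    yield b"0\r\n\r\n"
```

Each yielded value corresponds to one write to the socket, which is why the client can start processing before the server knows the total size.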

Hands-on Example

Stream a generated CSV.

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import csv, io

app = FastAPI()

def csv_rows(rows):
    # Reuse one in-memory buffer: write a row, yield its text, then reset.
    buf = io.StringIO()
    writer = csv.writer(buf)

    def drain():
        chunk = buf.getvalue()
        buf.seek(0)
        buf.truncate(0)
        return chunk

    writer.writerow(["id", "name", "score"])  # header row
    yield drain()

    for r in rows:
        writer.writerow(r)
        yield drain()

@app.get("/export.csv")
def export():
    rows = ((i, f"user{i}", i * 10) for i in range(1, 100_000))
    return StreamingResponse(
        csv_rows(rows),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=export.csv"},
    )
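
The export above yields one chunk per row, which means many tiny writes. One possible refinement, sketched here under assumptions, is to coalesce rows into larger chunks before they reach the response. The helper name batch_chunks and the 64 KiB default are illustrative choices, not FastAPI settings.

```python
from typing import Iterable, Iterator


def batch_chunks(chunks: Iterable[str], min_size: int = 64 * 1024) -> Iterator[str]:
    """Coalesce small text chunks until at least min_size characters are buffered."""
    parts: list[str] = []
    size = 0
    for chunk in chunks:
        parts.append(chunk)
        size += len(chunk)
        if size >= min_size:
            yield "".join(parts)
            parts, size = [], 0
    if parts:
        # Emit whatever is left once the source is exhausted.
        yield "".join(parts)


# Hypothetical wiring with the export route above:
#   StreamingResponse(batch_chunks(csv_rows(rows)), media_type="text/csv")
```

A larger min_size means fewer writes but a longer wait before the first byte, so tune it to how quickly rows are produced.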

Stream a static file efficiently.

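from pathlib import Path
from fastapi import HTTPException
from fastapi.responses import FileResponse

@app.get("/download/{name}")
def download(name: str):
    path = Path("/data") / name
    if not path.is_file():  # missing files (and names like "..") get a clean 404
        raise HTTPException(status_code=404, detail="File not found")
    return FileResponse(path=path, filename=name)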

Server-Sent Events for live updates.

import asyncio
from fastapi.responses import StreamingResponse

async def event_stream():
    for i in range(10):
        yield f"event: tick\ndata: {i}\n\n"
        await asyncio.sleep(1)

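@app.get("/sse")
async def sse():
    # no-cache keeps intermediaries from caching the stream; X-Accel-Buffering asks Nginx not to buffer it
    headers = {"Cache-Control": "no-cache", "X-Accel-Buffering": "no"}
    return StreamingResponse(event_stream(), media_type="text/event-stream", headers=headers)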
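
Hand-writing the event string works for one-line payloads, but a payload that contains newlines needs one data: field per line. A small formatter can take care of that. The sketch below is one way to do it; format_sse is a hypothetical helper name, not part of FastAPI.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Build one Server-Sent Event as text, ending with the required blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Every line of a multi-line payload gets its own "data:" field.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    # The trailing blank line is what tells the browser to dispatch the event.
    return "\n".join(lines) + "\n\n"
```

Inside event_stream you could then write yield format_sse(str(i), event="tick") instead of building the string by hand.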

Token streaming, like an LLM response.

async def token_stream(prompt: str):
    for word in f"reply for: {prompt}".split():
        yield word + " "
        await asyncio.sleep(0.05)

@app.get("/chat")
async def chat(q: str):
    return StreamingResponse(token_stream(q), media_type="text/plain")
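
A real token source, such as an upstream model API, can stall mid-response. One way to add a per-chunk timeout is to wrap the async generator, as sketched below. The name with_timeout and the notice text are assumptions made for illustration; you may prefer to end the stream silently or emit an error event.

```python
import asyncio
from typing import AsyncIterator


async def with_timeout(
    source: AsyncIterator[str],
    seconds: float,
    notice: str = "\n[stream timed out]\n",
) -> AsyncIterator[str]:
    """Yield chunks from source, giving up if the next chunk takes too long."""
    iterator = source.__aiter__()
    while True:
        try:
            # Wait for the next chunk, but no longer than `seconds`.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=seconds)
        except StopAsyncIteration:
            return  # the source finished normally
        except asyncio.TimeoutError:
            yield notice  # arbitrary choice: tell the client why the stream ended
            return
        yield chunk


# Hypothetical wiring with the chat route above:
#   StreamingResponse(with_timeout(token_stream(q), 10.0), media_type="text/plain")
```

The timeout applies to each chunk rather than to the whole response, so a long but steadily progressing stream is not cut off.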

On the client side, the browser can read SSE with EventSource.

const es = new EventSource("/sse");
es.addEventListener("tick", (e) => console.log("tick", e.data));

Common Pitfalls

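Building a full list or string and returning it instead of a generator. That buffers the whole body in memory first, which defeats the point of streaming: incremental output.
Holding a database connection open for the entire stream. Long streams can exhaust the pool, so fetch in batches and release the connection between them where you can.
Forgetting the SSE event terminator. Each event must end with a blank line (\n\n); without it the browser keeps buffering and never dispatches the event.
Proxies that buffer. Nginx buffers proxied responses by default. Set proxy_buffering off for streaming endpoints.
Not handling client disconnects. Check await request.is_disconnected() inside the generator to stop early.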
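
The disconnect check from the last pitfall can be sketched as a wrapper around any async generator. The name stop_on_disconnect is hypothetical; the wrapper takes the check as a callable so that it can be exercised without a running server.

```python
from typing import AsyncIterator, Awaitable, Callable


async def stop_on_disconnect(
    source: AsyncIterator[str],
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[str]:
    """Forward chunks from source until the client is reported as gone."""
    async for chunk in source:
        # Ask before sending each chunk so no further work is done for a dead connection.
        if await is_disconnected():
            break
        yield chunk


# Hypothetical wiring in a route that receives `request: Request`:
#   StreamingResponse(
#       stop_on_disconnect(event_stream(), request.is_disconnected),
#       media_type="text/event-stream",
#   )
```

Checking once per chunk keeps the overhead small. If chunks are far apart, the stream only notices the disconnect when the next chunk is ready.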

Practical Tips

For files that already exist on disk, prefer FileResponse: it sets Content-Length, Last-Modified, and ETag and reads the file in chunks for you.
Wrap blocking or CPU-heavy work in asyncio.to_thread (Python 3.9+) so it does not block the event loop while streaming.
Set a reasonable chunk size. Many small writes cost syscalls; very large writes increase latency to first byte.
For SSE, send Cache-Control: no-cache so intermediaries do not cache the stream, and X-Accel-Buffering: no so Nginx does not buffer it.
Test with curl --no-buffer so you can see chunks arrive in real time.
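
As an illustration of the asyncio.to_thread tip, the sketch below computes each chunk in a worker thread while the event loop stays free to serve other requests. expensive_digest is a stand-in for whatever heavy work you have, and both function names are assumptions made for this example.

```python
import asyncio
import hashlib
from typing import AsyncIterator, Iterable


def expensive_digest(payload: bytes) -> str:
    """Stand-in for CPU-heavy or blocking work done per chunk."""
    return hashlib.sha256(payload).hexdigest()


async def digest_stream(payloads: Iterable[bytes]) -> AsyncIterator[str]:
    """Yield one line per payload, computing each in a worker thread."""
    for payload in payloads:
        # to_thread (Python 3.9+) runs the function off the event loop.
        digest = await asyncio.to_thread(expensive_digest, payload)
        yield digest + "\n"


# Hypothetical wiring:
#   StreamingResponse(digest_stream(items), media_type="text/plain")
```

Threads help most when the work releases the GIL or waits on I/O; for long pure-Python computations a process pool may be a better fit.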

Wrap-up

Streaming lets you serve big or slow responses without melting your server. Reach for StreamingResponse with a generator for dynamic data, FileResponse for files, and the SSE format for push updates. Mind your database lifetime and proxy buffering, and the experience for users will feel snappy even when the response is huge.