FastAPI Streaming Responses Tutorial
Stream large files, generated text, and Server-Sent Events from FastAPI without loading everything into memory.
What you'll learn
- ✓ When to stream instead of buffer
- ✓ StreamingResponse vs FileResponse
- ✓ Implementing Server-Sent Events
- ✓ Streaming LLM-like token output
- ✓ Backpressure and timeouts
Prerequisites
- Comfortable with FastAPI routes and Python generators
What and Why
Some responses are too big to keep in memory or take too long to compute up front. Large CSV exports, video transcoding, AI token streams, and live event feeds all benefit from streaming. The client gets bytes as soon as they are available, and your server avoids ballooning memory usage.
Mental Model
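A streaming response wraps a generator (sync or async) that yields chunks. FastAPI keeps the connection open and writes each chunk to the wire as it is produced. Because the total length is not known up front, no Content-Length header is sent; over HTTP/1.1 the body goes out with chunked transfer encoding.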
Buffered:
[ Build full body ] -> [ Send all at once ] -> Client
Streaming:
[ chunk1 ] -> Client
[ chunk2 ] -> Client
[ chunk3 ] -> Client
[ done ] -> Client closes
Hands-on Example
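Before wiring anything into FastAPI, the difference between the two flows can be seen with a plain Python generator. The sketch below is only an illustration: build_rows, buffered_body, and streamed_body are hypothetical names, and the log list stands in for work the server would do.

```python
from typing import Iterator, List

log: List[str] = []  # records the order in which work happens


def build_rows(n: int) -> Iterator[str]:
    """Produce rows one at a time, recording when each one is built."""
    for i in range(n):
        log.append(f"built {i}")
        yield f"row {i}\n"


def buffered_body(n: int) -> str:
    """Buffered: every row is built before anything can be sent."""
    return "".join(build_rows(n))


def streamed_body(n: int) -> Iterator[str]:
    """Streaming: each row is handed over as soon as it exists."""
    for chunk in build_rows(n):
        log.append(f"sent {chunk.strip()}")
        yield chunk
```

Calling next() once on streamed_body(3) builds and hands over only the first row; the remaining rows are not computed until the consumer asks for them, which is the property StreamingResponse relies on.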
Stream a generated CSV.
import csv
import io

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def csv_rows(rows):
    # Reuse one in-memory buffer: write a row, yield it, then reset the buffer.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "score"])
    yield buf.getvalue()
    buf.seek(0)
    buf.truncate(0)
    for r in rows:
        writer.writerow(r)
        yield buf.getvalue()
        buf.seek(0)
        buf.truncate(0)


@app.get("/export.csv")
def export():
    # A generator expression, so rows are produced lazily instead of held in a list.
    rows = ((i, f"user{i}", i * 10) for i in range(1, 100_000))
    return StreamingResponse(
        csv_rows(rows),
        media_type="text/csv",
        headers={"Content-Disposition": "attachment; filename=export.csv"},
    )
Stream a static file efficiently.
from fastapi.responses import FileResponse


@app.get("/download/{name}")
def download(name: str):
    # {name} matches a single path segment; validate user-supplied names before serving them.
    return FileResponse(path=f"/data/{name}", filename=name)
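FileResponse needs a path on disk. When the bytes come from a file-like object instead (an in-memory buffer, for example), a generator that reads fixed-size chunks can be passed to StreamingResponse. A minimal sketch, where iter_chunks and the 64 KiB default are assumptions made for illustration:

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(fileobj: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary file-like object until EOF."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # read() returns b"" at end of file
            break
        yield chunk


# Hypothetical wiring inside a route, given some binary buffer:
#   return StreamingResponse(iter_chunks(buffer), media_type="application/octet-stream")
```

For example, list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3)) produces three chunks, the last one shorter than the rest.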
Server-Sent Events for live updates.
import asyncio

from fastapi.responses import StreamingResponse


async def event_stream():
    for i in range(10):
        # Each event ends with a blank line ("\n\n").
        yield f"event: tick\ndata: {i}\n\n"
        await asyncio.sleep(1)


@app.get("/sse")
def sse():
    return StreamingResponse(event_stream(), media_type="text/event-stream")
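Hand-writing the event: and data: lines gets error-prone once payloads contain newlines, because every line of the payload needs its own data: prefix. A small helper can take care of that. This is a sketch, and format_sse and its parameters are hypothetical names, not part of FastAPI:

```python
from typing import List, Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event, ending with the required blank line."""
    lines: List[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # A payload containing newlines is split into one "data:" line per line.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

With this helper, the generator above could yield format_sse(str(i), event="tick") and produce the same bytes as the hand-written f-string.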
Token streaming, like an LLM response.
async def token_stream(prompt: str):
    # Stand-in for a model: emit the reply one word at a time.
    for word in f"reply for: {prompt}".split():
        yield word + " "
        await asyncio.sleep(0.05)


@app.get("/chat")
async def chat(q: str):
    return StreamingResponse(token_stream(q), media_type="text/plain")
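A real token source can stall, which leaves the connection open indefinitely. One way to bound that is a per-chunk timeout around the generator. The sketch below uses only asyncio; with_chunk_timeout, the demo generator, and the trailing marker text are assumptions for illustration, and how a timeout is signalled to the client is application-specific.

```python
import asyncio
from typing import AsyncIterator, List


async def with_chunk_timeout(source: AsyncIterator[str], seconds: float) -> AsyncIterator[str]:
    """Relay chunks from source, giving up if the next chunk takes too long."""
    iterator = source.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=seconds)
        except StopAsyncIteration:
            return  # the source finished normally
        except asyncio.TimeoutError:
            # Assumption: end the stream with a visible marker.
            yield "\n[stream timed out]\n"
            return
        yield chunk


async def demo() -> List[str]:
    async def slow_tokens():
        yield "hello "
        await asyncio.sleep(1)  # simulates a stalled upstream
        yield "world"

    return [c async for c in with_chunk_timeout(slow_tokens(), seconds=0.05)]
```

In the /chat route this would wrap the existing generator, for example with_chunk_timeout(token_stream(q), seconds=5), before it is handed to StreamingResponse.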
On the client side, the browser can read SSE with EventSource.
const es = new EventSource("/sse");
es.addEventListener("tick", (e) => console.log("tick", e.data));
Common Pitfalls
- Returning a list or string instead of a generator. The point of streaming is incremental output.
- Holding a database connection open for the entire stream. Long streams can exhaust the pool.
- Forgetting to terminate SSE events. Each event must end with a blank line (\n\n); without it the browser buffers the event and never dispatches it.
- Proxies that buffer. Nginx buffers responses by default. Disable it with proxy_buffering off for streaming endpoints.
- Not handling client disconnects. Check await request.is_disconnected() to stop early.
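The disconnect check from the last pitfall can live in a small wrapper so each generator does not have to repeat it. This is a sketch under assumptions: stop_on_disconnect is a hypothetical helper, and is_disconnected is any zero-argument async callable (in a route, request.is_disconnected would be passed in).

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def stop_on_disconnect(
    source: AsyncIterator[str],
    is_disconnected: Callable[[], Awaitable[bool]],
) -> AsyncIterator[str]:
    """Relay chunks from source until the client has gone away."""
    try:
        async for chunk in source:
            if await is_disconnected():
                break
            yield chunk
    finally:
        # Close the upstream generator so its cleanup code runs promptly.
        aclose = getattr(source, "aclose", None)
        if aclose is not None:
            await aclose()


async def demo() -> List[str]:
    calls = 0

    async def fake_is_disconnected() -> bool:
        nonlocal calls
        calls += 1
        return calls > 2  # pretend the client leaves before the third chunk

    async def ticks():
        for i in range(10):
            yield f"data: {i}\n\n"

    return [c async for c in stop_on_disconnect(ticks(), fake_is_disconnected)]
```

In the SSE route this might look like StreamingResponse(stop_on_disconnect(event_stream(), request.is_disconnected), media_type="text/event-stream"), with request: Request added to the route signature.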
Practical Tips
- For binary downloads, prefer FileResponse, which can use sendfile when the server supports it.
- Wrap heavy CPU work in asyncio.to_thread so it does not block the event loop while streaming.
- Set a reasonable chunk size. Many small writes cost syscalls; very large writes increase latency to first byte.
- Add Cache-Control: no-cache and X-Accel-Buffering: no to SSE responses so that caches and Nginx do not hold the stream back.
- Test with curl --no-buffer so you can see chunks arrive in real time.
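The chunk-size tip can be applied without touching the producing generator by coalescing small chunks before they reach the response. A minimal sketch; batch_chunks and the 8192-character default are assumptions, and the right threshold depends on the workload.

```python
from typing import Iterable, Iterator, List


def batch_chunks(chunks: Iterable[str], min_chars: int = 8192) -> Iterator[str]:
    """Coalesce small chunks until at least min_chars characters are buffered."""
    pending: List[str] = []
    size = 0
    for chunk in chunks:
        pending.append(chunk)
        size += len(chunk)
        if size >= min_chars:
            yield "".join(pending)
            pending, size = [], 0
    if pending:  # flush whatever is left at the end
        yield "".join(pending)
```

For the CSV export, passing batch_chunks(csv_rows(rows)) to StreamingResponse would send a few kilobytes per write instead of one row per write, at the cost of a slightly later first byte.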
Wrap-up
Streaming lets you serve big or slow responses without melting your server. Reach for StreamingResponse with a generator for dynamic data, FileResponse for files, and the SSE format for push updates. Mind your database lifetime and proxy buffering, and the experience for users will feel snappy even when the response is huge.