System Design: Design a Video Streaming Service (YouTube/Netflix)
Design a video streaming platform like YouTube or Netflix. Covers upload pipelines, transcoding, adaptive bitrate, CDN delivery, metadata storage, and, briefly, recommendations.
What you'll learn
- Build an upload and transcoding pipeline
- Serve adaptive bitrate streams via CDN
- Model video metadata and view events
- Scale playback to hundreds of millions of users
- Frame VOD vs live as different problems
Prerequisites
- Familiarity with HTTP, object storage, and queues.
- Comfort with CDN basics. See [What is AWS](/blog/what-is-aws).
YouTube and Netflix move petabytes of data per day. The system is not really about video. It is about object storage, transcoding pipelines, CDN egress, and metadata at scale. This design focuses on video-on-demand. Live streaming gets a paragraph at the end.
Functional Requirements
- Upload a video.
- Transcode to multiple resolutions and codecs.
- Stream playback with adaptive bitrate.
- Browse, search, and watch.
- Track views, likes, watch time.
Non-Functional Requirements
- 2B daily active users, 500M hours watched per day.
- Upload QPS: low relative to reads (about 5k per second), but each upload payload is large.
- Read QPS for metadata: hundreds of thousands per second.
- Playback start time: under 1 second.
- Storage: tens of exabytes lifetime.
- Availability: 99.99 percent for playback.
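A quick back-of-envelope calculation makes these numbers concrete. The sketch below takes the 500M hours per day and the 99.99 percent target from the list above; the 3 Mbps blended average bitrate is an assumption chosen for illustration, not a figure from this design. With that assumption, delivery works out to about 675 PB per day, an average of roughly 62.5 Tbps, and the availability target leaves about 53 minutes of downtime per year.

```python
# Back-of-envelope estimates derived from the stated requirements.
HOURS_WATCHED_PER_DAY = 500_000_000  # from the requirements list
AVG_BITRATE_MBPS = 3.0               # ASSUMPTION: blended average across renditions
SECONDS_PER_DAY = 86_400


def daily_egress_petabytes(hours: float, bitrate_mbps: float) -> float:
    """Bytes delivered per day, expressed in petabytes (10**15 bytes)."""
    bits = hours * 3600 * bitrate_mbps * 1_000_000
    return bits / 8 / 1e15


def average_egress_tbps(hours: float, bitrate_mbps: float) -> float:
    """Average delivery rate over a day, in terabits per second."""
    bits = hours * 3600 * bitrate_mbps * 1_000_000
    return bits / SECONDS_PER_DAY / 1e12


def allowed_downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget implied by an availability target such as 0.9999."""
    return (1 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    print(daily_egress_petabytes(HOURS_WATCHED_PER_DAY, AVG_BITRATE_MBPS))
    print(average_egress_tbps(HOURS_WATCHED_PER_DAY, AVG_BITRATE_MBPS))
    print(allowed_downtime_minutes_per_year(0.9999))
```

These are averages; peak-hour traffic will sit well above the daily mean, so capacity planning should start from the peak.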
High-Level Architecture
- Upload service: accepts chunked uploads, writes raw video to object storage.
- Transcoding pipeline: a queue of jobs that fan out to a worker pool, producing rendition files per resolution and codec.
- Manifest generator: writes HLS or DASH manifests pointing at the renditions.
- Metadata service: stores title, description, thumbnails, tags, owner.
- Search service: an inverted index over titles, descriptions, transcripts.
- Recommendation service: ranks candidates per user.
- CDN: serves segments and manifests close to the user.
- Analytics: ingests view events, watch time, quality of experience.
Data Model
CREATE TABLE videos (
    video_id     BIGINT PRIMARY KEY,
    owner_id     BIGINT,
    title        TEXT,
    description  TEXT,
    duration_ms  INT,
    status       VARCHAR(16),
    created_at   TIMESTAMP
);

CREATE TABLE renditions (
    video_id      BIGINT,
    resolution    VARCHAR(8),
    codec         VARCHAR(16),
    bitrate_kbps  INT,
    url           TEXT,
    PRIMARY KEY (video_id, resolution, codec)
);
View events go to a separate analytics store, not the metadata database.
Key APIs
POST /api/v1/uploads
    returns: upload_id, signed_url, chunk_size
PUT /api/v1/uploads/:id/chunks/:n
POST /api/v1/uploads/:id/complete
GET /api/v1/videos/:id
GET /api/v1/videos/:id/manifest.m3u8
GET /api/v1/videos/:id/segments/:rendition/:n.ts
POST /api/v1/videos/:id/views
    body: position_ms, quality, session_id
Segment URLs are served directly from the CDN. The app server never proxies bytes.
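To illustrate how a client might use the chunked upload endpoints, here is a minimal sketch that splits a file into the byte ranges it would send as numbered chunks. The helper names `plan_chunks` and `chunk_path` are hypothetical, and the sketch assumes chunk numbers start at 0, which the API listing does not specify.

```python
from typing import List, Tuple


def plan_chunks(file_size: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Return (chunk_number, first_byte, last_byte_inclusive) for each chunk."""
    if file_size <= 0 or chunk_size <= 0:
        raise ValueError("file_size and chunk_size must be positive")
    chunks = []
    for number, start in enumerate(range(0, file_size, chunk_size)):
        # The final chunk may be shorter than chunk_size.
        end = min(start + chunk_size, file_size) - 1
        chunks.append((number, start, end))
    return chunks


def chunk_path(upload_id: str, chunk_number: int) -> str:
    """Path for one chunk, following the PUT route in the API listing."""
    return f"/api/v1/uploads/{upload_id}/chunks/{chunk_number}"


if __name__ == "__main__":
    for number, start, end in plan_chunks(file_size=10, chunk_size=4):
        print(chunk_path("abc123", number), start, end)
```

Because each chunk is addressed by number, a client that loses its connection can retry only the missing chunks before calling the complete endpoint.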
Upload and Transcoding Pipeline
- Client requests an upload URL. Service returns a pre-signed URL into object storage.
- Client uploads in chunks directly to object storage.
- On completion, an event is emitted to the transcoding queue.
- Workers pull jobs, transcode to N renditions (240p through 4K) in multiple codecs (H.264, H.265, AV1), and write outputs back to object storage.
- Manifest is generated and the video status flips from `processing` to `ready`.
- Thumbnails are extracted in parallel.
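The fan-out in the steps above can be sketched as one job per resolution and codec pair, with the status flipping only once every job has finished. The intermediate ladder steps between 240p and 4K, the job dictionary shape, and the function names are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List

# 240p through 4K comes from the text; the intermediate steps are ASSUMED.
RESOLUTIONS = ["240p", "360p", "480p", "720p", "1080p", "2160p"]
CODECS = ["h264", "h265", "av1"]


def jobs_for_upload(video_id: int) -> List[Dict[str, object]]:
    """Build one transcode job per (resolution, codec) combination."""
    return [
        {"video_id": video_id, "resolution": resolution, "codec": codec}
        for resolution, codec in product(RESOLUTIONS, CODECS)
    ]


def video_status(jobs_done: int, jobs_total: int) -> str:
    """The video becomes 'ready' only after every rendition exists."""
    return "ready" if jobs_done >= jobs_total else "processing"


if __name__ == "__main__":
    jobs = jobs_for_upload(42)
    print(len(jobs), video_status(0, len(jobs)), video_status(len(jobs), len(jobs)))
```

A real service might flip to ready as soon as a minimum set of renditions exists and backfill the rest; that policy choice is outside this sketch.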
Transcoding is embarrassingly parallel. Split a video into N-second chunks (cut on keyframe boundaries so the pieces join cleanly), transcode each chunk on its own worker, then concatenate the results. This can cut the transcode of a 1-hour video from roughly 30 minutes to a few minutes.
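A minimal sketch of the split, transcode, concatenate pattern follows. The `transcode_chunk` function is a placeholder that returns a label instead of invoking a real encoder, and a thread pool stands in for the distributed worker fleet; both are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_into_chunks(duration_s: float, chunk_s: float) -> List[Tuple[float, float]]:
    """Return (start, end) time ranges that cover the whole video."""
    ranges = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


def transcode_chunk(time_range: Tuple[float, float]) -> str:
    """PLACEHOLDER: a real worker would run an encoder on this time range."""
    start, end = time_range
    return f"chunk[{start:.0f}-{end:.0f}]"


def transcode_video(duration_s: float, chunk_s: float, workers: int = 8) -> List[str]:
    """Transcode chunks in parallel and return outputs in playback order."""
    ranges = split_into_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so they can be joined directly.
        return list(pool.map(transcode_chunk, ranges))


if __name__ == "__main__":
    print(transcode_video(duration_s=25, chunk_s=10))
```

The property that matters is order preservation: however the chunks are scheduled, the outputs must be stitched back together in playback order.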
Workers run on GPU-accelerated nodes or CPU pools. Spot instances are common; see [What is AWS](/blog/what-is-aws) for the pricing model that makes this work.
Playback
Adaptive bitrate is the entire game. The client downloads a manifest, picks a rendition based on measured throughput, and downloads 2- to 10-second segments. If bandwidth drops, the client switches to a lower rendition on the next segment boundary.
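The client-side choice can be sketched as a simple throughput rule: pick the highest rendition whose bitrate fits within a safety margin of the measured bandwidth. The ladder values and the 0.8 safety factor below are illustrative assumptions, and production players typically also weigh buffer level, which this sketch ignores.

```python
from typing import List, Tuple

# (name, bitrate_kbps), sorted low to high. Values are ILLUSTRATIVE.
LADDER: List[Tuple[str, int]] = [
    ("240p", 400),
    ("480p", 1200),
    ("720p", 2800),
    ("1080p", 5000),
    ("2160p", 16000),
]


def pick_rendition(
    ladder: List[Tuple[str, int]], throughput_kbps: float, safety: float = 0.8
) -> Tuple[str, int]:
    """Pick the highest rendition that fits within a fraction of throughput."""
    budget = throughput_kbps * safety
    choice = ladder[0]  # always fall back to the lowest rendition
    for rendition in ladder:
        if rendition[1] <= budget:
            choice = rendition
    return choice


if __name__ == "__main__":
    for measured in (300, 4000, 50000):
        print(measured, pick_rendition(LADDER, measured))
```

The player would re-run this decision before each segment request, which is what makes switching happen at segment boundaries.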
HLS (originated at Apple) and MPEG-DASH (an open standard) are the two dominant formats. They are conceptually identical: a manifest pointing at a sequence of segment files.
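To make the manifest idea concrete, here is a hedged sketch of what the manifest generator might emit for HLS: a master playlist with one `#EXT-X-STREAM-INF` entry per rendition, each pointing at that rendition's own media playlist. The `Variant` type, the URIs, and the bitrates are hypothetical, and a production playlist would usually carry more attributes (such as `CODECS`).

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    bitrate_kbps: int
    width: int
    height: int
    uri: str  # HYPOTHETICAL relative path to the rendition's media playlist


def master_playlist(variants: List[Variant]) -> str:
    """Render a minimal HLS master playlist, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda item: item.bitrate_kbps):
        # BANDWIDTH is expressed in bits per second.
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bitrate_kbps * 1000},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist([
        Variant(2800, 1280, 720, "720p/index.m3u8"),
        Variant(1200, 854, 480, "480p/index.m3u8"),
    ]))
```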
Scaling and Tradeoffs
CDN. This is where 99 percent of the bytes go. Use a multi-CDN strategy with origin shielding so the storage tier sees almost no traffic. For the long tail of unpopular videos, the origin will be hit more often, so keep the origin behind a regional cache.
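The effect of origin shielding is easy to quantify: a request reaches the origin only if it misses at the edge and then misses again at the shield. The sketch below assumes the two layers miss independently, which is a simplification, and the hit ratios in the example are illustrative rather than measured.

```python
def origin_fraction(edge_hit_ratio: float, shield_hit_ratio: float) -> float:
    """Fraction of requests that fall through both cache layers to the origin."""
    for ratio in (edge_hit_ratio, shield_hit_ratio):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("hit ratios must be between 0 and 1")
    # ASSUMPTION: edge and shield misses are independent.
    return (1 - edge_hit_ratio) * (1 - shield_hit_ratio)


if __name__ == "__main__":
    # Illustrative ratios: 95 percent at the edge, 80 percent at the shield.
    print(origin_fraction(0.95, 0.80))
```

With those example ratios, about 1 percent of requests reach storage; long-tail content lowers the edge ratio, so the shield does more of the work there.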
Storage tiering. Hot videos in standard object storage. Warm in infrequent-access. Cold in archival storage. Move based on age and views per day. The savings are enormous.
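A tiering policy based on age and views per day might look like the sketch below. The tier names and every threshold are hypothetical; real cutoffs would come from the storage provider's pricing and the catalog's access patterns.

```python
def storage_tier(age_days: int, views_per_day: float) -> str:
    """Choose a storage tier from age and recent popularity.

    All thresholds are HYPOTHETICAL and exist only to show the shape of the rule.
    """
    if age_days < 30 or views_per_day >= 100:
        return "standard"            # hot: new or actively watched
    if age_days < 365 or views_per_day >= 1:
        return "infrequent_access"   # warm: occasional views
    return "archive"                 # cold: old and effectively unwatched


if __name__ == "__main__":
    print(storage_tier(age_days=5, views_per_day=0))
    print(storage_tier(age_days=200, views_per_day=0.2))
    print(storage_tier(age_days=2000, views_per_day=0.01))
```

A lifecycle job could evaluate this rule periodically and move renditions between tiers; promoting a video back to a hotter tier after it resurfaces is worth handling as well.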
Metadata sharding. Shard videos by video_id. Cache hot rows in Redis. See SQL Indexes and Performance for indexing strategies on the metadata side.
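As a rough illustration, sharding by video_id can be a stable hash modulo the shard count, with a cache-aside lookup in front of the database. The shard count of 64 is an assumption, and the in-process dictionary below merely stands in for Redis so the sketch stays self-contained.

```python
import hashlib
from typing import Callable, Dict

NUM_SHARDS = 64  # ASSUMPTION: illustrative shard count


def shard_for(video_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a video_id to a shard using a stable hash."""
    digest = hashlib.sha256(str(video_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class MetadataCache:
    """Cache-aside lookup. A dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db: Callable[[int], dict]):
        self._load_from_db = load_from_db
        self._cache: Dict[int, dict] = {}

    def get(self, video_id: int) -> dict:
        if video_id in self._cache:
            return self._cache[video_id]       # cache hit
        row = self._load_from_db(video_id)     # miss: read from the owning shard
        self._cache[video_id] = row
        return row


if __name__ == "__main__":
    cache = MetadataCache(lambda vid: {"video_id": vid, "shard": shard_for(vid)})
    print(cache.get(12345))
```

Plain modulo hashing reshuffles most keys when the shard count changes; consistent hashing or a lookup table of virtual shards avoids that, at the cost of more moving parts.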
Codecs. AV1 saves roughly 30 percent of bandwidth compared with H.264 but costs on the order of 5 to 10x as much to encode. Encode AV1 only for the popular fraction of the catalog, where the bandwidth savings pay back the encoding cost.
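The payback point can be estimated with simple arithmetic: divide the extra encoding cost by the CDN savings per hour watched. Every price in the sketch below is a made-up placeholder; only the 30 percent saving and a multiplier inside the 5 to 10x range come from the text.

```python
def av1_break_even_watch_hours(
    h264_encode_cost_per_hour: float,
    av1_cost_multiplier: float,
    h264_gb_per_watch_hour: float,
    bandwidth_saving: float,
    cdn_cost_per_gb: float,
) -> float:
    """Watch hours (per hour of content) needed for AV1 to pay for itself."""
    extra_encode_cost = h264_encode_cost_per_hour * (av1_cost_multiplier - 1)
    saving_per_watch_hour = h264_gb_per_watch_hour * bandwidth_saving * cdn_cost_per_gb
    return extra_encode_cost / saving_per_watch_hour


if __name__ == "__main__":
    # PLACEHOLDER prices: $0.50 per encoded hour, 2 GB per watch hour, $0.01 per GB.
    hours = av1_break_even_watch_hours(
        h264_encode_cost_per_hour=0.50,
        av1_cost_multiplier=8.0,
        h264_gb_per_watch_hour=2.0,
        bandwidth_saving=0.30,
        cdn_cost_per_gb=0.01,
    )
    print(round(hours, 1))
```

Under those placeholder prices, an hour of content needs a little under 600 hours of viewing before the AV1 encode breaks even, which is why the long tail stays on cheaper codecs.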
View counting. Never increment a counter row per view. Stream events to a queue (see the message queue article), aggregate in batches, write rolled-up counts.
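The batching step can be sketched as follows: a consumer drains a batch of events from the queue, collapses them into per-video counts, and applies one write per video rather than one per view. The event shape and the in-memory `totals` dictionary are assumptions standing in for the real queue payload and the rolled-up counts table.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_views(events: Iterable[dict]) -> Counter:
    """Collapse a batch of view events into one count per video."""
    counts: Counter = Counter()
    for event in events:
        counts[event["video_id"]] += 1
    return counts


def apply_batch(totals: Dict[int, int], batch: Counter) -> None:
    """Apply rolled-up counts: one write per video instead of one per view."""
    for video_id, views in batch.items():
        totals[video_id] = totals.get(video_id, 0) + views


if __name__ == "__main__":
    totals: Dict[int, int] = {1: 100}
    batch = aggregate_views([{"video_id": 1}, {"video_id": 2}, {"video_id": 1}])
    apply_batch(totals, batch)
    print(totals)
```

The trade-off is freshness: displayed counts lag by up to one batch interval, which is usually acceptable for view counters.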
Recommendations. Two-stage: a candidate generator returns a few hundred videos, a ranker orders them. The serving path is cached per user with a short TTL.
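A minimal sketch of the two-stage path with a per-user TTL cache is shown below. The `generate_candidates` and `score` callables are placeholders for the real candidate generator and ranking model, and the `TTLCache` class is a hypothetical in-process stand-in for whatever cache the serving tier uses.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_s seconds."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._items: Dict[int, Tuple[float, List[str]]] = {}

    def get(self, key: int) -> Optional[List[str]]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop and report a miss
            return None
        return value

    def put(self, key: int, value: List[str]) -> None:
        self._items[key] = (self._clock() + self._ttl_s, value)


def recommend(
    user_id: int,
    generate_candidates: Callable[[int], List[str]],
    score: Callable[[int, str], float],
    cache: TTLCache,
    k: int = 10,
) -> List[str]:
    """Stage 1 fetches candidates, stage 2 ranks them; results are cached per user."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    candidates = generate_candidates(user_id)
    ranked = sorted(candidates, key=lambda vid: score(user_id, vid), reverse=True)[:k]
    cache.put(user_id, ranked)
    return ranked
```

The short TTL bounds how stale a user's recommendations can get while sparing the ranker from running on every page load.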
Live streaming. Same architecture but tighter: ingest via RTMP or SRT, transcode to renditions in real time, push to CDN with low-latency HLS or CMAF chunked transfer. End-to-end target is 3 to 10 seconds.
Search. An inverted index (Elasticsearch or similar) over title, description, transcript. Transcripts come from speech-to-text run during transcoding.
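For intuition about what the search service does internally, here is a toy inverted index: each term maps to the set of videos whose text contains it, and a multi-word query intersects those sets. This is a teaching sketch and not how Elasticsearch is implemented; the tokenizer and class are hypothetical, and ranking, stemming, and typo tolerance are all left out.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, video_id: int, *fields: str) -> None:
        """Index a video's title, description, and transcript text."""
        for field in fields:
            for term in tokenize(field):
                self._postings[term].add(video_id)

    def search(self, query: str) -> Set[int]:
        """Return videos containing every term in the query (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Cooking pasta", "An easy dinner")
    index.add(2, "Pasta history")
    print(index.search("easy pasta"))
```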
What to Say in an Interview
- State upfront that the system is dominated by storage and CDN egress, not by request QPS.
- Cover the chunked transcoding pipeline. It is the most interesting backend piece.
- Explain adaptive bitrate from the client side. Many candidates skip this and lose easy points.
- Tier storage by popularity. Mention archival storage for old content.
- Separate VOD from live and say which one you are designing. Mixing them is the most common mistake.
Wrap up
A video service is an upload pipeline, a transcoding fleet, an object store full of segments, a CDN, and a metadata database. Move bytes once into storage, serve them through a CDN, never proxy through your app, and shove view events through a queue. Build that and you have YouTube; add a recommendation system and you have YouTube that people actually use.