System Design: Design a Video Streaming Service (YouTube/Netflix)

Design a video streaming platform like YouTube or Netflix. Covers upload pipelines, transcoding, adaptive bitrate, CDN delivery, metadata storage, and, briefly, recommendations.

·5 min read · By Yash Kesharwani
Intermediate 12 min read

What you'll learn

  • Build an upload and transcoding pipeline
  • Serve adaptive bitrate streams via CDN
  • Model video metadata and view events
  • Scale playback to hundreds of millions of users
  • Frame VOD vs live as different problems

Prerequisites

  • Familiarity with HTTP, object storage, and queues.
  • Comfort with CDN basics. See [What is AWS](/blog/what-is-aws).

YouTube and Netflix move petabytes of data per day. The system is not really about video: it is about object storage, transcoding pipelines, CDN egress, and metadata at scale. This design focuses on video-on-demand (VOD). Live streaming gets a paragraph near the end.

Functional Requirements

  • Upload a video.
  • Transcode to multiple resolutions and codecs.
  • Stream playback with adaptive bitrate.
  • Browse, search, and watch.
  • Track views, likes, watch time.

Non-Functional Requirements

  • 2B daily active users, 500M hours watched per day.
  • Upload QPS: low relative to reads (about 5k per second), but each upload payload is large.
  • Read QPS for metadata: hundreds of thousands per second.
  • Playback start time: under 1 second.
  • Storage: tens of exabytes lifetime.
  • Availability: 99.99 percent for playback.
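
A quick back-of-envelope pass over these numbers shows why egress, not request QPS, dominates the design. The sketch below assumes an average delivered bitrate of 3 Mbps; that figure is not part of the requirements above and is only there to make the arithmetic concrete, so substitute your own.

```python
# Back-of-envelope capacity estimate from the stated requirements.
HOURS_WATCHED_PER_DAY = 500_000_000
DAILY_ACTIVE_USERS = 2_000_000_000
AVG_BITRATE_MBPS = 3.0  # ASSUMPTION: blended bitrate across renditions


def capacity_estimate(hours_per_day: float, dau: float, bitrate_mbps: float) -> dict:
    # Average concurrent viewers: total watch-hours spread evenly over 24 hours.
    avg_concurrent = hours_per_day / 24
    # Average egress in terabits per second (1 Tbps = 1e6 Mbps).
    egress_tbps = avg_concurrent * bitrate_mbps / 1_000_000
    # Bytes delivered per day, expressed in petabytes (1 PB = 1e15 bytes).
    seconds_watched = hours_per_day * 3600
    egress_pb_per_day = seconds_watched * bitrate_mbps * 1e6 / 8 / 1e15
    # Average watch time per daily active user, in minutes.
    minutes_per_user = hours_per_day * 60 / dau
    return {
        "avg_concurrent_viewers": avg_concurrent,
        "egress_tbps": egress_tbps,
        "egress_pb_per_day": egress_pb_per_day,
        "minutes_per_user": minutes_per_user,
    }


estimate = capacity_estimate(HOURS_WATCHED_PER_DAY, DAILY_ACTIVE_USERS, AVG_BITRATE_MBPS)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

Under that assumed bitrate, the average works out to about 62.5 Tbps of egress and roughly 675 PB delivered per day. Peak traffic will be well above the daily average, which this sketch does not model.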

High-Level Architecture

  • Upload service: accepts chunked uploads, writes raw video to object storage.
  • Transcoding pipeline: a queue of jobs that fan out to a worker pool, producing rendition files per resolution and codec.
  • Manifest generator: writes HLS or DASH manifests pointing at the renditions.
  • Metadata service: stores title, description, thumbnails, tags, owner.
  • Search service: an inverted index over titles, descriptions, transcripts.
  • Recommendation service: ranks candidates per user.
  • CDN: serves segments and manifests close to the user.
  • Analytics: ingests view events, watch time, quality of experience.

Data Model

CREATE TABLE videos (
  video_id     BIGINT PRIMARY KEY,
  owner_id     BIGINT,
  title        TEXT,
  description  TEXT,
  duration_ms  INT,
  status       VARCHAR(16),
  created_at   TIMESTAMP
);

CREATE TABLE renditions (
  video_id     BIGINT,
  resolution   VARCHAR(8),
  codec        VARCHAR(16),
  bitrate_kbps INT,
  url          TEXT,
  PRIMARY KEY (video_id, resolution, codec)
);

View events go to a separate analytics store, not the metadata database.

Key APIs

POST /api/v1/uploads
  returns: upload_id, signed_url, chunk_size

PUT  /api/v1/uploads/:id/chunks/:n
POST /api/v1/uploads/:id/complete

GET  /api/v1/videos/:id
GET  /api/v1/videos/:id/manifest.m3u8
GET  /api/v1/videos/:id/segments/:rendition/:n.ts

POST /api/v1/videos/:id/views
  body: position_ms, quality, session_id

Segment URLs are served directly from the CDN. The app server never proxies bytes.

Upload and Transcoding Pipeline

  1. Client requests an upload URL. Service returns a pre-signed URL into object storage.
  2. Client uploads in chunks directly to object storage.
  3. On completion, an event is emitted to the transcoding queue.
  4. Workers pull jobs, transcode to N renditions (240p through 4K) in multiple codecs (H.264, H.265, AV1), and write outputs back to object storage.
  5. Manifest is generated and the video status flips from processing to ready.
  6. Thumbnails are extracted in parallel.

Transcoding is embarrassingly parallel. Split a video into N-second chunks at keyframe boundaries, transcode each chunk on its own worker, then concatenate the outputs in order. This can cut the transcode of a 1-hour video from roughly 30 minutes to a few minutes.
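
The split, transcode, and concatenate flow can be sketched as follows. This is a minimal illustration, not a production pipeline: `transcode_chunk` is a hypothetical stand-in for a real encoder invocation, the thread pool stands in for a distributed worker fleet, and real systems cut chunks at keyframe boundaries rather than at fixed timestamps.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Return (start, end) pairs in seconds that cover [0, duration_s)."""
    duration_s = float(duration_s)
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def transcode_chunk(chunk: tuple[float, float]) -> str:
    # HYPOTHETICAL stand-in: a real worker would run an encoder on this range
    # and return the location of the encoded output.
    start, end = chunk
    return f"segment[{start:.0f}-{end:.0f}]"


def transcode_parallel(duration_s: float, chunk_s: float, workers: int = 8) -> list[str]:
    chunks = plan_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the outputs can be
        # concatenated directly even if chunks finish out of order.
        return list(pool.map(transcode_chunk, chunks))


print(transcode_parallel(25, 10))
```

With 10-second chunks, a 1-hour video becomes 360 independent jobs, so wall-clock time is bounded by the slowest chunk plus the concatenation step rather than by the whole file.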

Workers run on GPU-accelerated nodes or CPU pools. Spot instances are common because a chunk job that gets interrupted can simply be retried; see [What is AWS](/blog/what-is-aws) for the pricing model that makes this work.

Playback

Adaptive bitrate is the entire game. The client downloads a manifest, picks a rendition based on measured throughput, and downloads 2- to 10-second segments. If bandwidth drops, the client switches to a lower rendition on the next segment boundary.
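
A throughput-based selection rule can be sketched in a few lines. The bitrate ladder and the 0.8 safety factor below are assumptions for illustration; real players usually also factor in buffer level, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# ASSUMED ladder for illustration; real ladders are tuned per title and codec.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2800),
    Rendition("1080p", 5000),
]


def pick_rendition(ladder: list[Rendition], throughput_kbps: float, safety: float = 0.8) -> Rendition:
    """Pick the highest bitrate that fits within a safety fraction of the
    measured throughput, falling back to the lowest rendition."""
    budget = throughput_kbps * safety
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    choice = ordered[0]  # lowest rendition is the fallback
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            choice = rendition
    return choice


# The client would re-run this before requesting each new segment.
for measured in (10_000, 4_000, 300):
    print(measured, "->", pick_rendition(LADDER, measured).name)
```

The safety factor leaves headroom so that a small dip in throughput does not immediately stall playback.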

HLS (originally from Apple) and MPEG-DASH (an open ISO standard) are the two dominant formats. They are conceptually identical: a manifest pointing at a sequence of segment files.
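
To make the manifest idea concrete, here is a minimal sketch that builds an HLS master playlist listing one variant per rendition. The dictionary keys and the relative URIs are hypothetical; the `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` (bits per second) and `RESOLUTION` attributes are standard HLS.

```python
def build_master_playlist(renditions: list[dict]) -> str:
    """Build an HLS master playlist string.

    Each rendition dict uses HYPOTHETICAL keys: bitrate_kbps, width,
    height, and uri (the per-rendition media playlist).
    """
    lines = ["#EXTM3U"]
    for r in renditions:
        bandwidth_bps = r["bitrate_kbps"] * 1000  # HLS expects bits per second
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps},RESOLUTION={r["width"]}x{r["height"]}'
        )
        # The URI line must directly follow its EXT-X-STREAM-INF tag.
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"


playlist = build_master_playlist([
    {"bitrate_kbps": 1200, "width": 854, "height": 480, "uri": "480p/index.m3u8"},
    {"bitrate_kbps": 5000, "width": 1920, "height": 1080, "uri": "1080p/index.m3u8"},
])
print(playlist)
```

Each media playlist referenced here would in turn list the individual segment files for that rendition. A production manifest would typically also carry a `CODECS` attribute so the player can skip variants it cannot decode.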

Scaling and Tradeoffs

CDN. This is where 99 percent of the bytes go. Use a multi-CDN strategy with origin shielding so the storage tier sees almost no traffic. For the long tail of unpopular videos, the origin will be hit more often — keep the origin behind a regional cache.

Storage tiering. Hot videos in standard object storage. Warm in infrequent-access. Cold in archival storage. Move based on age and views per day. The savings are enormous.
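
A tiering policy driven by age and views per day might look like the sketch below. The tier names and every threshold are assumptions chosen for illustration, not recommended values; in practice they come from measured access patterns and the storage provider's pricing.

```python
# ASSUMED thresholds for illustration only.
HOT_VIEWS_PER_DAY = 100
HOT_MAX_AGE_DAYS = 30
WARM_VIEWS_PER_DAY = 1
WARM_MAX_AGE_DAYS = 365


def choose_tier(age_days: int, views_per_day: float) -> str:
    """Map a video's age and recent popularity to a storage tier."""
    # New or popular videos stay in standard storage.
    if views_per_day >= HOT_VIEWS_PER_DAY or age_days <= HOT_MAX_AGE_DAYS:
        return "standard"
    # Occasionally watched or moderately old videos move to infrequent-access.
    if views_per_day >= WARM_VIEWS_PER_DAY or age_days <= WARM_MAX_AGE_DAYS:
        return "infrequent_access"
    # Everything else is archived.
    return "archive"


print(choose_tier(age_days=3, views_per_day=0))       # standard
print(choose_tier(age_days=200, views_per_day=0.2))   # infrequent_access
print(choose_tier(age_days=2000, views_per_day=0.01)) # archive
```

A background job would evaluate this periodically and move objects between tiers. Archival retrieval can be slow, so a video that suddenly becomes popular again needs a path back to a hotter tier.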

Metadata sharding. Shard videos by video_id. Cache hot rows in Redis. See SQL Indexes and Performance for indexing strategies on the metadata side.
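
The shard-plus-cache read path can be sketched as a cache-aside lookup. Plain dictionaries stand in for Redis and for the database shards here, and the class name, shard count, and modulo routing are all assumptions for illustration; if IDs are not uniformly distributed, hash them before taking the modulo.

```python
NUM_SHARDS = 4  # ASSUMED shard count for illustration


def shard_for(video_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a video_id to a shard index."""
    return video_id % num_shards


class MetadataReader:
    """Cache-aside reader: check the cache, fall back to the owning shard."""

    def __init__(self, shards: list[dict], cache: dict):
        self.shards = shards      # stand-in for per-shard databases
        self.cache = cache        # stand-in for Redis
        self.db_reads = 0         # counts reads that reached a shard

    def get(self, video_id: int):
        if video_id in self.cache:
            return self.cache[video_id]
        self.db_reads += 1
        shard = self.shards[shard_for(video_id, len(self.shards))]
        row = shard.get(video_id)
        if row is not None:
            # Populate the cache so later reads skip the database.
            self.cache[video_id] = row
        return row


shards = [{} for _ in range(NUM_SHARDS)]
shards[shard_for(42)][42] = {"title": "demo"}
reader = MetadataReader(shards, cache={})
reader.get(42)
reader.get(42)
print(reader.db_reads)  # only the first read reached the shard
```

A real cache would also need a TTL or explicit invalidation when a title or description is edited, which this sketch leaves out.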

Codecs. AV1 saves roughly 30 percent or more bandwidth over H.264 at comparable quality but costs 5 to 10x more compute to encode. Encode AV1 only for the popular fraction of the catalog, where the bandwidth savings pay back the extra encode cost.
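
The payback argument is simple arithmetic: an AV1 encode is worth it once the egress saved across all views exceeds the extra encode cost. The sketch below uses the 30 percent savings figure from the text; the dollar amounts and per-view data volume are made-up inputs for illustration.

```python
import math


def breakeven_views(
    extra_encode_cost_usd: float,
    h264_gb_per_view: float,
    savings_fraction: float,
    egress_usd_per_gb: float,
) -> int:
    """Number of views after which the AV1 encode has paid for itself."""
    # Egress cost saved on each view by serving AV1 instead of H.264.
    saved_per_view_usd = h264_gb_per_view * savings_fraction * egress_usd_per_gb
    return math.ceil(extra_encode_cost_usd / saved_per_view_usd)


# ASSUMED inputs: $2.00 extra encode cost, 0.5 GB per H.264 view,
# 30 percent savings, $0.01 per GB of egress.
views = breakeven_views(2.00, 0.5, 0.30, 0.01)
print(views)
```

With these assumed inputs the encode pays back after a little over 1,300 views, which is why it makes sense for popular videos and not for the long tail.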

View counting. Never increment a counter row per view. Stream events to a queue (see the message queue article), aggregate in batches, write rolled-up counts.
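
Batch aggregation can be sketched as follows. Events are plain dictionaries with `video_id` and `session_id` fields, mirroring the views API above; counting one view per session per batch is an assumption, since what qualifies as a view is a product decision.

```python
from collections import Counter


def rollup_views(events: list[dict]) -> Counter:
    """Collapse a batch of raw view events into per-video counts.

    ASSUMPTION: repeated events from the same session for the same video
    (for example, playback heartbeats) count as a single view.
    """
    seen = set()
    counts = Counter()
    for event in events:
        key = (event["video_id"], event["session_id"])
        if key in seen:
            continue
        seen.add(key)
        counts[event["video_id"]] += 1
    return counts


def apply_rollup(totals: dict, counts: Counter) -> dict:
    """Apply one batch of counts to stored totals: one write per video,
    not one write per view."""
    for video_id, n in counts.items():
        totals[video_id] = totals.get(video_id, 0) + n
    return totals


batch = [
    {"video_id": 1, "session_id": "a"},
    {"video_id": 1, "session_id": "a"},  # heartbeat from the same session
    {"video_id": 1, "session_id": "b"},
    {"video_id": 2, "session_id": "c"},
]
print(apply_rollup({1: 10}, rollup_views(batch)))
```

The batch consumer would read events from the queue on a fixed interval. Deduplication here only holds within a single batch; deduplicating across batches needs shared state, which this sketch does not cover.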

Recommendations. Two-stage: a candidate generator returns a few hundred videos, a ranker orders them. The serving path is cached per user with a short TTL.

Live streaming. Same architecture but tighter: ingest via RTMP or SRT, transcode to renditions in real time, push to CDN with low-latency HLS or CMAF chunked transfer. End-to-end target is 3 to 10 seconds.

Search. An inverted index (Elasticsearch or similar) over title, description, transcript. Transcripts come from speech-to-text run during transcoding.

What to Say in an Interview

  • State upfront that the system is dominated by storage and CDN egress, not by request QPS.
  • Cover the chunked transcoding pipeline. It is the most interesting backend piece.
  • Explain adaptive bitrate from the client side. Many candidates skip this and lose easy points.
  • Tier storage by popularity. Mention archival storage for old content.
  • Separate VOD from live and say which one you are designing. Mixing them is the most common mistake.

Wrap up

A video service is an upload pipeline, a transcoding fleet, an object store full of segments, a CDN, and a metadata database. Move bytes once into storage, serve them through a CDN, never proxy through your app, and shove view events through a queue. Build that and you have YouTube; add a recommendation system and you have YouTube that people actually use.