A video streaming platform is one of the hardest system design problems because it touches every layer of the stack: a multi-stage transcoding pipeline, a globally distributed CDN, adaptive bitrate delivery, and petabyte-scale object storage. The core challenge is getting video from upload to playback in multiple resolutions with sub-200ms startup latency for 500M daily users.
Practice this design with AI
Get coached through each section in a mock interview setting
The core product is simple: users upload videos, other users watch them. The hard part is doing this at YouTube/Netflix scale with a smooth playback experience worldwide.
Start with DAU and work down. 50M DAU is a reasonable mid-scale estimate (YouTube is 2B+ monthly, but scope it to what you can reason about).
Keep the API surface small. Video upload uses multipart; everything else is JSON over REST.
/api/v1/videos`/api/v1/videos/{video_id}`/api/v1/videos/search?q=system+design&page=1&limit=20`/api/v1/videos/{video_id}/interactionsAll endpoints use Bearer token auth. Rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset) on every response.

High-Level Architecture
The system splits cleanly into two paths: the upload/transcode pipeline (write) and the viewing/streaming pipeline (read). They share a metadata store but are otherwise independent.
1. Client uploads video to an API gateway behind an L7 load balancer. 2. The API server validates the request, stores the raw file in S3 (object storage), writes metadata to PostgreSQL with status='processing', and drops a message onto a Kafka topic. 3. The Transcoding Service picks up the Kafka message. It pulls the raw file from S3, runs FFmpeg to produce HLS segments at multiple bitrates (360p, 720p, 1080p, 4K), generates thumbnails, and writes everything back to S3. 4. On completion, it updates the metadata row to status='available' and publishes a CDN invalidation so edge caches can start pulling the new content.
1. Client calls GET /api/v1/videos/{id}. The API server checks Redis for cached metadata. On cache miss, it reads from a PostgreSQL read replica and populates the cache. 2. The response includes a CDN manifest URL (e.g., master.m3u8 for HLS). 3. The video player fetches the manifest from the CDN edge. The edge either serves cached segments or pulls from the S3 origin. 4. The player uses adaptive bitrate logic - it starts at a conservative quality and ramps up based on measured throughput.

Detailed Component Design
Three components deserve deep attention: the transcoding pipeline, the CDN layer, and the metadata store.
Transcoding Pipeline
This is the most compute-intensive part of the system. A single 1080p video takes 2-10 minutes to transcode depending on length and codec.
Architecture: Kafka topic → worker pods (Kubernetes) → FFmpeg → S3. Each worker pulls one job, downloads the raw file, transcodes it into all target renditions, uploads segments to S3, and acknowledges the Kafka offset.
Why Kafka over SQS or RabbitMQ: Kafka gives you ordered processing per partition, replay capability (critical when a bad FFmpeg config corrupts outputs and you need to reprocess), and natural backpressure. If workers fall behind, messages queue up - you scale workers, not redesign the system.
CDN Layer
The CDN handles 95%+ of all read traffic. Without it, this system doesn't work at scale.
Architecture: Multi-tier - edge PoPs (200+ locations) → regional mid-tier caches → S3 origin. A request misses at the edge, hits the regional cache, and only falls through to origin as a last resort.
Why multi-tier matters: A single-tier CDN means every cache miss hits S3 directly. With 870K segment requests/sec at peak and even a 5% miss rate, that's 43K requests/sec to S3. A regional mid-tier absorbs most of those misses, reducing origin load to <1% of total traffic.
Cache strategy: Set TTL to 7 days for video segments (immutable content, never changes). Set TTL to 5 minutes for manifests (may update when new renditions become available). Use consistent hashing to route segment requests to specific edge servers - this maximizes cache hit rate.
Metadata Store (PostgreSQL)
Use PostgreSQL, not Cassandra, for metadata. The data is relational (videos belong to users, comments reference videos), the write volume is low (~4 QPS peak for uploads), and you need ACID for things like updating view counts without losing writes.
Scale reads with 3-5 read replicas behind PgBouncer. Cache hot rows in Redis with a 60-second TTL.
For the videos table, partition by upload_date (monthly). This keeps recent queries fast (users mostly browse recent content) and lets you archive old partitions to cheaper storage.

Data Model & Database Design
Use PostgreSQL for structured metadata and S3 for binary content. Don't overthink this - a relational DB handles the query patterns perfectly at this write volume.
sql CREATE TABLE videos ( video_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), user_id UUID NOT NULL REFERENCES users(user_id), title VARCHAR(255) NOT NULL, description TEXT, category VARCHAR(100), tags TEXT[], s3_raw_path VARCHAR(500) NOT NULL, status VARCHAR(20) DEFAULT 'processing', -- processing | available | failed duration_sec INT, view_count BIGINT DEFAULT 0, like_count INT DEFAULT 0, created_at TIMESTAMPTZ DEFAULT now() ) PARTITION BY RANGE (created_at); `` Partition monthly. Index on (user_id), (status), and GIN index on (tags) for tag-based queries.
sql CREATE TABLE users ( user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), username VARCHAR(100) UNIQUE NOT NULL, email VARCHAR(255) UNIQUE NOT NULL, password_hash VARCHAR(255) NOT NULL, subscriber_count INT DEFAULT 0, created_at TIMESTAMPTZ DEFAULT now() );
sql CREATE TABLE video_renditions ( rendition_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), video_id UUID NOT NULL REFERENCES videos(video_id), resolution VARCHAR(10) NOT NULL, -- 360p, 720p, 1080p, 4k bitrate_kbps INT NOT NULL, codec VARCHAR(20) NOT NULL, -- h264, h265 manifest_url VARCHAR(500) NOT NULL, segment_count INT NOT NULL, created_at TIMESTAMPTZ DEFAULT now() ); `` Index on (video_id). This table lets you query what renditions exist for a video and build the master manifest.
Why not Cassandra for metadata: The write volume doesn't justify it. PostgreSQL with read replicas handles 4 writes/sec and 3K reads/sec trivially. Cassandra adds operational complexity (compaction tuning, consistency level headaches, no joins) for zero benefit at this scale.

Deep Dives
Deep Dive 1: Adaptive Bitrate Streaming - Making Playback Smooth
The problem: Users have wildly different network conditions. Someone on fiber should get 4K; someone on a subway should get 360p without buffering. Static quality selection doesn't work.
Approach: Encode every video into 4-6 renditions. Serve an HLS manifest listing all renditions with their bandwidth requirements. The client player measures its download speed every few segments and switches renditions accordingly.
The subtlety is in chunk boundaries. Every rendition must have identical segment boundaries (same GOP structure) so the player can switch mid-stream without visual artifacts. This means your FFmpeg command forces keyframe intervals at exact chunk boundaries: -force_key_frames "expr:gte(t,n_forced*6)" for 6-second segments.
Tradeoff: More renditions = smoother adaptation but more storage and encoding time. 6 renditions (240p, 360p, 480p, 720p, 1080p, 4K) is the standard Netflix/YouTube approach. Diminishing returns beyond that.
Deep Dive 2: Scaling Transcoding to Handle Upload Spikes
The problem: A viral creator uploads a video and 10K other creators upload simultaneously. Your transcoding queue backs up, and videos take hours to become available.
Approach: Kubernetes HPA (Horizontal Pod Autoscaler) scales worker pods based on Kafka consumer lag. When lag exceeds a threshold (e.g., >1000 unprocessed messages), spin up more workers. Scale down when lag drops.
But HPA alone isn't enough at 100x scale. Add priority queues - verified creators and trending channels get priority transcoding. Implement this with separate Kafka topics (high-priority, normal, bulk) and dedicated consumer groups for each.
For extreme scale, consider splitting transcoding by rendition. Instead of one worker producing all renditions, have separate workers for each resolution. This parallelizes the work per video from ~10 minutes (sequential) to ~3 minutes (parallel). The trade-off is more S3 reads (each worker downloads the raw file independently), but S3 bandwidth is cheap.
Deep Dive 3: CDN Cache Invalidation and Cold-Start
The problem: When a new video goes live, no CDN edge has it cached. The first viewers in each region all hit origin simultaneously - a thundering herd problem.
Approach: Pre-warm the CDN. After transcoding completes, push the first few segments (covering the first 30 seconds of playback) to all major edge PoPs proactively. Use the CDN's push/preload API (CloudFront has this, Akamai has Netstorage).
For cache invalidation (e.g., creator deletes a video or replaces it), use the CDN's purge API. But purge is slow (can take 5-30 seconds across all edges). For immediate effect, change the manifest URL with a version parameter (e.g., ?v=2) so the old cached version is simply never requested again.
Tradeoff: Pre-warming costs bandwidth and storage at every edge. Only pre-warm for videos predicted to get high traffic (verified creators, scheduled premieres). For long-tail content, accept the cold-start penalty - it affects <1% of viewers.
Key Trade-offs:
Our AI interviewer will test your understanding with follow-up questions
Start Mock Interview