A notification system is deceptively simple on the surface - accept a message, deliver it to a user - but at scale it becomes a distributed priority queue with multi-channel fan-out, preference filtering, rate limiting, and delivery tracking. The real challenge is handling 10M+ users across push, email, and SMS without spamming anyone or dropping critical alerts.
A notification system is the plumbing behind every "ding" on your phone. It sounds simple until you realize you need to deliver millions of messages per hour across three different channels, respect user preferences, avoid spamming, track delivery, and handle partial failures gracefully.
Start with 10M DAU and 5 notifications per user per day.
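Those assumptions pin down the load. A quick back-of-envelope check (the 5x peak factor is an assumption, not a number from the design):

```python
# Back-of-envelope load estimate from the stated assumptions:
# 10M DAU x 5 notifications/user/day, with an assumed 5x peak factor.
DAU = 10_000_000
PER_USER_PER_DAY = 5

daily_notifications = DAU * PER_USER_PER_DAY       # 50M/day
avg_per_second = daily_notifications / 86_400      # ~579/s average
peak_per_second = avg_per_second * 5               # ~2,900/s at an assumed 5x peak

print(f"{daily_notifications:,}/day, ~{avg_per_second:.0f}/s avg, ~{peak_per_second:.0f}/s peak")
```

Hundreds to low thousands of messages per second is well within a single Kafka cluster's comfort zone; the hard parts are fan-out spikes and external-provider limits, not steady-state throughput.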
This API is internal - consumed by other backend services, not end users. Keep it simple and opinionated.
- `/api/v1/notifications`
- `/api/v1/notifications/{notification_id}`
- `/api/v1/users/{user_id}/notifications?page=1&limit=20`
- `/api/v1/users/{user_id}/preferences`

No delete endpoint for notifications - they're immutable audit records. Mark as read via `PATCH /api/v1/notifications/{id}` with `{"read": true}`.

High-Level Architecture
The system is a pipeline: intake → filtering → routing → delivery → tracking. Each stage is decoupled by Kafka so they scale independently.
1. A backend service (e.g., the messaging service) calls POST /api/v1/notifications with a template ID, user IDs, and variables.
2. The Notification API Server validates the request, resolves the template, and publishes one message per user to a Kafka topic (partitioned by user_id for ordering guarantees per user).
3. The Preference Filter consumer reads from Kafka, checks the user's preferences in Redis ("does this user want push for new_message notifications?"), and drops the notification if the user opted out. Surviving notifications are published to channel-specific Kafka topics: push-notifications, email-notifications, sms-notifications.
4. Channel-specific delivery workers consume their topic and hand messages to the external provider: APNs/FCM for push, an email provider for email, an SMS gateway for SMS.
5. Each delivery worker writes a delivery receipt back to a Kafka tracking topic. The Tracking Service consumes these and writes to Cassandra (notification log) and updates Redis counters.

Detailed Component Design
Three components carry the complexity: the preference filtering and rate limiting layer, the push delivery service, and the priority queue system.
Preference Filter + Rate Limiter
This is the gatekeeper. Every notification passes through it before reaching any delivery channel.
Preference check: Look up user preferences in Redis (hash per user: user:{id}:prefs). If the key is missing, fetch from PostgreSQL and populate the cache with a 10-minute TTL. The check is simple - does this user want this notification type on this channel? If not, drop it and log the filter reason.
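The cache-aside lookup above can be sketched in a few lines. This is illustrative, not the real service's code: a plain dict stands in for the Redis hash (`user:{id}:prefs`) and a callback stands in for the PostgreSQL query.

```python
# Sketch of the preference check: Redis-first, fall back to PostgreSQL
# on a miss, default to opted-in when no explicit preference exists.
# A dict stands in for Redis; function names are illustrative.

CACHE_TTL_SECONDS = 600  # the 10-minute TTL from the design

def should_deliver(cache: dict, fetch_prefs_from_db, user_id: str,
                   notification_type: str, channel: str) -> bool:
    key = f"user:{user_id}:prefs"
    prefs = cache.get(key)
    if prefs is None:                 # cache miss: fall back to the DB
        prefs = fetch_prefs_from_db(user_id)
        cache[key] = prefs            # real code would SET with EXPIRE 600
    # Preferences are keyed by (notification_type, channel).
    return prefs.get((notification_type, channel), True)

# Usage: a user who opted out of marketing push but allows 2FA SMS.
db = {"u1": {("marketing", "push"): False, ("2fa", "sms"): True}}
cache: dict = {}
assert should_deliver(cache, db.get, "u1", "marketing", "push") is False
assert should_deliver(cache, db.get, "u1", "2fa", "sms") is True
assert "user:u1:prefs" in cache       # populated on the first miss
```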
Rate limiting is more nuanced. Use a sliding window counter in Redis (INCR + EXPIRE) to cap how many notifications a user can receive per channel in a given window; anything over the cap is dropped or deferred.
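The INCR + EXPIRE counter can be sketched in pure Python, with a dict standing in for Redis. (Strictly, INCR + EXPIRE gives a fixed-window counter; Redis-backed "sliding window" limiters usually interpolate between two adjacent windows. The limit and window size below are illustrative, not values from the design.)

```python
import time

# Minimal stand-in for the Redis INCR + EXPIRE rate-limit counter.
# Each (key, window) bucket counts sends; real code would let Redis
# expire the bucket instead of keeping it in a dict.

class WindowedRateLimiter:
    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters: dict = {}          # (key, window_id) -> count

    def allow(self, key: str) -> bool:
        window_id = int(self.clock()) // self.window
        bucket = (key, window_id)
        count = self.counters.get(bucket, 0) + 1   # INCR
        self.counters[bucket] = count              # EXPIRE elided here
        return count <= self.limit

# Usage: at most 3 notifications per user per 60s window.
fake_now = [1000.0]
rl = WindowedRateLimiter(limit=3, window_seconds=60, clock=lambda: fake_now[0])
assert [rl.allow("u1") for _ in range(4)] == [True, True, True, False]
fake_now[0] += 60                          # next window: counter resets
assert rl.allow("u1") is True
```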
Why rate limiting matters: Without it, a bug in an upstream service can send 1000 notifications to every user in seconds. This has happened at real companies (GitHub's notification storm in 2019). The rate limiter is your safety net.
Push Delivery Service
The hardest channel because APNs and FCM have connection limits, token expiry, and platform-specific quirks.
Architecture: A pool of worker processes, each maintaining persistent HTTP/2 connections to APNs and FCM. Use connection pooling - APNs performs best with ~20 concurrent connections per server. More connections don't increase throughput and may trigger throttling.
Device token management: Users have multiple devices. Store all active tokens in Redis as a set (user:{id}:tokens). When APNs returns a "token invalid" response, remove it immediately - sending to invalid tokens degrades your APNs reputation and they'll throttle you.
Batching: FCM supports sending to up to 500 tokens in a single multicast request. Use this aggressively. Batch notifications headed to the same topic/data payload and send as multicast.
Failure handling: On transient failure (5xx from APNs/FCM), retry with exponential backoff (1s, 2s, 4s, max 30s). After 5 retries, write to a dead-letter topic. On permanent failure (invalid token, unregistered device), don't retry - update the token store.
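The retry policy above can be sketched as follows. Sleeping and the dead-letter publish are injected as callables so the logic is testable; the exception names and function signatures are illustrative assumptions, not a real provider SDK.

```python
# Sketch of the retry policy: exponential backoff (1s, 2s, 4s, ...,
# capped at 30s), dead-letter after 5 retries, no retry on permanent
# errors such as an invalid token.

MAX_RETRIES = 5
BACKOFF_CAP_SECONDS = 30

class PermanentError(Exception):   # e.g. invalid token, unregistered device
    pass

def deliver_with_retry(send, payload, dead_letter, sleep):
    for attempt in range(MAX_RETRIES + 1):
        try:
            send(payload)
            return True
        except PermanentError:
            return False               # update the token store; never retry
        except Exception:              # transient, e.g. a 5xx response
            if attempt == MAX_RETRIES:
                dead_letter(payload)   # give up: publish to the DLQ topic
                return False
            sleep(min(2 ** attempt, BACKOFF_CAP_SECONDS))

# Usage: a send that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky(payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 from APNs")

slept = []
assert deliver_with_retry(flaky, {"id": 1}, dead_letter=lambda p: None,
                          sleep=slept.append) is True
assert slept == [1, 2]                 # 1s then 2s before the 3rd attempt
```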
Priority Queue System
Not all notifications are equal. A 2FA code must arrive in <500ms. A marketing email can wait 30 seconds.
Implement priority with separate Kafka topics per priority level: critical, standard, bulk. Each has its own consumer group. Critical consumers get dedicated resources and are never starved. Standard consumers scale based on lag. Bulk consumers run at lower throughput and pause entirely if the system is under load.
This is simpler and more reliable than trying to implement priority within a single topic. Kafka doesn't natively support message priority - fighting against the tool's design is a mistake.
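The producer side of this split is a small routing decision. The topic names and the fallback behavior below are illustrative assumptions:

```python
# Route a notification to its priority topic. Separate topics per
# priority (as described above) keep the consumer groups independent.

PRIORITY_TOPICS = {
    "critical": "notifications-critical",   # e.g. 2FA codes
    "standard": "notifications-standard",   # e.g. new-message alerts
    "bulk": "notifications-bulk",           # e.g. marketing campaigns
}

def topic_for(priority: str) -> str:
    # Unknown priorities fall back to standard rather than failing:
    # a misconfigured producer should neither drop nor over-prioritize.
    return PRIORITY_TOPICS.get(priority, PRIORITY_TOPICS["standard"])

assert topic_for("critical") == "notifications-critical"
assert topic_for("typo") == "notifications-standard"
```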

Data Model & Database Design
Two databases: PostgreSQL for structured, low-write data; Cassandra for the high-volume notification log.
```sql
CREATE TABLE users (
    user_id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email      VARCHAR(255) UNIQUE NOT NULL,
    phone      VARCHAR(20),
    timezone   VARCHAR(50) DEFAULT 'UTC',
    created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE devices (
    device_id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id      UUID NOT NULL REFERENCES users(user_id),
    platform     VARCHAR(10) NOT NULL,   -- ios, android, web
    device_token VARCHAR(512) NOT NULL,
    is_active    BOOLEAN DEFAULT true,
    updated_at   TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_devices_user ON devices(user_id) WHERE is_active = true;

CREATE TABLE notification_templates (
    template_id VARCHAR(100) PRIMARY KEY,
    channel     VARCHAR(10) NOT NULL,    -- push, email, sms
    title_tmpl  TEXT NOT NULL,
    body_tmpl   TEXT NOT NULL,
    version     INT DEFAULT 1
);

CREATE TABLE user_preferences (
    user_id           UUID REFERENCES users(user_id),
    notification_type VARCHAR(50),
    push_enabled      BOOLEAN DEFAULT true,
    email_enabled     BOOLEAN DEFAULT true,
    sms_enabled       BOOLEAN DEFAULT false,
    PRIMARY KEY (user_id, notification_type)
);
```
Why a partial index on devices: Most queries are "get active tokens for user X." The partial index (WHERE is_active = true) keeps the index small and fast. Inactive tokens are dead weight.
```sql
CREATE TABLE notifications (
    user_id         UUID,
    created_at      TIMESTAMP,
    notification_id UUID,
    template_id     TEXT,
    channel         TEXT,    -- queued, sent, delivered, failed goes in status
    status          TEXT,
    priority        TEXT,
    delivered_at    TIMESTAMP,
    PRIMARY KEY ((user_id), created_at, notification_id)
) WITH CLUSTERING ORDER BY (created_at DESC, notification_id ASC);
```
Partition key is user_id. This means all notifications for a user live on the same Cassandra node, making "get my last 20 notifications" a single-partition query - the fastest possible Cassandra read.
Clustering by created_at DESC gives you reverse chronological order natively. No sorting needed at query time.
Why Cassandra over PostgreSQL for this table: At 50M writes/day with 90-day retention, you're looking at 4.5B rows. Cassandra handles this with linear horizontal scaling. PostgreSQL would need aggressive partitioning and vacuuming. Cassandra's TTL feature also makes retention trivial: INSERT ... USING TTL 7776000 (90 days) and rows auto-delete.

Deep Dives
Deep Dive 1: Fan-out to Millions of Users
The problem: A marketing team wants to send a notification to all 10M users. Naively iterating through users and publishing 10M Kafka messages from a single API call will take forever and probably OOM the API server.
Approach: Two-phase fan-out. Phase one: the API call writes a single broadcast job (template ID plus an audience definition) to a broadcast topic and returns immediately. Phase two: fan-out workers pick up the job, split the audience into user-ID ranges, and each worker publishes the per-user messages for its range into the normal intake pipeline.
This spreads the work across multiple workers and keeps the API call fast (~50ms to queue the broadcast). The fan-out itself takes minutes for 10M users, which is fine for marketing notifications.
Tradeoff: This adds a processing delay for bulk notifications (minutes vs. seconds). Critical notifications bypass the fan-out system entirely - they're always per-user from the start.
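The chunking in phase two can be sketched as follows. The chunk size and function names are illustrative assumptions, not values from the design:

```python
# Sketch of two-phase fan-out. Phase 2 workers split the audience into
# fixed-size user-ID ranges and publish one Kafka message per user,
# keyed by user_id so each user's notifications stay ordered.

CHUNK_SIZE = 10_000

def plan_fanout_chunks(min_user_id: int, max_user_id: int, chunk=CHUNK_SIZE):
    """Yield (start, end) half-open user-ID ranges, one per worker task."""
    start = min_user_id
    while start < max_user_id:
        yield (start, min(start + chunk, max_user_id))
        start += chunk

def fanout_chunk(publish, template_id: str, user_ids):
    # One message per user; `publish` stands in for a Kafka producer send.
    for uid in user_ids:
        publish(key=uid, value={"template_id": template_id, "user_id": uid})

# Usage: 25,000 users split into 10k / 10k / 5k chunks.
chunks = list(plan_fanout_chunks(0, 25_000))
assert chunks == [(0, 10_000), (10_000, 20_000), (20_000, 25_000)]
```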
Deep Dive 2: Exactly-Once Delivery (or as Close as Possible)
The problem: Network blips, Kafka rebalances, and worker restarts can cause the same notification to be processed twice. Sending duplicate push notifications is a terrible user experience.
Approach: Deduplication at multiple layers. Kafka gives at-least-once consumption, so each delivery worker performs an idempotency check before sending: set a short-lived Redis dedup key for the notification ID and channel (SET ... NX), and skip the send if the key already exists.
This gives you effectively-once delivery. True exactly-once across external systems (APNs, FCM) is impossible - you can't roll back a push notification. But the dedup layers make duplicates extremely rare.
Tradeoff: The Redis dedup adds ~1ms latency per notification and uses ~500MB of memory for the dedup keys. Worth it.
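The Redis dedup check can be sketched as a claim-before-send operation. A dict stands in for Redis here, and the key format and TTL are illustrative assumptions:

```python
# Sketch of the dedup layer: SET key NX EX before sending; if the key
# already exists, another worker (or an earlier attempt) handled it.

DEDUP_TTL_SECONDS = 3600  # illustrative; long enough to cover retries

def claim_for_delivery(store: dict, notification_id: str, channel: str) -> bool:
    """Return True if this worker wins the right to deliver (SET NX)."""
    key = f"dedup:{notification_id}:{channel}"
    if key in store:
        return False          # already processed: drop the duplicate
    store[key] = True         # real code: SET key 1 NX EX 3600
    return True

store: dict = {}
assert claim_for_delivery(store, "n-42", "push") is True    # first attempt sends
assert claim_for_delivery(store, "n-42", "push") is False   # redelivery is dropped
assert claim_for_delivery(store, "n-42", "sms") is True     # other channels unaffected
```

In real Redis the check-and-set must be the single atomic `SET ... NX EX` command, not a separate GET then SET, or two workers can both win the claim.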
Deep Dive 3: Handling APNs/FCM Outages Gracefully
The problem: APNs goes down for 30 minutes (this happens). You have millions of queued push notifications. What do you do?
Approach: Circuit breaker pattern. Track the failure rate per provider; when it crosses a threshold, open the circuit: stop sending, let notifications accumulate safely in Kafka (it's durable), and periodically let a small probe through. When probes succeed, close the circuit and drain the backlog at a controlled rate.
Why not retry immediately: If APNs is overloaded or down, hammering it with retries makes things worse for everyone. The circuit breaker gives the external service time to recover while keeping your system stable.
Tradeoff: Notifications are delayed during the outage. For critical notifications (2FA codes), maintain a fallback path: if push fails, automatically escalate to SMS.
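A minimal circuit breaker sketch, with thresholds chosen for illustration (a production breaker would track error rates over a window, not just consecutive failures):

```python
import time

# Sketch of the circuit breaker: open after N consecutive failures,
# reject sends while open, allow a half-open probe after a cooldown.

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_seconds=30, clock=time.time):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the cooldown, let a probe request through.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None            # probe succeeded: close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()

# Usage with a fake clock: trip the breaker, wait out the cooldown, recover.
now = [0.0]
cb = CircuitBreaker(failure_threshold=2, cooldown_seconds=30, clock=lambda: now[0])
cb.record_failure(); cb.record_failure()
assert cb.allow_request() is False       # open: stop hammering APNs
now[0] += 30
assert cb.allow_request() is True        # half-open: one probe allowed
cb.record_success()
assert cb.allow_request() is True        # closed again, drain the backlog
```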