Knowledge Library

System Design Fundamentals

251+ topics covering architecture, databases, networking, security, and more. Master the concepts, then practice with AI coaching.

End-to-End Walkthroughs

Complete system design solutions with progressive diagrams

Medium, 45 min

Design a URL Shortener (TinyURL)

A URL shortener is deceptively simple on the surface but forces you to make real decisions about ID generation, caching strategy, and read-heavy scaling. The core challenge: generate globally unique short keys at scale, serve redirects in under 50ms, and handle a 100:1 read-to-write ratio without melting your database.

Google, Amazon, Microsoft
8 sections
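
As a rough illustration of the ID-generation decision in the URL shortener walkthrough, the sketch below turns a numeric ID into a base-62 short key and back. The alphabet order and the names `encode_base62` and `decode_base62` are assumptions made for this example; the walkthrough itself may use a different scheme.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(key: str) -> int:
    """Invert encode_base62."""
    n = 0
    for ch in key:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(encode_base62(62))  # 10
```

With 62 symbols, a 7-character key covers 62^7 (roughly 3.5 trillion) distinct IDs.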
Hard, 50 min

Design a Chat System (WhatsApp / Messenger)

A chat system is fundamentally a real-time message routing problem. The hard parts are maintaining millions of persistent WebSocket connections, guaranteeing message delivery even when recipients are offline, ordering messages correctly in group chats, and keeping presence status accurate across a distributed fleet of servers.

Meta, Google, Microsoft
8 sections
Hard, 50 min

Design a News Feed (Twitter / Facebook)

A news feed is the central nervous system of any social platform. The core challenge: when a user with 10 million followers posts, how do you get that post into everyone's feed within seconds? This walkthrough covers fan-out strategies, feed ranking, caching at scale, and the hybrid push/pull architecture that powers Twitter and Facebook.

Meta, Twitter/X, LinkedIn
8 sections
Medium, 40 min

Design a Rate Limiter

A rate limiter is the bouncer at the door of every production API. The real challenge isn't the algorithm - it's making it work across a distributed fleet of servers with sub-5ms latency and graceful failure modes. This walkthrough covers token bucket vs sliding window, Redis-based distributed counting, Lua scripts for atomicity, and the fail-open vs fail-closed decision.

Stripe, Cloudflare, Amazon
8 sections
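
The token bucket mentioned in the rate limiter walkthrough can be sketched in a few lines. This is a single-process version only, under the assumption that tokens refill continuously; the class name `TokenBucket` and its parameters are hypothetical. The distributed variant the walkthrough describes would keep this state in a shared store instead of in memory.

```python
import time


class TokenBucket:
    """In-process token bucket: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=5, capacity=10)
print(bucket.allow())  # True: the bucket starts full
```

The capacity bounds the burst size, while the rate bounds sustained throughput.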
Hard, 50 min

Design a Video Streaming Platform (YouTube / Netflix)

A video streaming platform is one of the hardest system design problems because it touches every layer of the stack: a multi-stage transcoding pipeline, a globally distributed CDN, adaptive bitrate delivery, and petabyte-scale object storage. The core challenge is getting video from upload to playback in multiple resolutions with sub-200ms startup latency for 500M daily users.

Netflix, YouTube, Amazon
8 sections
Medium, 45 min

Design a Notification System

A notification system is deceptively simple on the surface - accept a message, deliver it to a user - but at scale it becomes a distributed priority queue with multi-channel fan-out, preference filtering, rate limiting, and delivery tracking. The real challenge is handling 10M+ users across push, email, and SMS without spamming anyone or dropping critical alerts.

Amazon, Apple, Google
8 sections
Hard, 45 min

Design a Web Crawler

A web crawler is deceptively simple on the surface - fetch pages, extract links, repeat - but at scale it becomes one of the hardest distributed systems problems. The real challenges are politeness (not DDoSing websites), deduplication (the web is full of duplicate content), and managing a frontier of billions of URLs without losing progress or wasting resources.

Google, Amazon, Microsoft
8 sections
Hard, 50 min

Design a Distributed Key-Value Store

A distributed key-value store sits at the core of nearly every large-scale system - from DynamoDB to Cassandra to etcd. The design forces you to make hard choices between consistency and availability, pick a partitioning scheme, and build a replication strategy that actually works during network partitions. This walkthrough covers the full design with real tradeoffs, not textbook idealism.

Amazon, Meta, Apple
8 sections
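
One common answer to the partitioning question in the key-value store walkthrough is consistent hashing. The sketch below is a minimal hash ring with virtual nodes; the class `HashRing`, the node names, and the choice of MD5 (used here only to spread keys, not for security) are assumptions for illustration, not necessarily what the walkthrough uses.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # First 8 bytes of MD5 as a 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node owns `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["n1", "n2", "n3"])
print(ring.node_for("user:42"))
```

When a node leaves the ring, only the keys it owned are reassigned; keys on the remaining nodes stay where they were.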

Showing 251 topics

Big Endian vs Little Endian
CS Fundamentals

Big Endian and Little Endian define the order in which bytes of a multi-byte data type are arranged in computer memory. Mismatched endianness between communicating systems leads to data corruption if not explicitly handled during serialization and deserialization.

Big Endian, Little Endian, Byte Order
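
A short sketch of the byte-order mismatch described above, using Python's standard `struct` module. The value `0x12345678` is an arbitrary example.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big endian: most significant byte first
little = struct.pack("<I", value)  # little endian: least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Reading big-endian bytes as little endian silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x78563412
```

Declaring the byte order explicitly on both the packing and unpacking side is what prevents this kind of corruption.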
How do we incorporate Event Sourcing into the systems
Messaging & Communication

Event Sourcing persists application state as a sequence of immutable events, providing a complete audit trail and enabling temporal queries. This contrasts with traditional CRUD systems that only store the current state, losing historical context and making auditing difficult.

Event Store, Events, Eventual Consistency
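
To make the contrast with CRUD concrete, here is a minimal sketch in which state is never stored directly but is rebuilt by replaying an append-only event log. The bank-account domain and the names `Event`, `apply_event`, and `replay` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # events are immutable once recorded
class Event:
    kind: str    # "deposited" or "withdrawn"
    amount: int


def apply_event(balance: int, event: Event) -> int:
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, upto=None) -> int:
    """Rebuild the balance from the first `upto` events (all if None)."""
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]
print(replay(log))          # 75, the current state
print(replay(log, upto=2))  # 70, the state after the second event
```

The `upto` argument is a simple form of the temporal query mentioned above: any past state can be reconstructed from the log.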
How can Cache Systems go wrong
Caching Strategies

Caches, while designed to accelerate data access, can introduce bottlenecks or inconsistencies if not carefully managed. Understanding common failure modes and mitigation strategies is crucial for building robust, high-performance systems.

Thundering Herd, Cache Penetration, Cache Stampede
Linux file system explained
CS Fundamentals

The Linux file system provides a hierarchical structure for organizing data on storage devices, enabling efficient access and management. Its adherence to standards like the Filesystem Hierarchy Standard (FHS) ensures interoperability and predictable file locations across distributions.

File System, Filesystem Hierarchy Standard (FHS), ext4
My recommended materials for cracking your next technical interview
Real-World Architectures

Interview preparation demands a multifaceted approach. Success requires mastering coding fundamentals, system design principles, behavioral communication, and realistic simulation, all while understanding the tradeoffs inherent in architectural choices.

Coding practice, System design, Behavioral interviews
How Git Commands work
Cloud & DevOps

Git commands facilitate the movement and management of code changes between working directories, staging areas, local repositories, and remote repositories. Understanding the underlying mechanisms and trade-offs is crucial for efficient version control and collaboration in software development.

Working directory, Staging area, Local repository
Top 4 Most Popular Use Cases for UDP
Networking & Protocols

UDP prioritizes speed over reliability, making it suitable for applications where occasional packet loss is tolerable but low latency is crucial. Understanding its common use cases and limitations is essential for designing efficient networking solutions.

UDP, TCP, RTP
How Does a Typical Push Notification System Work
Messaging & Communication

Push notification systems deliver timely updates to users' devices by routing messages through platform-specific gateways, requiring careful consideration of scale, reliability, and user experience. These systems must handle asynchronous delivery, potential message loss, and the need for prioritization while adhering to platform-specific constraints.

Apple Push Notification Service (APNs), Firebase Cloud Messaging (FCM), Device Token
REST API Cheatsheet
API Design

REST APIs provide a standardized architectural style for building networked applications, enabling clients to interact with server resources using a stateless protocol. Effective API design focuses on resource modeling, request handling, and ensuring scalability and security.

HTTP Methods, URIs, JSON
Top 8 Programming Paradigms - Part 1
CS Fundamentals

Programming paradigms are fundamental styles of building the structure and elements of a computer program. Choosing the right paradigm or combination impacts code organization, maintainability, and suitability for different problem domains.

Imperative Programming, Declarative Programming, Object-Oriented Programming
Data Pipelines Overview
Databases & Storage

Data pipelines automate the flow of data from source systems to destinations, enabling analysis and decision-making. They address the challenge of integrating data from disparate sources, transforming it into a usable format, and delivering it reliably to downstream systems.

Data ingestion, Data transformation, Data warehouse
API Vs SDK
API Design

APIs define contracts for communication between systems, while SDKs provide comprehensive toolkits for building applications on specific platforms. APIs emphasize interoperability, whereas SDKs facilitate platform-specific development and feature access.

API, SDK, HTTP
A handy cheat sheet for the most popular cloud services
Cloud & DevOps

Cloud services abstract away infrastructure management, offering on-demand compute, storage, and networking. They enable rapid scaling and reduced operational overhead by providing pre-built, managed services accessible via APIs.

Virtual Machines (VMs), AWS EC2, Google Compute Engine
A nice cheat sheet of different monitoring infrastructure in cloud services
Cloud & DevOps

Cloud monitoring provides real-time observability into application and infrastructure performance by collecting, storing, analyzing, and visualizing metrics, logs, and traces. Effective monitoring enables proactive issue detection, faster incident response, and data-driven optimization of system reliability and efficiency.

Data Collection, Time-Series Databases, Prometheus
REST API Vs. GraphQL
Networking & Protocols

REST APIs provide pre-defined data structures via multiple endpoints, while GraphQL exposes a single endpoint and allows clients to specify their exact data requirements. Choosing between them involves balancing simplicity, flexibility, and performance considerations.

REST API, GraphQL, Endpoint
Key Use Cases for Load Balancers
CS Fundamentals

Load balancers prevent cascading failures by distributing traffic across healthy backend servers, ensuring high availability and optimal resource utilization. They operate at different layers of the network stack, employing various algorithms to manage traffic flow and maintain session persistence.

Traffic Distribution, High Availability, Health Monitoring
Top 6 Firewall Use Cases
Security

Firewalls are critical network security devices that enforce access control policies, preventing unauthorized traffic from entering or leaving a network. They operate by examining network traffic against a defined set of rules, mitigating risks like intrusion, data exfiltration, and malware propagation.

Stateful Inspection, Deep Packet Inspection (DPI), IP Address Filtering
Types of memory. Which ones do you know
CS Fundamentals

Computer systems employ a memory hierarchy trading off speed, cost, and capacity. Understanding the characteristics of each level, from CPU registers to remote storage, is crucial for optimizing performance and durability.

Registers, Cache (L1, L2, L3), RAM (DRAM)
How Do C++, Java, Python Work
CS Fundamentals

Programming languages employ different execution models: ahead-of-time compilation to machine code (C++), compilation to bytecode that a virtual machine executes and JIT-compiles (Java), or compilation to bytecode that an interpreter runs directly (Python, in its CPython implementation). Each approach presents distinct trade-offs between performance, portability, and development speed, influencing language selection for specific application domains.

Compiler, Bytecode, JVM
Top 6 Load Balancing Algorithms
CS Fundamentals

Load balancing algorithms distribute network traffic across multiple servers to optimize resource utilization and ensure high availability. The selection of an appropriate algorithm directly impacts performance metrics like latency, throughput, and fairness.

Load Balancing, Round Robin, Weighted Round Robin
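
Two of the algorithms named in the tags, round robin and weighted round robin, can be sketched with `itertools.cycle`. The server names and weights are made up, and the weighted version here is the naive one that repeats each server in proportion to its weight.

```python
import itertools


def round_robin(servers):
    """Yield servers in a fixed rotation, forever."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Naive weighting: each server appears `weight` times per rotation."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


rr = round_robin(["a", "b", "c"])
print(list(itertools.islice(rr, 4)))   # ['a', 'b', 'c', 'a']

wrr = weighted_round_robin({"a": 2, "b": 1})
print(list(itertools.islice(wrr, 6)))  # ['a', 'a', 'b', 'a', 'a', 'b']
```

The naive weighted version sends consecutive requests to the heaviest server; smoother variants interleave the picks so that load arrives less burstily.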
How does Git work
Cloud & DevOps

Git solves the problem of coordinating changes to files among multiple people, preventing chaos and data loss. It provides a robust system for tracking modifications, reverting to previous states, and merging concurrent efforts into a unified codebase.

Commit, SHA-1 hash, Branching
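
The SHA-1 hash in the tags refers to how Git names its objects. As a small sketch, a file's blob ID is the SHA-1 of a short header followed by the file content. The helper name `git_blob_id` is hypothetical, and the sketch covers only blobs in repositories that use SHA-1 object names.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob with this content."""
    # Git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"hello world\n"))
```

Because the ID depends only on the content, identical files are stored once and any change to a file produces a different ID.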
HTTP Cookies Explained With a Simple Diagram
Networking & Protocols

HTTP cookies are a core mechanism for maintaining state in the stateless HTTP protocol, enabling session management and personalized user experiences. They work by storing small pieces of data in a user's browser, which are then sent back to the server with subsequent requests.

HTTP Cookies, Stateful Sessions, Set-Cookie Header
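
The round trip described above, where the server sets a cookie and the browser sends it back, can be sketched with Python's standard `http.cookies` module. The cookie names and values are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie response header.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["httponly"] = True  # hide from page scripts
outgoing["session_id"]["secure"] = True    # send over HTTPS only
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Server side, next request: parse the Cookie header the browser sent back.
incoming = SimpleCookie()
incoming.load("session_id=abc123; theme=dark")
print(incoming["session_id"].value)  # abc123
```

The attributes on the outgoing cookie are instructions to the browser; only the name and value come back on later requests.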
How does a ChatGPT-like system work
Real-World Architectures

ChatGPT-like systems balance massive-scale language modeling with real-time inference and stringent safety constraints. They utilize transformer architectures, reinforcement learning, and content moderation to generate helpful and safe responses to user prompts.

Transformer Architecture, Reinforcement Learning from Human Feedback (RLHF), Tokenization
A cheat sheet for system designs
Architecture Patterns

System design is a structured approach to defining software architecture, emphasizing trade-offs between competing requirements like scalability, reliability, and cost. It involves understanding constraints, choosing appropriate technologies, and anticipating failure scenarios to build robust and maintainable systems.

Requirements Gathering, Architecture Patterns, Scalability
Cloud Disaster Recovery Strategies
Cloud & DevOps

Disaster recovery strategies ensure business continuity by minimizing downtime and data loss during disruptive events. Choosing the appropriate strategy involves balancing recovery objectives (RTO/RPO) with cost and complexity, often leveraging cloud-native replication and failover mechanisms.

Disaster Recovery, RTO (Recovery Time Objective), RPO (Recovery Point Objective)
Visualizing a SQL query
Databases & Storage

Visualizing a SQL query reveals the transformation pipeline from a declarative SQL statement to a concrete execution plan. Understanding this process enables developers to write efficient queries that leverage indexes and avoid performance bottlenecks.

Parsing, Query Optimizer, Execution Plan
How does REST API work
API Design

REST APIs enable decoupled client-server communication by adhering to architectural constraints like statelessness and uniform interface. They facilitate scalable data exchange using standard protocols and data formats.

HTTP, JSON, URL
Explaining 9 types of API testing
API Design

API testing verifies that different software components communicate correctly, covering functionality, security, performance, and adherence to contracts. A comprehensive strategy involves various testing types to mitigate risks and ensure reliability.

Functional Testing, Contract Testing, Performance Testing
Git Merge vs. Rebase vs. Squash Commit
CS Fundamentals

Git merge, rebase, and squash commits are distinct methods for integrating changes from one branch into another, each manipulating the commit history in different ways. Choosing the right method involves balancing a desire for a clean, linear history against the risks of rewriting shared history and complicating collaboration.

Merge, Rebase, Squash Commit
What is a cookie
Networking & Protocols

Cookies are small pieces of data that a web browser stores on a user's machine, enabling websites to maintain state across multiple requests. They are a core mechanism for session management, personalization, and tracking, but introduce security and performance considerations.

HTTP protocol, Set-Cookie header, Cookie header
How does a VPN work
Security

A VPN establishes an encrypted tunnel between a client and a server, masking the client's IP address and encrypting traffic to ensure privacy and security. This prevents eavesdropping and allows users to bypass geo-restrictions, but introduces latency and relies on the VPN provider's security practices.

Encryption, IPSec, OpenVPN
Top Software Architectural Styles
Architecture Patterns

Software architectural styles define the high-level structure and organization of a system, impacting its scalability, maintainability, and overall performance. Choosing the right style is critical for meeting non-functional requirements and avoiding architectural drift as the system evolves.

Architecture, Architectural Styles, Monolith
Understanding Database Types
Databases & Storage

Choosing the right database type is crucial for meeting application requirements regarding data consistency, scalability, and performance. SQL databases typically offer strong consistency through ACID transactions, while NoSQL databases typically favor schema flexibility and horizontal scalability; each is optimized for different use cases.

SQL Databases, NoSQL Databases, ACID Properties
Cloud Security Cheat Sheet
Cloud & DevOps

Cloud security focuses on protecting cloud-based assets through proactive threat modeling, access controls, and continuous monitoring. It encompasses securing data at rest and in transit, network infrastructure, and application layers, while adhering to compliance standards.

Encryption (AES-256), Access Control (IAM, OAuth 2.0, SAML), Monitoring (Prometheus, Grafana)
Cloud security is the top priority for any business because it ensures the safety and privacy of their digital assets in the cloud
Cloud & DevOps

Cloud security addresses the inherent risks of storing and processing data in shared, multi-tenant environments. It requires a layered approach to protect against unauthorized access, data breaches, and service disruptions while maintaining compliance and trust.

Data encryption (AES-256), Authentication (OAuth 2.0, OpenID Connect), Network security (Firewalls, IDS/IPS, WAF)
Having said that, it is not that simple, especially with so many services, applications, and potential threats to consider
Cloud & DevOps

Cloud security ensures the confidentiality, integrity, and availability of data and applications within cloud environments by implementing controls across various layers. It addresses the inherent risks of distributed systems and shared infrastructure, necessitating a multi-faceted approach encompassing identity management, network security, data protection, and threat detection.

Cloud security, Data protection, Risk mitigation
GitOps Workflow - Simplified Visual Guide
Cloud & DevOps

GitOps uses a Git repository as the single source of truth for declarative infrastructure and application deployments, employing automated reconciliation loops to converge the actual system state with the desired state defined in Git. This approach enhances consistency, auditability, and security in infrastructure and application delivery pipelines.

Git Repository, Declarative Configuration, Reconciliation Loop
How does “scan to pay” work
Messaging & Communication

Scan-to-pay bridges the physical and digital realms by encoding transaction details into a visual code. Clients decode this code to initiate payment requests, requiring secure and idempotent processing by the payment processor.

QR code, Payment processor, Digital wallet
How do Search Engines Work
Cloud & DevOps

Search engines tackle the challenge of efficiently discovering, indexing, and ranking massive amounts of web content. They rely on distributed systems and specialized algorithms to deliver relevant search results with low latency at immense scale.

Crawling, Indexing, Ranking
The Payments Ecosystem
Real-World Architectures

The payments ecosystem is a multi-layered architecture involving various entities and protocols to facilitate secure and reliable fund transfers. Its complexity arises from the need to balance speed, security, and regulatory compliance across diverse financial institutions.

Payment Gateway, Payment Processor, Issuing Bank
Object-oriented Programming: A Primer
Messaging & Communication

Object-oriented programming enhances code organization by modeling software components as encapsulated objects with defined properties and behaviors. It promotes code reuse, modularity, and maintainability through principles like inheritance, polymorphism, and abstraction.

Objects, Classes, Encapsulation
Where do we cache data
Caching Strategies

Caching strategically positions data closer to consumers to reduce latency and offload origin servers. Effective caching implementations require careful consideration of invalidation strategies, eviction policies, and the trade-off between hit rate and memory usage.

Cache Invalidation, Eviction Policies (LRU, LFU), HTTP Caching (Cache-Control, ETag)
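
Of the eviction policies listed in the tags, LRU (least recently used) is the simplest to sketch. The version below uses `collections.OrderedDict` and is neither thread-safe nor TTL-aware; the class name `LRUCache` is just an illustrative choice.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once `capacity` is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now more recent than "b"
cache.put("c", 3)      # evicts "b"
print(cache.get("b"))  # None
```

Both reads and writes refresh an entry's position, so frequently read keys survive even if they are rarely rewritten.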
Flowchart of how Slack decides to send a notification
Real-World Architectures

Slack's notification system intelligently routes alerts based on user context, preferences, and message content, preventing notification fatigue. This involves a complex filtering and prioritization process to determine the appropriate delivery method and timing for each notification.

Message Queues (Kafka), Push Notifications (APNs, FCM), Rate Limiting
What is the best way to learn SQL
Databases & Storage

Learning SQL effectively requires understanding its core commands and how they interact with database systems. Mastering SQL involves balancing query optimization, data integrity, and security considerations for specific use cases.

SQL Language, DDL, DQL
What is gRPC
Networking & Protocols

gRPC addresses the need for high-performance, strongly-typed communication between services, particularly in microservice architectures. It provides an efficient alternative to REST by leveraging Protocol Buffers for serialization and HTTP/2 for transport, optimizing for speed and reducing latency.

Protocol Buffers, HTTP/2, gRPC Stub
How do live streaming platforms like YouTube Live, TikTok Live, or Twitch work?
Real-World Architectures

Live streaming platforms solve the challenge of reliably distributing real-time video content to massive concurrent audiences with minimal latency. They employ a combination of video encoding, content delivery networks (CDNs), and adaptive bitrate streaming to optimize viewer experience under varying network conditions.

Transcoding, CDN, HLS
Linux Boot Process Illustrated
CS Fundamentals

The Linux boot process initializes hardware, loads the kernel, and starts system services to transition from power-on to a usable operating system. Understanding this process is crucial for diagnosing boot failures, optimizing startup time, and configuring system-level services.

BIOS, UEFI, POST
How does Visa make money
Networking & Protocols

Visa generates revenue primarily through fees charged to banks for using its payment network, including interchange fees, network access fees, and other service-related charges. The scale and reliability of their global transaction processing system are paramount to their business model.

Interchange Fee, Network Access Fee, ISO 8583
Session, Cookie, JWT, Token, SSO, and OAuth 2.0 Explained in One Diagram
Security

User authentication and authorization require balancing security, scalability, and user experience. Sessions, cookies, tokens (including JWTs), SSO, and OAuth 2.0 represent different approaches to verifying user identity and granting access to resources, each with distinct trade-offs in terms of state management, security risks, and complexity.

Session, Cookie, JWT
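
To show why a JWT lets the server avoid storing session state, the sketch below signs and verifies an HS256 token using only the standard library. The function names `sign_hs256` and `verify_hs256` are hypothetical, and the sketch deliberately skips expiry and claim validation; a production system would normally rely on a vetted JWT library instead.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _signature(signing_input: str, secret: bytes) -> str:
    digest = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return _b64url(digest)


def sign_hs256(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    ]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_signature(signing_input, secret)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    signing_input, _, signature = token.rpartition(".")
    expected = _signature(signing_input, secret)
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


token = sign_hs256({"sub": "user-1"}, b"demo-secret")
print(verify_hs256(token, b"demo-secret"))  # {'sub': 'user-1'}
```

The server needs only the secret to check a token, whereas a session ID requires a lookup in server-side storage on every request.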
How do we manage configurations in a system
Messaging & Communication

Configuration management ensures consistency and reliability across systems by automating the deployment and maintenance of software and infrastructure. Tools like Terraform and Ansible enable Infrastructure as Code (IaC), allowing engineers to define and manage system configurations declaratively, reducing manual errors and scaling operations efficiently.

Configuration Management, Infrastructure as Code (IaC), Terraform
What is CSS (Cascading Style Sheets)
CS Fundamentals

CSS decouples content from presentation, enabling consistent styling and maintainability across web applications. It defines rules for how HTML elements are displayed, handling layout, typography, and visual effects.

Selectors, Properties, Cascading
What is GraphQL? Is it a replacement for the REST API
Networking & Protocols

GraphQL addresses the problem of over-fetching and under-fetching data common in REST APIs by allowing clients to specify exactly what data they need. It introduces complexities related to query optimization and security that must be carefully managed.

Query Language, Schema Definition Language (SDL), Resolvers
System Design Blueprint: The Ultimate Guide
Networking & Protocols

An API Gateway decouples client applications from backend services, providing a single entry point for requests and handling cross-cutting concerns like authentication, rate limiting, and request routing. It enables independent evolution of services and simplifies client development by abstracting away backend complexity.

API Gateway, Reverse Proxy, Authentication
Polling Vs Webhooks
Messaging & Communication

Polling involves clients repeatedly requesting data from a server, while webhooks enable servers to push data to clients upon events. Webhooks offer lower latency and improved efficiency, but polling provides greater control and compatibility with legacy systems.

Polling, Webhooks, HTTP POST
How are notifications pushed to our phones or PCs
Cloud & DevOps

Push notifications provide a mechanism for applications to deliver asynchronous updates to users, even when the application is not actively running. This requires a robust system for managing device registrations, message queuing, and platform-specific delivery protocols to ensure reliable and timely delivery.

Push Notifications, FCM, APNs
9 best practices for developing microservices
Architecture Patterns

Microservices decompose applications into independent, deployable services, increasing agility but introducing distributed systems challenges. Key best practices focus on data isolation, bounded context, and observable communication to ensure resilience and maintainability.

Independent Data, Single Responsibility Principle, Domain-Driven Design
When we develop microservices, we need to follow the following best practices:
Architecture Patterns

Microservice best practices address the challenges of distributed systems by promoting modularity, fault isolation, and independent deployability. Adhering to these practices minimizes coupling, simplifies maintenance, and enhances the overall resilience of a microservice architecture.

Single Responsibility Principle, Separate Data Ownership, Stateless Services
1. Use separate data storage for each microservice
Architecture Patterns

Microservice architectures benefit from data isolation, preventing a single point of failure and enabling independent scaling and deployment. Each microservice manages its own database, communicating with others through APIs or asynchronous messaging to maintain autonomy.

Data Isolation, Independence, Resilience
2. Keep code at a similar level of maturity
Architecture Patterns

Maintaining similar levels of technology maturity across microservices reduces integration complexity and operational overhead. Disparate technology stacks increase the burden of testing, monitoring, and cross-functional debugging.

Microservices, API Versioning, Technology Stack
3. Separate build for each microservice
Architecture Patterns

Each microservice should have its own dedicated build pipeline, enabling independent deployments and minimizing the risk of cascading failures across services. This approach relies on artifact repositories and semantic versioning to manage dependencies and ensure reproducibility.

Microservices, CI/CD Pipelines, Semantic Versioning
4. Assign each microservice a single responsibility
Architecture Patterns

A microservice should encapsulate a single, well-defined business capability. This promotes independent deployment, scalability, and fault isolation, reducing the blast radius of failures and simplifying the codebase.

Single Responsibility Principle, Microservices, Domain-Driven Design
5. Deploy into containers
Cloud & DevOps

Containerization packages application code and dependencies into isolated units, ensuring consistent execution across diverse environments. Container orchestration platforms automate deployment, scaling, and management, enabling efficient resource utilization and high availability.

Containers, Docker, Kubernetes
6. Design stateless services
Architecture Patterns

Stateless services ensure each request is treated independently, simplifying scaling and improving fault tolerance by removing inter-request dependencies. This architecture pushes state management to external systems, preventing single points of failure and enabling horizontal scalability.

Stateless, Microservices, Scalability
7. Adopt domain-driven design
Architecture Patterns

Domain-Driven Design (DDD) aligns software architecture with business needs by modeling systems around distinct business capabilities, promoting modularity and maintainability. It reduces complexity by establishing clear boundaries and specialized data models for each domain.

Domain-Driven Design, Bounded Contexts, Microservices
8. Design micro frontends
Architecture Patterns

Micro frontends enable independent teams to develop and deploy parts of a web application separately, increasing agility but introducing integration complexity. Careful consideration of communication, shared dependencies, and state management is crucial for a successful implementation.

Micro frontends, Independent deployment, Web components

9. Orchestrate microservices
Architecture Patterns

Microservice orchestration addresses the challenge of coordinating independent services to fulfill a single business function, ensuring consistent and reliable execution. Without it, complex transactions risk partial completion, data inconsistency, and a degraded user experience.

Orchestration, Microservices, Workflow Engine

OAuth 2.0 Explained in Simple Terms
Security

OAuth 2.0 addresses the challenge of granting applications limited access to user resources without exposing sensitive credentials. It defines a standardized authorization framework, enabling secure delegation of access rights.

OAuth 2.0, Access Token, Refresh Token

How do companies ship code to production
Networking & Protocols

Shipping code to production requires a robust and automated process to minimize risk and ensure service reliability. Continuous Integration and Continuous Delivery pipelines are crucial for validating and deploying code changes efficiently.

Continuous Integration (CI), Continuous Delivery (CD), Deployment Strategies (Rolling, Blue/Green, Canary)

How do we manage sensitive data in a system
Cloud & DevOps

Protecting sensitive data involves encryption, access control, and data transformation techniques. Proper key management and adherence to compliance standards are crucial to prevent data breaches and maintain user trust.

Encryption, Data desensitization, Minimal permissions

Cloud Load Balancer Cheat Sheet
Cloud & DevOps

Cloud load balancers distribute incoming network traffic across multiple backend servers to prevent overload and ensure high availability. Selecting the appropriate load balancer and configuration is crucial for meeting performance, scalability, and resilience requirements.

Load Balancer, Layer 4 Load Balancing, Layer 7 Load Balancing

What does ACID mean
Databases & Storage

ACID properties (Atomicity, Consistency, Isolation, Durability) are a set of guarantees ensuring reliable database transactions and data integrity, especially in the face of concurrent operations and system failures. Understanding their trade-offs is crucial for designing robust and scalable data storage solutions.

Atomicity, Consistency, Isolation

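Atomicity is the easiest of the four properties to see in code. The sketch below uses Python's built-in `sqlite3` module; the `accounts` table, the `CHECK` constraint, and the helper names are assumptions made for illustration. A transfer that would overdraw the source account fails on its second statement, and the first statement is rolled back with it.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src; either both updates persist or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # Violates the CHECK constraint if src would go negative.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` returns `False` and leaves both balances untouched, even though the credit to `bob` had already executed inside the transaction.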
CAP, BASE, SOLID, KISS, What do these acronyms mean
Databases & Storage

CAP, BASE, SOLID, and KISS are key architectural principles that guide trade-offs in distributed systems and software design, balancing concerns like consistency, availability, maintainability, and complexity. Understanding these concepts is crucial for building robust and scalable applications that meet specific performance and reliability requirements.

CAP Theorem, Consistency, Availability

System Design cheat sheet
Architecture Patterns

High availability and throughput are primary goals in system design, achieved through techniques like redundancy and caching. These strategies mitigate the impact of failures and reduce latency by replicating services and data closer to users.

Availability, Throughput, Redundancy

How will you design the Stack Overflow website
CS Fundamentals

Designing Stack Overflow requires balancing scalability and maintainability, understanding that simpler, well-optimized architectures can often outperform complex distributed systems. The key is to justify design choices with concrete reasoning around traffic patterns, data access, and operational overhead.

Monolith, Caching, CDN

A nice cheat sheet of different cloud services
Cloud & DevOps

Cloud services abstract away the complexities of managing physical infrastructure, offering on-demand compute, storage, and networking resources. Choosing the right services involves navigating trade-offs between cost, performance, scalability, and operational overhead, tailored to specific application requirements.

Virtual Machines (VMs), Containers, Kubernetes

The one-line change that reduced clone times by a whopping 99%, says Pinterest
Real-World Architectures

Pinterest cut Git clone times in its monorepo CI by 99% with a one-line change: narrowing the refspec in its Jenkins pipelines so that each clone fetches only the primary branch instead of every ref. The smaller transfer improved developer velocity and reduced infrastructure load.

Monorepo, Jenkins, Sparse Checkouts

​While it may sound cliché, small changes can definitely create a big impact
Real-World Architectures

In large monorepos, inefficient code fetching during development and CI/CD can lead to significant delays. Optimizing Git workflows, such as using sparse checkouts and shallow clones, can dramatically reduce clone times and improve overall developer productivity.

Git, Monorepo, Refspec

Best ways to test system functionality
Security

System functionality testing validates individual components and their interactions to ensure reliability and prevent regressions. A comprehensive testing strategy includes various test types, automation, and monitoring to guarantee system health across different environments.

Unit Testing, Integration Testing, System Testing

Encoding vs Encryption vs Tokenization
Security

Encoding, encryption, and tokenization are three distinct processes that handle data in different ways for various purposes, including data transmission, security, and compliance.

Encoding transforms data for proper transmission, encryption obscures data to protect confidentiality, and tokenization replaces sensitive data with non-sensitive surrogates to reduce risk and comply with regulations. Choosing the right approach depends on the specific security and compliance needs, performance requirements, and complexity constraints of the system.

Encoding, Encryption, Tokenization

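The difference between encoding and tokenization shows up clearly in a few lines of Python. This is a minimal sketch: `TokenVault` and its in-memory dictionary are hypothetical stand-ins for a hardened token service. Encryption is left out because the standard library ships no general-purpose cipher.

```python
import base64
import secrets


def encode(text):
    """Encoding: a reversible format change that needs no secret."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode(encoded):
    """Anyone can reverse an encoding, so it provides no confidentiality."""
    return base64.b64decode(encoded).decode("utf-8")


class TokenVault:
    """Tokenization: swap a sensitive value for a random surrogate.

    The token has no mathematical relationship to the original value;
    recovering it requires a lookup in the vault.
    """

    def __init__(self):
        self._by_token = {}

    def tokenize(self, value):
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        return token

    def detokenize(self, token):
        return self._by_token[token]
```

Systems downstream of the vault can store and pass the token freely, which is why tokenization is often used to shrink the part of a system that handles regulated data.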
Kubernetes Tools Stack Wheel
Cloud & DevOps

The Kubernetes tools landscape is complex; the 'Tools Stack Wheel' helps categorize and understand the purpose of various tools used for infrastructure provisioning, application deployment, and cluster monitoring. It addresses the problem of tool sprawl and helps engineers select the right tool for a specific task within a Kubernetes environment.

Kubernetes, Terraform, Helm

How does Docker work
Cloud & DevOps

Docker solves the problem of inconsistent software execution environments by packaging applications and their dependencies into isolated containers. It leverages OS-level virtualization features to ensure applications run the same way regardless of the underlying infrastructure.

Container, Image, Registry

Top 6 Database Models
Databases & Storage

Database models dictate how data is structured, stored, and accessed, directly impacting query performance, data integrity, and system scalability. Selecting the appropriate model is crucial for meeting application-specific requirements and avoiding performance bottlenecks or data inconsistencies.

Relational model, NoSQL, ACID properties

How do we detect node failures in distributed systems
Architecture Patterns

Detecting node failures in distributed systems is paramount for maintaining service availability and preventing cascading failures. Heartbeats, periodic signals exchanged between nodes, are a common mechanism for monitoring node health, but require careful consideration of frequency, timeout, and network conditions.

Heartbeats, Node failure detection, Timeout

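A timeout-based heartbeat detector can be sketched in a few lines. The class and method names below are hypothetical, and the clock is passed in explicitly so the behaviour is deterministic. A real deployment would also have to account for clock drift and network delay when choosing the timeout.

```python
class HeartbeatMonitor:
    """Suspect a node once no heartbeat has arrived within `timeout_s`."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}  # node id -> timestamp of latest heartbeat

    def record(self, node, now):
        """Register a heartbeat from `node` received at time `now`."""
        self._last_seen[node] = now

    def suspected(self, now):
        """Return the nodes whose latest heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )
```

A short timeout detects failures quickly but flags slow nodes as dead; a long timeout is more tolerant but delays failover.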
10 Good Coding Principles to improve code quality
Real-World Architectures

Coding principles are the established guidelines that enhance code maintainability, readability, and robustness. Adhering to these principles reduces technical debt and promotes collaboration within development teams.

Style Guides, Documentation, Robustness

15 Open-Source Projects That Changed the World
CS Fundamentals

Open-source projects provide publicly accessible codebases, enabling collaborative software development, widespread adoption, and accelerated innovation. They address the tension between proprietary software's control and open software's community-driven advancement.

Open-source, Collaboration, Git

Reverse proxy vs. API gateway vs. load balancer​
Networking & Protocols

Load balancers, reverse proxies, and API gateways address distinct concerns: distributing traffic across servers, securing and abstracting backend infrastructure, and managing inter-service communication, respectively. Understanding their roles and trade-offs is critical for designing scalable, secure, and maintainable systems.

Load Balancer, Reverse Proxy, API Gateway

Linux Performance Observability Tools
CS Fundamentals

Linux performance observability tools are essential for diagnosing resource bottlenecks and optimizing application performance by providing detailed insights into system-level metrics. These tools enable engineers to understand resource consumption patterns and identify areas for improvement, ensuring efficient system operation.

vmstat, iostat, netstat

Top 9 website performance metrics you cannot ignore
Networking & Protocols

Website performance metrics are critical for identifying bottlenecks and optimizing user experience. Monitoring these metrics enables engineers to proactively address issues impacting speed, responsiveness, and overall website health.

Load Time, Time to First Byte (TTFB), Request Count

How do we manage data
Caching Strategies

Effective data management strategies are crucial for optimizing performance, scalability, and consistency in distributed systems. Techniques like caching, materialized views, and CQRS address the inherent tension between read and write operations, data consistency, and query complexity.

Cache Aside, Materialized View, CQRS

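The cache-aside pattern can be sketched as follows. `CacheAside`, `load_from_db`, and `write_to_db` are hypothetical names, and a plain dictionary stands in for a cache such as Redis. Reads populate the cache on a miss; writes go to the database and invalidate the cached entry.

```python
class CacheAside:
    """The application manages the cache itself, alongside the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable: key -> value
        self._cache = {}

    def get(self, key):
        """Serve from cache when possible; otherwise load and remember."""
        if key in self._cache:
            return self._cache[key]
        value = self._load(key)
        self._cache[key] = value
        return value

    def update(self, key, value, write_to_db):
        """Write to the source of truth, then drop the stale cache entry."""
        write_to_db(key, value)
        self._cache.pop(key, None)
```

Invalidating rather than updating the cache on write keeps the logic simple, at the cost of one extra database read the next time the key is requested.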
Comparing Different API Clients: Postman vs. Insomnia vs. ReadyAPI vs. Thunder Client vs. Hoppscotch
Real-World Architectures

API clients are essential tools for developing and testing web services, allowing engineers to inspect requests and responses. The choice of client depends on project complexity, team size, and required features like protocol support, collaboration, and performance testing.

API Clients, HTTP, REST

How does gRPC work?
Architecture Patterns

RPC (Remote Procedure Call) is called "remote" because it enables communication between services that are deployed to different servers in a microservice architecture. From the caller's point of view, it acts like a local function call.

gRPC is a high-performance, language-agnostic RPC framework developed by Google that leverages protocol buffers for serialization and HTTP/2 for transport, enabling efficient and strongly-typed communication between services. It addresses the challenges of building scalable and maintainable microservice architectures by providing a standardized and performant communication layer.

RPC, Protocol Buffers (protobuf), HTTP/2

Have you heard of the 12-Factor App
CS Fundamentals

The 12-Factor App methodology outlines a set of principles for building portable, resilient, and scalable applications, emphasizing practices like statelessness, configuration externalization, and well-defined dependencies to ensure consistent behavior across environments. Adhering to these factors promotes maintainability and simplifies deployment, especially in cloud-native architectures.

Codebase, Dependencies, Configuration

How does Redis architecture evolve
Caching Strategies

Redis evolved from a single-instance, in-memory data structure server to a distributed data platform to address limitations in data durability, read scalability, and overall capacity. These architectural changes introduced complexity, requiring careful consideration of consistency, availability, and performance trade-offs.

Persistence, Replication, Sentinel

Cloud Cost Reduction Techniques
Cloud & DevOps

Unmanaged cloud deployments often lead to over-provisioned resources and unnecessary expenses. Implementing cost optimization techniques such as rightsizing, reserved instances, and efficient data transfer strategies is crucial for maintaining a cost-effective cloud infrastructure.

Rightsizing, Reserved Instances, Spot Instances

Linux file permissions illustrated
CS Fundamentals

To understand Linux file permissions, we need to understand ownership and permission.

Linux file permissions are a fundamental security mechanism controlling access to files and directories based on user, group, and other classifications; improper configuration can lead to significant security vulnerabilities. The 'chmod' command modifies these permissions, while tools like 'ls' and 'chown' help inspect and manage file ownership.

File Permissions, chmod, Ownership

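The octal modes passed to `chmod` map directly onto the user, group, and other classes. A small sketch using Python's standard `stat` module makes the mapping explicit; the helper names are made up for this example.

```python
import stat


def permission_string(mode):
    """Render permission bits the way `ls -l` does, assuming a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


def by_class(mode):
    """Split an octal mode into rwx strings for user, group, and other."""
    result = {}
    for who, shift in (("user", 6), ("group", 3), ("other", 0)):
        bits = (mode >> shift) & 0o7  # three bits per ownership class
        result[who] = "".join(
            flag if bits & mask else "-"
            for flag, mask in (("r", 4), ("w", 2), ("x", 1))
        )
    return result
```

For instance, `by_class(0o640)` gives the owner read and write access, the group read-only access, and everyone else nothing.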
There are over 1,000 engineering blogs. Here are my top 9 favorites
Real-World Architectures

Engineering blogs from leading tech companies offer valuable insights into real-world system design, implementation details, and operational challenges. They provide a practical complement to theoretical knowledge, showcasing how specific technologies are applied at scale to solve complex problems.

Engineering Blogs, System Design, Scalability

9 Best Practices for Building Microservices
Architecture Patterns

Microservices introduce complexity in inter-service communication, data management, and operational overhead. Applying architectural best practices mitigates these challenges, leading to a more resilient and scalable system.

Microservices, API Gateway, Service Discovery (Consul, etcd)

Roadmap for Learning Cyber Security
Architecture Patterns

Building robust cybersecurity requires a layered approach, integrating architectural design, risk management, threat intelligence, and application security. A strong cybersecurity posture defends against attacks, protects data, and ensures business continuity.

Security Architecture, Risk Assessment, Threat Intelligence

How does JavaScript Work
CS Fundamentals

JavaScript enables dynamic, interactive web experiences by executing code within the browser. Its core function is to manipulate the Document Object Model (DOM) and handle asynchronous operations, reducing server load and enhancing responsiveness.

JavaScript Engine, V8, ECMAScript

Can Kafka Lose Messages
Messaging & Communication

While Kafka is built for fault tolerance, message loss is possible without careful configuration. Understanding the interplay of producer settings, broker configurations, and consumer behavior is crucial to ensure message delivery guarantees.

Producers, Brokers, Consumers

You're Decent at Linux if You Know What Those Directories Mean :)
CS Fundamentals

The Filesystem Hierarchy Standard (FHS) provides a common directory structure across Linux distributions, enabling predictable file locations for system administration and application deployment. Without it, configuration management and log analysis become significantly more complex, hindering operational efficiency.

Filesystem Hierarchy Standard (FHS), Root directory, /bin

Netflix's Tech Stack
Real-World Architectures

Netflix's architecture prioritizes high availability and low latency for video streaming at a massive scale, employing a microservices architecture and a globally distributed content delivery network. Their tech stack emphasizes fault tolerance, scalability, and personalized user experiences through sophisticated data processing and recommendation algorithms.

Microservices, CDN (Open Connect), Cassandra

Top 5 Kafka use cases
Messaging & Communication

Kafka's core value lies in reliably decoupling data producers from consumers at scale, enabling asynchronous processing and fault tolerance. It's a distributed streaming platform optimized for high-throughput data ingestion, transformation, and delivery.

Topics and Partitions, Producers and Consumers, Consumer Groups

Top 6 Cloud Messaging Patterns
Cloud & DevOps

Cloud messaging patterns provide solutions for asynchronous communication between distributed systems, enabling scalability, resilience, and loose coupling. They address challenges like handling complex workflows, managing message volume, and ensuring reliable delivery in the face of failures.

Asynchronous Communication, Pub-Sub, Claim Check

How Netflix Really Uses Java
CS Fundamentals

Netflix employs Java extensively on its backend, leveraging an evolved architecture that includes Backend for Frontends (BFFs) orchestrated with GraphQL to optimize data fetching and tailor the user experience across diverse devices. This architecture addresses the challenges of efficiently serving varying client needs from a microservices-based platform.

Microservices, API Gateway, BFF (Backend for Frontend)

Top 9 Architectural Patterns for Data and Communication Flow
API Design

Architectural patterns define how data flows between components, impacting scalability, latency, and resilience. Selecting the right pattern depends on specific application requirements and trade-offs between complexity and performance.

Request-Response, API Gateway, Pub-Sub

What Are the Most Important AWS Services To Learn
Cloud & DevOps

A pragmatic understanding of core AWS services like EC2, S3, IAM, and RDS is essential for building scalable and reliable cloud applications. These services provide the foundational building blocks for compute, storage, identity management, and data persistence that underpin most architectures.

EC2, S3, IAM

8 Key Data Structures That Power Modern Databases
Databases & Storage

Databases rely on specialized data structures to optimize data access and storage based on workload patterns. Understanding the properties of these structures is crucial for designing performant and scalable database systems.

Hash Index, B-tree, LSM Tree

How do we design effective and safe APIs
Networking & Protocols

Effective and safe APIs are critical for enabling communication between services while protecting against abuse and ensuring data integrity. Designing robust APIs requires careful consideration of authentication, authorization, rate limiting, data validation, and monitoring to maintain performance and security.

API design, Authentication, Authorization

Who are the Fantastic Four of System Design
Architecture Patterns

Scalability, Availability, Reliability, and Performance are the foundational pillars of system design, representing the core requirements for building robust and user-friendly applications. These four aspects are often intertwined, and understanding their trade-offs is essential for crafting effective solutions.

Scalability, Availability, Reliability

How do we design a secure system
Security

Secure system design focuses on minimizing attack surfaces and mitigating potential damage. It requires a layered approach encompassing authentication, authorization, encryption, vulnerability management, and incident response.

Authentication, Authorization, Encryption

Things Every Developer Should Know: Concurrency is NOT parallelism
CS Fundamentals

Concurrency is about structuring a program to make progress on multiple tasks whose lifetimes overlap, which can happen even on a single core through time-slicing, while parallelism is the simultaneous execution of multiple tasks across multiple processing cores. Misunderstanding the distinction leads to inefficient system design and missed optimization opportunities.

Concurrency, Parallelism, Threads

HTTPS, SSL Handshake, and Data Encryption Explained to Kids
Security

HTTPS provides secure communication over a computer network by encrypting data in transit and authenticating the server. The SSL/TLS handshake establishes a secure session using asymmetric and symmetric cryptography, ensuring confidentiality and integrity.

HTTPS, TLS, SSL

Top 5 Software Architectural Patterns
Architecture Patterns

Software architectural patterns provide proven solutions for common system design challenges, influencing scalability, maintainability, and overall system resilience. Choosing the right pattern requires careful consideration of non-functional requirements and potential trade-offs.

Architectural Pattern, Layered Architecture, Microservices

Top 6 Tools to Turn Code into Beautiful Diagrams
Architecture Patterns

Visualizing system architecture is crucial for communication and debugging. Diagram-as-code tools bridge the gap between code and diagrams, enabling engineers to represent complex systems in a clear, maintainable format.

Diagram as Code, PlantUML, Mermaid

Everything is a trade-off
Architecture Patterns

System design revolves around navigating trade-offs; optimizing one aspect invariably impacts others. Understanding and articulating these compromises, along with their implications, is critical for building robust and scalable systems.

Trade-offs, Cost vs. Performance, Reliability vs. Scalability

What is DevSecOps
Cloud & DevOps

DevSecOps addresses the inherent tension between rapid software delivery and robust security by integrating security practices into all phases of the software development lifecycle. It aims to automate security checks and foster collaboration to minimize vulnerabilities and ensure continuous protection.

Security, Automation, Collaboration

Top 8 Cache Eviction Strategies
Caching Strategies

Cache eviction strategies determine which data is removed when a cache reaches capacity, balancing hit rate, overhead, and staleness. The choice impacts performance, cost, and resilience, requiring careful consideration of access patterns and data characteristics.

Cache, Eviction, LRU

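Least Recently Used (LRU) is one of the most common eviction strategies. A minimal sketch, assuming a single-threaded caller and using `collections.OrderedDict` to track recency (the class name and methods are chosen for this example only):

```python
from collections import OrderedDict


class LRUCache:
    """Evict the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # a read counts as a use
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry

    def keys(self):
        """Keys ordered from least to most recently used."""
        return list(self._items)
```

LRU works well when recent access predicts future access; scan-heavy workloads can flush the cache, which is where frequency-based strategies may fit better.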
Linux Boot Process Explained
CS Fundamentals

The Linux boot process initializes hardware, loads the kernel, and starts essential system services, culminating in a fully operational system. Understanding each stage - from firmware initialization to process management - is critical for diagnosing boot issues and optimizing system performance.

BIOS/UEFI, Bootloader (GRUB), Kernel

Unusual Evolution of the Netflix API Architecture
Architecture Patterns

Netflix's API architecture evolved to address performance bottlenecks and team autonomy challenges. They transitioned from a monolithic API to microservices orchestrated by API Gateways, eventually adopting a federated GraphQL layer for efficient data fetching and schema management.

Monolith, Microservices, API Gateway

GET, POST, PUT... Common HTTP “verbs” in one figure
Networking & Protocols

HTTP verbs define the intended action on a resource, enabling clients to interact with servers in a standardized manner. Incorrect verb usage leads to unpredictable behavior, data corruption, and non-RESTful APIs.

GET, POST, PUT

Top 8 C++ Use Cases
CS Fundamentals

C++ excels in scenarios demanding high performance, low-latency, and fine-grained resource control. Its use spans from operating systems and embedded devices to high-frequency trading platforms and database management systems where efficiency is paramount.

Performance, Low-Latency, Memory Management

Top 4 data sharding algorithms explained
Databases & Storage

Data sharding distributes a large dataset across multiple independent databases to improve query performance and write throughput. Choosing the right sharding algorithm balances data distribution, query patterns, and operational complexity during resharding.

Data Sharding, Range-Based Sharding, Hash-Based Sharding

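Two of the tagged algorithms, hash-based and range-based sharding, reduce to a small routing function each. The sketch below is illustrative only: the function names, the use of SHA-256, and the boundary list are assumptions, not a prescribed scheme.

```python
import bisect
import hashlib


def hash_shard(key, num_shards):
    """Hash-based: spread keys evenly, at the cost of range-query locality."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # A stable hash is used because Python's built-in hash() is
    # randomized per process for strings.
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(user_id, upper_bounds):
    """Range-based: shard i holds ids below upper_bounds[i].

    `upper_bounds` must be sorted; ids at or beyond the last bound go to
    one final catch-all shard.
    """
    return bisect.bisect_right(upper_bounds, user_id)
```

With modulo hashing, changing `num_shards` remaps most keys, which is the resharding cost that consistent hashing is designed to reduce. Range sharding keeps neighbouring ids together but can create hot shards when traffic concentrates on one range.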
10 years ago, Amazon found that every 100ms of latency cost them 1% in sales
Networking & Protocols

Latency directly impacts user experience and revenue. Optimizing for low latency involves techniques like caching, load balancing, and efficient data transfer protocols to minimize delays in delivering content and processing requests.

Latency, Caching, Load Balancing

Load Balancer Realistic Use Cases You May Not Know
CS Fundamentals

Load balancers distribute network traffic across multiple servers to prevent overload, ensuring application availability and responsiveness. They mitigate single points of failure and adapt to changing traffic patterns or server health, crucial for high-traffic systems.

Load Balancing Algorithms, Health Checks, Session Affinity

25 Papers That Completely Transformed the Computer World
Real-World Architectures

Seminal research papers document the evolution of distributed systems, revealing architectural patterns and core algorithms used to achieve massive scale and reliability. Understanding these papers provides insight into the fundamental trade-offs that underpin modern infrastructure.

Scalability, Availability, Distributed Systems

IPv4 vs. IPv6, what are the differences
Networking & Protocols

IPv4 and IPv6 are internet protocol versions that dictate how devices are addressed on a network. IPv6 addresses the limitations of IPv4's address space and header inefficiencies, but transitioning requires careful consideration of compatibility and security.

IPv4, IPv6, Address space

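The address-space gap is easy to quantify with Python's standard `ipaddress` module. In this sketch, `describe` is a hypothetical helper, and the sample addresses used with it come from the ranges reserved for documentation.

```python
import ipaddress


def describe(address):
    """Report the IP version, address width, and total address space."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,
        "bits": ip.max_prefixlen,  # 32 for IPv4, 128 for IPv6
        "address_space": 2 ** ip.max_prefixlen,
        "exploded": ip.exploded,  # full form with no "::" shorthand
    }
```

IPv4's 32 bits allow about 4.3 billion addresses, while IPv6's 128 bits allow 2^128, which is why IPv6 can give every device a globally unique address.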
My Favorite 10 Books for Software Developers
Architecture Patterns

Software architecture books help engineers navigate complex trade-offs in system design, providing patterns and mental models to build scalable, resilient, and maintainable applications. These resources offer practical guidance on everything from code structure to distributed system design, enabling engineers to make informed decisions about technology choices and system behavior.

Design Patterns, SOLID Principles, Microservices

Change Data Capture: key to leverage real-time Data
Databases & Storage

Change Data Capture (CDC) addresses the challenge of propagating real-time data modifications from a source database to downstream systems. It enables decoupled architectures and real-time data availability without directly querying the source database.

Change Data Capture (CDC), Transaction Logs, Apache Kafka

Netflix's Overall Architecture
Real-World Architectures

Netflix's architecture is a highly distributed, microservice-based system optimized for streaming video content globally. It emphasizes high availability, scalability, and personalized user experiences through a combination of cloud infrastructure, content delivery networks, and sophisticated data processing pipelines.

Microservices, Content Delivery Network (CDN), Amazon S3

Top 5 common ways to improve API performance
API Design

Optimizing API performance involves addressing bottlenecks across data transfer, processing overhead, and resource utilization. Techniques like pagination, caching, compression, asynchronous operations, and connection pooling are crucial for achieving acceptable latency and throughput.

Pagination, Caching, Compression

Popular interview question: how to diagnose a mysterious process that’s taking too much CPU, memory, IO, etc
CS Fundamentals

Diagnosing resource contention requires a systematic approach, starting with system-level metrics and progressively drilling down to individual processes and resources. Effective diagnosis involves understanding the underlying resource utilization patterns and potential bottlenecks.

Resource Utilization, Process Monitoring, System Overview

What is a deadlock
Databases & Storage

Deadlocks occur when two or more processes are blocked indefinitely, each waiting for the other to release a resource. Preventing or resolving deadlocks requires careful resource management, often trading off concurrency for safety.

Deadlock, Resource Ordering, Timeouts

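Resource ordering, one of the prevention techniques tagged above, can be sketched with two locks. The `Account` class and `transfer` function are hypothetical. The key idea is that every thread acquires locks in the same global order (here, by account id), so a circular wait cannot form.

```python
import threading


class Account:
    def __init__(self, account_id, balance):
        self.id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move funds while holding both locks, always taken in id order."""
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would hang
    first, second = sorted((src, dst), key=lambda account: account.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```

Without the sort, one thread running `transfer(a, b, ...)` and another running `transfer(b, a, ...)` could each hold one lock while waiting for the other indefinitely.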
What’s the difference between Session-based authentication and JWTs
Security

Session-based authentication relies on server-side storage of user state, creating scalability challenges. JWTs encode user information and are cryptographically signed, enabling stateless authentication at the cost of increased complexity and revocation difficulties.

Session-based authentication, JWT (JSON Web Token), Session ID

Top 9 Cases Behind 100% CPU Usage
Messaging & Communication

Sustained high CPU utilization indicates a system bottleneck, potentially leading to performance degradation or service unavailability. Diagnosing the root cause requires a systematic approach, including profiling, monitoring, and understanding potential resource contention.

CPU Profiling, Resource Contention, Infinite Loops

Top 6 Elasticsearch Use Cases
CS Fundamentals

Elasticsearch serves as a distributed search and analytics engine capable of full-text search, real-time data analysis, and security information and event management (SIEM). Its ability to rapidly index and query large datasets makes it applicable across diverse use cases, though careful consideration must be given to data consistency and resource utilization.

Full-Text Search | Real-Time Analytics | SIEM
AWS Services Cheat Sheet
Cloud & DevOps

AWS provides a suite of cloud computing services, each optimized for different workloads. Understanding the trade-offs between these services is crucial for building cost-effective and scalable systems.

EC2 | S3 | Lambda
How do computer programs run
CS Fundamentals

Computer programs execute via the operating system loading instructions from storage into memory, allocating a virtual address space, and scheduling CPU time for the program to run. Understanding memory management, CPU scheduling, and system calls is critical for building efficient and secure systems.

Operating System | Virtual Address Space | CPU Scheduling
A cheat sheet for API designs
API Design

APIs are the entry points to your system, and securing them against abuse and unauthorized access is paramount. A well-designed API considers authentication, authorization, request integrity, and rate limiting to ensure both security and availability.

API Keys | OAuth 2.0 | JWT (JSON Web Token)
Azure Services Cheat Sheet
Cloud & DevOps

Azure provides a suite of cloud services enabling scalable application deployment and management. It offers compute, storage, and advanced services like AI, requiring careful consideration of scaling strategies, networking, and cost optimization.

Cloud computing | Scalability | Virtual Machines (VMs)
Why is Kafka fast
Messaging & Communication

Kafka's speed stems from a combination of sequential disk I/O, zero-copy data transfer, and efficient batching, minimizing latency and maximizing throughput. Its distributed architecture and reliance on OS-level caching further contribute to its performance.

Sequential I/O | Zero Copy | Batching
How do we retry on failures
Networking & Protocols

Retry mechanisms are fundamental for fault tolerance in distributed systems, allowing services to recover from transient failures. Intelligent retry strategies prevent cascading failures and ensure eventual consistency without overwhelming dependent services.

Retry | Exponential Backoff | Jitter
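
To make the retry entry above concrete, here is a minimal sketch of exponential backoff with full jitter. The function name `retry_with_backoff`, its default parameters, and the choice of which exceptions count as transient are all assumptions made for illustration; real services tune these per dependency.

```python
import random
import time


def retry_with_backoff(
    operation,
    max_attempts=5,
    base_delay=0.1,
    max_delay=5.0,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call operation() until it succeeds or max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: the ceiling doubles each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random time in [0, ceiling] so that many
            # clients failing together do not all retry at the same instant.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable only so the behavior can be exercised without real waiting. Retrying is only safe when the operation is idempotent or otherwise protected against duplicate execution.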
7 must-know strategies to scale your database
Databases & Storage

Database scaling addresses performance degradation as load increases by distributing data and processing. Effective strategies maintain responsiveness and availability while balancing consistency, cost, and operational complexity.

Indexing | Caching | Replication
Reddit’s Core Architecture that helps it serve over 1 billion users every month
Real-World Architectures

Reddit's architecture emphasizes speed and reliability at massive scale, using CDNs, microservices, and asynchronous task queues to handle over a billion monthly users. Key to its success is balancing consistency and availability under heavy load, while optimizing for user experience.

CDN | Microservices | Load Balancing
Everything You Need to Know About Cross-Site Scripting (XSS)
Security

Cross-Site Scripting (XSS) attacks exploit vulnerabilities in web applications to inject malicious scripts into trusted websites, compromising user data and application functionality. Effective mitigation requires a layered approach combining input validation, output encoding, and Content Security Policy implementation.

XSS | Reflected XSS | Stored XSS
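
Of the XSS mitigations listed above, output encoding is the easiest to show in code. The sketch below uses Python's standard `html.escape`; `render_comment` is a hypothetical helper, and it only covers the HTML body context (attributes, URLs, and inline scripts each need their own encoding rules).

```python
import html


def render_comment(comment: str) -> str:
    """Embed untrusted text in an HTML paragraph with special characters escaped."""
    # html.escape converts &, <, > (and quotes when quote=True) into entities,
    # so the browser displays the text instead of interpreting it as markup.
    return "<p>" + html.escape(comment, quote=True) + "</p>"


if __name__ == "__main__":
    print(render_comment("<script>alert(1)</script>"))
```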
Types of Memory and Storage
CS Fundamentals

Memory and storage represent a hierarchy of data access, balancing speed, cost, and persistence. Selecting the appropriate memory and storage technologies is crucial for optimizing application performance, data durability, and overall system cost.

RAM | ROM | HDD
How to load your websites at lightning speed
Networking & Protocols

Optimizing website load times is critical for user engagement and conversion. Techniques like lazy loading, preloading, and code splitting reduce initial payload size and prioritize critical resources, while CDNs and caching minimize latency.

Lazy Loading | Preloading | Code Splitting
10 Essential Components of a Production Web Application
Messaging & Communication

Production web applications rely on a suite of interconnected components to ensure scalability, reliability, and performance. Understanding the role and interplay of each component is crucial for robust system design.

Load Balancer | CDN | Message Queue
Top 8 Standards Every Developer Should Know
Networking & Protocols

Networking standards ensure interoperability and consistency across distributed systems. They provide common protocols and specifications for communication, data representation, and security, enabling diverse components to work together reliably.

TCP/IP | HTTP | SQL
Explaining JSON Web Token (JWT) with simple terms
Security

JSON Web Tokens (JWTs) provide a compact, self-contained method for securely transmitting information as a JSON object. They enable stateless authentication and authorization, reducing server-side session management overhead.

JSON Web Token (JWT) | Header | Payload
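
The header, payload, and signature structure described in the JWT entry above can be illustrated with the standard library alone. This is a teaching sketch of HS256 signing and verification under simplifying assumptions (`sign_token` and `verify_token` are hypothetical names, and only the `exp` claim is checked); production systems would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now=None) -> dict:
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never trust the token's own alg choice
    expected = hmac.new(
        secret, (header_b64 + "." + payload_b64).encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Note that the payload is only encoded, not encrypted: anyone holding the token can read its claims, so secrets do not belong in it.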
11 steps to go from Junior to Senior Developer
Real-World Architectures

Becoming a senior engineer requires mastering not only coding but also collaboration, system design principles, and operational excellence. This involves understanding trade-offs, choosing appropriate technologies, and ensuring the reliability and scalability of systems under real-world constraints.

System Design | API Design (REST, gRPC) | Databases (SQL, NoSQL)
Top 8 must-know Docker concepts
Cloud & DevOps

Docker solves the problem of inconsistent application environments by packaging applications and their dependencies into isolated containers. This ensures applications run the same way regardless of the underlying infrastructure, simplifying deployment and scaling.

Dockerfile | Docker Image | Docker Container
Top 10 Most Popular Open-Source Databases
Databases & Storage

Open-source databases provide flexibility and control over data management, offering solutions optimized for various consistency, scalability, and data modeling requirements. Selecting the right database involves evaluating trade-offs between SQL and NoSQL models, consistency levels, and operational overhead.

Open-source | MySQL | PostgreSQL
What does a typical microservice architecture look like
Architecture Patterns

Microservice architectures decompose large applications into independently deployable services, improving agility and resilience. Key considerations include inter-service communication, data consistency, and failure handling to maintain system stability.

Microservices | API Gateway | Service Discovery
What is SSO (Single Sign-On)
Security

Single Sign-On (SSO) centralizes authentication, allowing users to access multiple applications with a single login, improving security and user experience. It relies on secure token exchange between applications and a trusted authentication server, often using standardized protocols.

Single Sign-On | Authentication | Authorization
What makes HTTP2 faster than HTTP1
Networking & Protocols

HTTP/2 addresses the limitations of HTTP/1.1 by introducing binary framing and multiplexing, which let many requests and responses share a single TCP connection concurrently and reduce latency. HPACK header compression further improves performance by minimizing redundant header data.

Binary Framing | Multiplexing | Stream Prioritization
Log Parsing Cheat Sheet
CS Fundamentals

Log parsing is essential for diagnosing application and infrastructure issues by extracting relevant information from unstructured text. Command-line tools and specialized log management systems enable efficient analysis, filtering, and aggregation of log data at scale.

grep | cut | sort
4 Ways Netflix Uses Caching to Hold User Attention
Caching Strategies

Netflix employs a multi-layered caching strategy to minimize latency and maximize throughput for a global user base. These caches range from in-memory key-value stores to geographically distributed CDNs, each optimized for specific data types and access patterns.

Caching | EVCache | CDN
Top 6 Cases to Apply Idempotency
API Design

Idempotency ensures that an operation, when executed multiple times, yields the same outcome as a single execution, preventing unintended side effects. It's critical for building robust systems that can gracefully handle retries and failures in distributed environments, especially when network partitions occur.

Idempotency | Retries | Unique ID
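
One common way to get the idempotent behavior described above is to have the client send a unique key with each logical request and to replay the stored result on retries. The sketch below is illustrative only: `PaymentService` and its in-memory dictionary are hypothetical stand-ins for a real service and a durable store.

```python
import threading


class PaymentService:
    """Hypothetical service that deduplicates charges by idempotency key."""

    def __init__(self) -> None:
        self._results = {}  # idempotency_key -> stored response
        self._lock = threading.Lock()
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        with self._lock:
            # A retry with a key we have already processed returns the stored
            # response instead of charging again.
            if idempotency_key in self._results:
                return self._results[idempotency_key]
            self.charges_made += 1
            result = {"charge_id": self.charges_made, "amount_cents": amount_cents}
            self._results[idempotency_key] = result
            return result
```

A retried call with the same key leaves `charges_made` unchanged, which is the property that makes client-side retries safe.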
MVC, MVP, MVVM, MVVM-C, and VIPER architecture patterns
Architecture Patterns

Architectural patterns like MVC, MVP, MVVM, MVVM-C, and VIPER address the challenge of organizing application code to enhance maintainability, testability, and separation of concerns. They provide different approaches for structuring the relationship between data (Model), user interface (View), and the logic that connects them, each with its own set of trade-offs.

MVC | MVP | MVVM
What are the differences among database locks
Databases & Storage

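Database locks coordinate concurrent access to data, preventing conflicting modifications that could lead to inconsistency. Different lock types offer varying degrees of concurrency and isolation: shared locks let multiple readers proceed together, while exclusive locks admit a single writer, which affects both system performance and data integrity.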

Exclusive Lock (X-lock) | Shared Lock (S-lock) | Row-Level Lock
How do we Perform Pagination in API Design
API Design

Pagination addresses the challenge of efficiently delivering large datasets through APIs by dividing the data into discrete, manageable chunks. Without it, API endpoints risk overwhelming clients and servers with excessive data transfer and processing.

Pagination | Offset-based Pagination | Cursor-based Pagination
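
Cursor-based pagination, one of the approaches tagged above, can be sketched as follows. The in-memory `ITEMS` list, the opaque base64 cursor format, and the function names are assumptions for illustration; against a database the filter would typically become a `WHERE id > :after ORDER BY id LIMIT :limit` style query.

```python
import base64
import json

# Hypothetical dataset, already sorted by a unique, increasing id.
ITEMS = [{"id": i, "name": f"item-{i}"} for i in range(1, 26)]


def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["after"]


def list_items(cursor=None, limit=10) -> dict:
    after = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    rows = [item for item in ITEMS if item["id"] > after][: limit + 1]
    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"items": page, "next_cursor": next_cursor}
```

Because the cursor records the last id seen rather than a row offset, rows inserted or deleted earlier in the list do not cause items to be skipped or repeated between pages.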
What happens when you type a URL into your browser
Networking & Protocols

Typing a URL triggers a complex sequence involving DNS resolution to obtain the server's IP address, followed by HTTP/HTTPS requests to retrieve content, and ultimately browser rendering. Caching and CDNs play crucial roles in optimizing this process for speed and efficiency.

URL | DNS | IP address
How do you pay from your digital wallet by scanning the QR code
Messaging & Communication

QR code payments bridge the gap between physical point-of-sale systems and digital wallets by encoding transaction details within a scannable image, streamlining the payment process. The system relies on secure communication protocols and idempotent operations to ensure reliable and secure transactions.

QR code | PSP | Payment Gateway
What do Amazon, Netflix, and Uber have in common
Real-World Architectures

Amazon, Netflix, and Uber all operate at massive scale, demanding system designs that prioritize high availability and efficient resource utilization. They achieve this through a combination of stateless architectures, horizontal scaling, and asynchronous processing, allowing them to rapidly adapt to fluctuating user demand.

Statelessness | Horizontal Scaling | Load Balancing
100X Postgres Scaling at Figma
Real-World Architectures

Figma achieved 100x Postgres scaling by combining vertical scaling, read replicas, connection pooling via PgBouncer, database proxies, and sharding to handle exponential growth. Their strategy involved both functional and horizontal partitioning to address performance bottlenecks at different stages.

Vertical Scaling | Read Replicas | Connection Pooling
How to store passwords safely in the database and how to validate a password
Security

Salting defeats precomputed rainbow tables, and hashing with a deliberately slow key derivation function makes brute-forcing even common passwords far more expensive. Choosing a strong algorithm and managing its parameters are critical for security and performance.

Salting | Hashing | Key Derivation Function (KDF)
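
A minimal sketch of salted password hashing and validation with a key derivation function, using PBKDF2 from Python's standard `hashlib`. The storage format, the function names, and the default iteration count are illustrative assumptions rather than a recommendation; memory-hard functions such as scrypt, bcrypt, or Argon2 are common alternatives.

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 600_000) -> str:
    """Return a self-describing record: scheme$iterations$salt$digest."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    scheme, iterations, salt_hex, digest_hex = stored.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    # Recompute with the stored salt and iteration count, then compare.
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the iteration count next to the hash lets the work factor be raised later without invalidating existing records.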
Cybersecurity 101 in one picture
Security

Cybersecurity ensures confidentiality, integrity, and availability of systems and data by employing layered defenses and continuous monitoring. A robust strategy balances preventative measures with proactive detection and rapid incident response to minimize risk and maintain operational resilience.

Confidentiality | Integrity | Availability
What do version numbers mean
CS Fundamentals

Version numbers provide an explicit contract between software producers and consumers about the nature and scope of changes in each release. Without versioning, updates risk breaking dependent systems due to unforeseen incompatibilities.

MAJOR | MINOR | PATCH
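
The MAJOR, MINOR, and PATCH contract mentioned above can be expressed in a few lines. This sketch assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build metadata, and the helper names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def bump(text: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_version(text)
    if change == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":  # backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


def is_compatible(installed: str, required: str) -> bool:
    """Same MAJOR and at least the required version (simplified rule)."""
    have, need = parse_version(installed), parse_version(required)
    return have[0] == need[0] and have >= need
```

Comparing parsed tuples rather than raw strings matters: as text, "1.10.0" sorts before "1.9.3", while as numbers it is the newer release.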
What is k8s (Kubernetes)
Cloud & DevOps

Kubernetes (k8s) addresses the operational complexity of deploying and managing distributed applications by providing a platform for automated container orchestration. Without k8s, managing the lifecycle of microservices at scale becomes a brittle, manual process prone to errors and downtime.

Containers | Pods | Nodes
HTTP Status Code You Should Know
Networking & Protocols

HTTP status codes are standardized numerical responses from servers, communicating the outcome of client requests. Proper understanding and handling of these codes are crucial for building resilient and observable distributed systems.

HTTP Status Codes | 200 OK | 302 Found
18 Most-used Linux Commands You Should Know
CS Fundamentals

Linux commands provide direct access to system resources and are essential for tasks ranging from file manipulation to performance monitoring. Proficiency in these commands enables efficient debugging, automation, and system administration.

Commands | Filesystem | Processes
Iterative, Agile, Waterfall, Spiral Model, RAD Model... What are the differences
CS Fundamentals

Software development methodologies represent distinct strategies for managing project lifecycles, each balancing predictability with adaptability. Selecting the right methodology hinges on understanding project constraints and embracing iterative feedback loops.

Waterfall | Agile | Iterative
Design Patterns Cheat Sheet - Part 1 and Part 2
Architecture Patterns

Design patterns are codified solutions to recurring design challenges, promoting code reuse, maintainability, and scalability. They offer a common language and structured approach to address problems, enabling developers to build robust and adaptable systems.

Factory | Builder | Prototype
9 Essential Components of a Production Microservice Application
Messaging & Communication

Production microservices require a suite of supporting components to manage inter-service communication, data consistency, observability, and security. These components ensure resilience, scalability, and maintainability in a distributed environment.

API Gateway | Service Registry | Kafka
Which latency numbers you should know
Networking & Protocols

Understanding the typical latency orders of magnitude for common operations is critical for identifying performance bottlenecks during system design. These benchmarks inform decisions around caching, data storage, and inter-service communication protocols.

Latency | Cache | RAM
API Gateway 101
API Design

An API Gateway decouples clients from backend services by providing a single entry point for requests, handling routing, authentication, and other cross-cutting concerns. Without it, you risk exposing internal architectures, creating tight coupling, and lacking centralized control over security and traffic management.

Reverse Proxy | Request Routing | API Orchestration
A Roadmap for Full-Stack Development
CS Fundamentals

Full-stack development requires understanding the interaction between front-end clients, back-end services, databases, and infrastructure. Optimizing this interaction involves trade-offs across latency, scalability, and data consistency, requiring a holistic view of the entire system.

Front-end | Back-end | Database
OAuth 2.0 Flows
Security

OAuth 2.0 flows (grant types) define how an application obtains an access token for protected resources, each balancing security, usability, and implementation complexity. Selecting the appropriate flow is crucial for minimizing attack surfaces and adhering to the principle of least privilege.

Authorization code | Access token | Refresh token
10 Key Data Structures We Use Every Day
CS Fundamentals

Data structures provide fundamental methods for organizing and accessing data, impacting performance and scalability. The optimal selection depends on balancing factors like access patterns, memory footprint, and the frequency of mutations.

Hash Table | List | Queue
Top 10 k8s Design Patterns
Cloud & DevOps

Kubernetes design patterns offer reusable solutions to common distributed systems problems, promoting consistency and reliability. They codify best practices for managing deployments, scaling applications, and ensuring resilience in containerized environments.

Health probes | Liveness probes | Readiness probes
What is a Load Balancer
CS Fundamentals

Load balancers distribute network traffic across multiple backend servers to prevent overload and ensure high availability. They are essential for building scalable and resilient systems by decoupling client requests from specific server instances.

Layer 4 Load Balancing | Layer 7 Load Balancing | Health Checks
8 Common System Design Problems and Solutions
Architecture Patterns

High-traffic systems encounter recurring challenges concerning latency, availability, and scalability. Common solutions involve caching strategies, redundancy techniques, asynchronous processing patterns, and data partitioning approaches to achieve resilient and performant architectures.

Caching | Redis | Memcached
How does SSH work
Networking & Protocols

SSH provides a secure, encrypted channel for remote access and data transfer between two networked devices. It solves the problem of exposing sensitive data to eavesdropping or tampering when communicating over insecure networks.

Encryption | Authentication | Secure Channel
Why is Nginx so popular
Networking & Protocols

Nginx's popularity stems from its ability to efficiently manage network traffic through an event-driven architecture, functioning as a web server, reverse proxy, and load balancer. This allows it to handle high concurrency and optimize resource utilization, making it a foundational component in modern web infrastructure.

Web server | Reverse proxy | Load balancing
How Discord Stores Trillions of Messages
Databases & Storage

Discord's message storage architecture evolved to handle immense scale, moving from MongoDB to Cassandra and ultimately to ScyllaDB to optimize for low latency and high throughput. This transition highlights the trade-offs between consistency, availability, and performance in distributed database systems.

Database | Scaling | Latency
How does Garbage Collection work
CS Fundamentals

Garbage collection automatically reclaims memory occupied by objects that are no longer reachable, preventing most memory leaks and removing the need for manual deallocation. Different algorithms offer varied trade-offs between throughput, latency, and memory footprint, impacting application responsiveness and resource utilization.

memory management | automatic cleanup | mark-and-sweep
A Cheat Sheet for Designing Fault-Tolerant Systems
Cloud & DevOps

Fault-tolerant systems maintain functionality despite component failures, ensuring high availability and data integrity. They employ redundancy, replication, and automated failover mechanisms to minimize downtime and data loss.

Replication | Redundancy | Load Balancing
If you don’t know trade-offs, you DON'T KNOW system design
Architecture Patterns

Architectural decisions are rarely clear-cut; they involve navigating competing constraints like cost, latency, consistency, and operational complexity. Understanding these trade-offs, and their implications, is fundamental to effective system design.

Trade-offs | Latency vs Throughput | Consistency vs Availability
8 Tips for Efficient API Design
API Design

Efficient API design prevents cascading failures and performance bottlenecks by establishing clear contracts and predictable behavior between services. Well-designed APIs prioritize usability, scalability, and security, crucial for maintaining a healthy microservices ecosystem or enabling reliable third-party integrations.

REST | HTTP Methods | Versioning
The Ultimate Kafka 101 You Cannot Miss
Messaging & Communication

Kafka addresses the challenge of reliably and efficiently transporting high-volume, real-time data streams between disparate systems. Without a robust messaging system like Kafka, applications struggle to maintain performance and consistency under heavy load, leading to data loss and system instability.

Messages | Topics | Partitions
A Cheatsheet for UML Class Diagrams
API Design

UML class diagrams provide a standardized visual language for modeling the static structure of object-oriented systems, enabling clear communication and design validation. These diagrams depict classes, their attributes and methods, and the relationships between them, facilitating the creation of maintainable and scalable software architectures.

Class | Attributes | Methods
20 Popular Open Source Projects Started or Supported By Big Companies
Real-World Architectures

Large companies often open-source internal tools to drive adoption, cultivate developer ecosystems, and establish de facto standards. Understanding the motivations and trade-offs behind these projects is crucial for system design and technology selection.

Open Source | Kubernetes | React
A Crash Course on Database Sharding
Databases & Storage

Database sharding horizontally partitions data across multiple independent database instances to improve query performance, write throughput, and overall availability. Effective sharding requires careful consideration of data distribution, query patterns, and operational complexity.

Sharding | Shards | Range-based sharding
Is PostgreSQL eating the database world
Databases & Storage

PostgreSQL's extensibility and SQL compliance address the tension between needing a specialized database for every workload and the operational overhead of managing many database systems. Its robust feature set and extension ecosystem allow it to serve as a versatile data platform for diverse application needs.

Extensibility | PostGIS | JSONB
The Ultimate Software Architect Knowledge Map
Architecture Patterns

A software architect knowledge map is a structured approach to understanding the breadth and depth of technical skills required to design, build, and maintain complex software systems. It helps engineers identify skill gaps and prioritize learning across diverse domains like programming, design patterns, infrastructure, and security.

Programming Languages | Design Principles | Architectural Patterns
A Crash Course on Scaling the Data Layer
Databases & Storage

Data layer scaling introduces challenges like overwhelming databases with sudden spikes in traffic after cache failures or when dealing with non-existent data. Mitigation strategies involve techniques to dampen traffic spikes, prevent repeated lookups of missing data, and provide fallback mechanisms during cache outages.

Thundering Herd | Cache Penetration | Cache Crash
4 Popular GraphQL Adoption Patterns
Networking & Protocols

GraphQL adoption patterns address the trade-offs between ease of implementation, performance, and organizational complexity when introducing GraphQL into existing architectures. Choosing the right pattern depends on factors like team structure, application scale, and existing infrastructure.

GraphQL | GraphQL Schema | Client-based GraphQL
Top 8 Popular Network Protocols
Networking & Protocols

Network protocols are sets of rules governing data exchange between devices. Choosing the correct protocol impacts system performance, reliability, and security; understanding their trade-offs is essential for system design.

HTTP/HTTPS | TCP/UDP | SMTP
11 Things I learned about API Development from POST/CON 2024 by Postman
Real-World Architectures

API development is shifting towards developer experience improvements and performance gains, emphasizing visual design tools, enhanced collaboration features, and modern protocols. This evolution allows developers to construct more efficient and maintainable APIs.

API workflows | API monitoring | API collaboration
How do Search Engines really Work
Cloud & DevOps

Search engines tackle the challenge of efficiently retrieving relevant information from a massive, constantly evolving corpus of web pages. They achieve this through crawling, indexing, and ranking, balancing scale, speed, and accuracy.

Crawling | Indexing | Ranking
The Ultimate Walkthrough of the Generative AI Landscape
Real-World Architectures

Generative AI leverages large models to create novel content, addressing the need for automated content creation and personalized experiences. Effective deployment requires careful consideration of model serving, resource optimization, and monitoring to meet latency and throughput demands.

Generative AI | Large Language Models (LLMs) | Transformer Architecture
Cheatsheet on Relational Database Design
Databases & Storage

Relational database design centers on structuring data into normalized tables with well-defined relationships, optimizing for consistency, querying efficiency, and scalability. Poor design leads to data redundancy, slow queries, and difficulties in scaling the database to handle growing data volumes or user traffic.

Relational Database | Normalization | ACID Properties
My Favorite 10 Soft Skill Books that Can Help You Become a Better Developer
Real-World Architectures

Effective communication is critical for software engineers, enabling clear articulation of technical concepts, collaborative problem-solving, and efficient team coordination. Mastering these soft skills improves overall system design, troubleshooting, and team performance, especially in distributed environments.

Communication | Collaboration | Troubleshooting
REST API Authentication Methods
Security

REST API authentication verifies a client's identity before granting access to resources. Choosing the right method balances security, complexity, and performance, impacting API usability and resilience against attacks.

Authentication | Authorization | API Keys
The Evolving Landscape of API Protocols
Networking & Protocols

API protocols define the structure and rules for data exchange between systems. Selecting the appropriate protocol is critical for optimizing performance, managing complexity, and ensuring interoperability across diverse applications.

REST | GraphQL | WebSockets
Design a Rate Limiter
Architecture Patterns

Rate limiters protect services from being overwhelmed by excessive requests, ensuring availability and preventing abuse by controlling the rate at which clients can access resources. They achieve this by tracking request counts and rejecting requests that exceed predefined thresholds.

Rate Limiting | Token Bucket | Leaky Bucket
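
The token bucket algorithm tagged above is small enough to sketch directly. This is a single-process illustration under assumed names and parameters (`TokenBucket`, `capacity`, `refill_rate`); a distributed limiter would need shared, atomically updated state.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill lazily based on elapsed time, never exceeding capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # over the limit: the caller should reject the request
```

The `clock` parameter is injectable only so the refill logic can be exercised deterministically. Refilling on each call avoids the need for a background timer.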
Design a URL Shortener
Architecture Patterns

A URL shortener maps long URLs to short aliases that are easier to use and share. Behind that simple interface it needs a robust system for storing the mappings, redirecting requests, and scaling under high traffic. Efficient encoding schemes, distributed storage, and caching mechanisms are essential for optimal performance and availability.

Base62 encoding | Hashing algorithms | HTTP redirects (301, 302)
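
Base62 encoding, tagged above, turns a numeric ID into a short alphanumeric key. The sketch below assumes the ID comes from some unique counter or ID generator; the alphabet ordering and function names are illustrative choices, not a standard.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(key: str) -> int:
    number = 0
    for char in key:
        number = number * BASE + ALPHABET.index(char)
    return number
```

With this scheme a 7-character key can represent 62^7 (about 3.5 trillion) distinct IDs.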
Consistent Hashing
CS Fundamentals

Consistent hashing distributes data across a cluster such that adding or removing nodes minimizes key remapping, improving cache hit rates and reducing operational overhead. It's a fundamental technique for building scalable and fault-tolerant distributed systems.

Hashing | Distributed Systems | Data Partitioning
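
The "minimal key remapping" property described above can be demonstrated with a small hash ring. This is a sketch under stated assumptions: `HashRing`, the use of SHA-256 for placement, and 100 virtual nodes per server are illustrative choices rather than a prescribed design.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node is placed at many positions to even out the key distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring position at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]
```

When a node is added, the only keys that change owner are the ones that now belong to the new node; every other key stays where it was, which is what keeps cache hit rates stable during scaling.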
CAP Theorem Deep Dive
Databases & Storage

The CAP theorem states that a distributed system cannot simultaneously guarantee all three of Consistency, Availability, and Partition Tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts. Understanding this trade-off is crucial for designing robust and scalable systems.

CAP Theorem | Consistency | Availability
Content Delivery Networks (CDN) - How They Work
Networking & Protocols

Content Delivery Networks (CDNs) alleviate latency and improve availability by caching content closer to users, reducing the load on origin servers. They achieve this through a distributed network of servers that deliver content based on geographic proximity and server load.

Content Delivery Network (CDN) | Cache Hit | Cache Miss
Design a Chat / Messaging System
Real-World Architectures

Designing a chat system involves balancing real-time message delivery with reliability, scalability, and features like presence and history. Common architectures involve a combination of load balancers, message queues, and persistent storage to handle high throughput and ensure message durability.

WebSockets, SSE (Server-Sent Events), Message Queues (Kafka, RabbitMQ)

Design a News Feed / Timeline
Real-World Architectures

A news feed aggregates content from users and entities a user follows, ranking and presenting it in a timely and engaging manner. Effective feed design balances freshness, relevance, and system performance under high load.

Fan-out, Push vs Pull, Message Queue (Kafka)

Design a Web Crawler
Architecture Patterns

A web crawler systematically browses the World Wide Web, indexing content by following hyperlinks and storing the data for later retrieval. Effective crawler design requires balancing breadth-first exploration with politeness constraints and efficient storage.

HTTP, HTML parsing, robots.txt

Design a Distributed Key-Value Store
Databases & Storage

A distributed key-value store provides scalable and fault-tolerant data storage by partitioning data across multiple nodes and employing replication or erasure coding for redundancy, enabling high availability and throughput. Consistent hashing, data versioning, and conflict resolution mechanisms are crucial for maintaining data integrity and consistency across the distributed system.

Consistent Hashing, Replication, Erasure Coding

Design Search Autocomplete / Typeahead
Real-World Architectures

Search autocomplete, or typeahead, enhances user experience by predicting search queries as users type. It balances low latency, high relevance, and scalability through techniques like prefix matching, ranking algorithms, and distributed caching.

Trie data structure, Prefix matching, Ranking algorithms (TF-IDF)

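The prefix-matching idea behind typeahead can be sketched with a small trie. This example ranks suggestions by a raw frequency count rather than TF-IDF, and the names `Autocomplete`, `add`, and `suggest` are hypothetical, chosen only for this illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 means a complete query ends at this node


class Autocomplete:
    def __init__(self):
        self.root = TrieNode()

    def add(self, query: str, count: int = 1) -> None:
        """Record a query and how often it was searched."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix: str, k: int = 3) -> list:
        """Return up to k completions of prefix, most frequent first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Walk the subtree under the prefix and collect complete queries.
        found = []
        stack = [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.count:
                found.append((cur.count, text))
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        # Sort by descending count, then alphabetically for stable ties.
        found.sort(key=lambda item: (-item[0], item[1]))
        return [text for _, text in found[:k]]
```

A production system would more likely precompute the top-k completions at each node, since walking a large subtree on every keystroke would not meet a low-latency target.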
Design a Distributed Message Queue
Messaging & Communication

A distributed message queue decouples producers and consumers, enabling asynchronous communication and improved system resilience. It handles message persistence, delivery guarantees, and scaling challenges inherent in high-throughput, distributed systems.

Message Broker, Producer, Consumer

Bloom Filters - Probabilistic Data Structures
CS Fundamentals

Bloom filters are space-efficient probabilistic data structures used to test whether an element is a member of a set, accepting a small probability of false positives but guaranteeing no false negatives. They are commonly employed in caching, networking, and database systems to reduce unnecessary lookups and improve performance.

Bloom filter, Hash function, Bit array

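To show how a bit array and several hash functions combine into a Bloom filter, here is a minimal sketch. The sizes, the class name `BloomFilter`, and the double-hashing trick that derives all positions from one SHA-256 digest are assumptions for illustration; real deployments size the filter from the expected item count and target false-positive rate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from two base hashes (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

Because bits are only ever set and never cleared, an added item always tests positive, which is where the "no false negatives" guarantee comes from.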
Gossip Protocol in Distributed Systems
Networking & Protocols

Gossip protocols enable efficient and reliable information dissemination across large, decentralized systems by probabilistically propagating updates between nodes, trading immediate consistency for eventual consistency and high availability. They are crucial for maintaining state in environments where centralized coordination is impractical or undesirable.

Gossip protocol, Epidemic protocol, Eventual consistency

Distributed Consensus - Raft and Paxos
CS Fundamentals

Distributed consensus algorithms like Raft and Paxos ensure that a group of machines agrees on a single value, even when some machines fail or the network is unreliable, providing fault tolerance and consistency in distributed systems. These algorithms are critical for building reliable distributed databases and coordination services.

Distributed Consensus, Raft, Paxos

Leader Election in Distributed Systems
CS Fundamentals

Leader election ensures a single process acts as the coordinator in a distributed system, preventing conflicting actions and maintaining consistency; it's a fault-tolerance mechanism that automatically selects a new leader if the existing one fails.

Raft, Paxos, ZooKeeper

Distributed Transactions - Two-Phase Commit and Saga Pattern
Databases & Storage

Distributed transactions ensure atomicity and consistency across multiple services or databases, preventing partial failures that can corrupt data. Two-Phase Commit (2PC) and Saga are common patterns for achieving this, each with different trade-offs regarding consistency, latency, and complexity.

Distributed Transactions, Atomicity, Consistency

CQRS - Command Query Responsibility Segregation
Architecture Patterns

CQRS separates read and write operations into distinct models, optimizing each independently to improve performance, scalability, and security. It prevents a single data model from becoming a bottleneck by tailoring data access patterns to specific use cases.

Command Query Responsibility Segregation (CQRS), Command Model, Query Model

Back-of-the-Envelope Estimation for System Design
CS Fundamentals

Back-of-the-envelope estimation provides a method to quickly approximate system resource requirements, performance bottlenecks, and feasibility, enabling informed architectural decisions and proactive scaling strategies. It involves using reasonable assumptions and order-of-magnitude calculations to avoid costly design flaws and ensure the system can handle expected load.

Requests per second (RPS), Data throughput, Latency

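The order-of-magnitude arithmetic behind a back-of-the-envelope estimate can be written down as a few lines of code. The function name `estimate` and the default peak factor and retention period are assumptions picked for the example; real estimates start from the stated requirements.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_requests: int, avg_payload_bytes: int,
             peak_factor: float = 2.0, retention_days: int = 365) -> dict:
    """Rough capacity numbers derived from a daily request volume."""
    avg_rps = daily_requests / SECONDS_PER_DAY
    return {
        "avg_rps": avg_rps,
        # Peak traffic is assumed to be a fixed multiple of the average.
        "peak_rps": avg_rps * peak_factor,
        # Assumes every request stores one payload for the whole retention period.
        "storage_bytes": daily_requests * avg_payload_bytes * retention_days,
    }


# Hypothetical workload: 86.4 million writes per day at 1 KB each.
numbers = estimate(daily_requests=86_400_000, avg_payload_bytes=1_000)
```

For that hypothetical workload the average comes to 1,000 requests per second, the assumed peak to 2,000, and one year of storage to roughly 31.5 TB.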
Service Discovery in Microservices
Architecture Patterns

Service discovery lets services in a microservices system locate and communicate with each other automatically, adapting to changing network locations and scaling events without manual configuration. Combined with health checks, it steers traffic away from unhealthy instances, which supports resilience in dynamic environments.

Service Registry, Consul, etcd

Encoding vs Encryption vs Tokenization
Security

Encoding transforms data for compatibility, encryption secures data confidentiality, and tokenization replaces sensitive data with non-sensitive surrogates; each serves distinct security and operational purposes within a system.

Encoding, Base64, URL encoding

Design a Video Streaming Platform
Real-World Architectures

Designing a video streaming platform involves trade-offs between latency, cost, and quality, using techniques like content delivery networks (CDNs), adaptive bitrate streaming, and efficient encoding to deliver video content to a global audience reliably and efficiently. The architecture must handle ingestion, transcoding, storage, and delivery, while optimizing for user experience and scalability.

Content Delivery Network (CDN), Adaptive Bitrate Streaming (ABR), HLS (HTTP Live Streaming)

Circuit Breaker Pattern
Architecture Patterns

The Circuit Breaker pattern prevents cascading failures in distributed systems by stopping requests to failing services, allowing them time to recover. It provides fault tolerance and resilience by acting as a proxy that monitors service health and intervenes when thresholds are exceeded.

Circuit Breaker Pattern, Cascading Failures, Fault Tolerance

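The closed, open, and half-open behaviour of a circuit breaker can be sketched as a small wrapper around a call. The class name `CircuitBreaker`, the thresholds, and the injectable `clock` parameter are assumptions for illustration; production libraries typically add per-error policies, metrics, and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Passing the clock in as a parameter is a design choice that makes the timeout behaviour testable without sleeping.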
Database Indexing - B-Trees, LSM Trees, and Beyond
Databases & Storage

Database indexes accelerate data retrieval by creating sorted lookups on specific columns, but introduce write performance overhead. Choosing the right indexing strategy, such as B-trees or LSM trees, depends on the read/write workload characteristics of the application.

B-tree, LSM tree, SSTable

Design a Proximity Service
Real-World Architectures

Designing a proximity service, like Yelp's nearby search, involves efficiently identifying businesses within a user-defined radius. The core challenge lies in balancing accuracy and speed when searching through vast datasets of geographic locations.

Spatial Indexing, Geohashing, Geospatial Data

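Geohashing, one of the spatial indexing techniques listed for the proximity service, can be illustrated with a short encoder. This is a sketch of the standard public geohash scheme; the function name `geohash_encode` and the default precision are assumptions for the example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Nearby points usually share a geohash prefix, so a radius search can start as a prefix lookup. Points that sit on opposite sides of a cell boundary do not share a prefix, which is why such searches typically also check the neighbouring cells.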
Design a Nearby Friends System
Real-World Architectures

Designing a 'Nearby Friends' system presents a significant challenge due to the dynamic nature of user location data and the need to efficiently query and update this information for a large user base. The system must provide low-latency responses while handling a high volume of location updates and proximity-based queries.

Geolocation database, Spatial indexing, Geohashes

Design Google Maps
Real-World Architectures

Designing Google Maps involves building a complex system capable of providing accurate real-time location data, navigation, and map rendering to a massive user base. The challenge lies in handling massive data volumes, ensuring low latency, and maintaining accuracy across diverse geographic regions and user devices.

Geocoding, Reverse Geocoding, Web Mercator Projection

Design a Hotel Reservation System
Real-World Architectures

Designing a robust hotel reservation system presents the challenge of managing inventory availability, pricing fluctuations, and concurrent user access, all while ensuring data consistency and fault tolerance. The complexity stems from handling a high volume of read and write operations, especially during peak seasons, and accommodating features like overbooking and cancellation policies.

API Gateway, Microservices Architecture, Database Sharding

Design S3-like Object Storage
Databases & Storage

Designing an S3-like object storage system presents the challenge of managing vast amounts of unstructured data with high durability, scalability, and availability. This necessitates a distributed architecture that optimizes for cost-effectiveness and eventual consistency, while providing a user-friendly API.

Object Storage, RESTful API, Eventual Consistency

Design a Real-time Gaming Leaderboard
Real-World Architectures

Designing a real-time gaming leaderboard presents a significant challenge due to the high volume of updates, read requests, and the need for low latency to maintain a competitive and engaging user experience. This system requires careful consideration of data structures, storage solutions, and caching strategies to ensure scalability and responsiveness.

Real-time Data Processing, In-Memory Database, Sorted Sets

Design a Stock Exchange System
Real-World Architectures

Designing a stock exchange system presents significant challenges due to its stringent requirements for low latency, high throughput, and robustness. The core difficulty lies in efficiently matching buy and sell orders while adhering to regulatory requirements and risk management protocols, all under intense load.

Order Matching Engine, Order Book, Limit Order

RAG - Retrieval Augmented Generation
AI & GenAI Engineering

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by grounding their responses in external knowledge sources. This reduces hallucinations and gives LLMs access to the up-to-date, domain-specific information that real-world applications need.

Retrieval-Augmented Generation (RAG), Knowledge Base, Indexing

Advanced RAG Patterns
AI & GenAI Engineering

Index-Aware Retrieval enhances RAG by leveraging knowledge of chunk content and indexing methodology to address limitations in retrieving relevant information. It optimizes retrieval in scenarios where queries lack direct keyword matches, involve nuanced details, or require logical reasoning across multiple chunks.

RAG (Retrieval-Augmented Generation), Semantic Indexing, Query Expansion

Chain of Thought Reasoning
AI & GenAI Engineering

Chain of Thought (CoT) prompting is a technique to enhance LLMs' reasoning by explicitly prompting them to break down complex problems into intermediate reasoning steps before providing the final answer. It's crucial for production AI systems to solve complex, multi-step problems that foundational models struggle with due to limited training data coverage.

Chain of Thought (CoT), Prompt Engineering, Large Language Models (LLMs)

Prompt Engineering and Optimization
AI & GenAI Engineering
Pro

Prompt engineering and optimization techniques like dataset distribution awareness, neutralization, and content optimization are crucial for building reliable and effective production AI systems. These methods ensure fairness, mitigate bias, and maximize the impact of generated content.

Dataset Distribution, Bias Mitigation, Neutralization

Structured Output and Content Control
AI & GenAI Engineering
Pro

Structured Output and Content Control are critical for ensuring that AI-generated content meets predefined specifications and adheres to desired styles. These techniques are essential for building robust production AI systems that reliably generate content suitable for specific use cases, meeting branding, compliance, and accuracy requirements.

Logits Masking, Style Transfer, Reverse Neutralization

Tool Calling and Function Use
AI & GenAI Engineering
Pro

Tool Calling empowers Large Language Models (LLMs) to interact with external systems by invoking functions and APIs. This capability is crucial for building production-ready AI agents that can perform real-world tasks beyond content generation.

LLM, API, Function Calling

Multiagent Collaboration
AI & GenAI Engineering
Pro

Multiagent collaboration is an architectural pattern where multiple specialized AI agents work together to solve complex problems, surpassing the limitations of single large language models (LLMs). This approach is critical for production AI systems requiring complex reasoning, domain-specific expertise, and efficient resource utilization.

Agent Specialization, Task Decomposition, Parallel Processing

Composable Agentic Workflows
AI & GenAI Engineering
Pro

Composable Agentic Workflows represent a modular approach to building AI-powered applications by chaining together specialized agents, offering flexibility and maintainability crucial for production environments. Instead of monolithic systems, this methodology leverages reusable patterns for scalable and adaptable AI solutions.

Agentic Workflows, Composability, Modularity

LLM-as-Judge and Evaluation
AI & GenAI Engineering
Pro

The LLM-as-Judge pattern utilizes large language models to automate and enhance the evaluation of other AI systems' outputs, offering a scalable and customizable alternative to traditional metrics and human review. This is crucial for reliable production AI, providing nuanced feedback for model improvement and validation without extensive human involvement.

LLM-as-Judge, Automated Evaluation, Scalable Evaluation

Fine-Tuning and Adapter Patterns
AI & GenAI Engineering
Pro

Adapter-based fine-tuning patterns adapt large language models (LLMs) to specific tasks or datasets by training a small set of added parameters while the base model stays frozen, which keeps compute requirements low. They offer a practical alternative to full fine-tuning, balancing performance and cost-effectiveness for production AI systems.

Fine-tuning, Adapter tuning, Pre-trained language model

AI Guardrails and Safety
AI & GenAI Engineering
Pro

Template Generation is a crucial AI safety pattern that mitigates risks associated with unpredictable LLM outputs by using pre-approved and reviewed templates for specific tasks. This approach significantly reduces the need for real-time human review while still leveraging the creative abilities of LLMs during the template creation phase.

LLM Safety, Deterministic Output, Human-in-the-Loop

Small Language Models
AI & GenAI Engineering
Pro

Small Language Models (SLMs) offer a cost-effective and efficient alternative to large language models (LLMs) for specific tasks. By employing techniques like distillation, quantization, and speculative decoding, SLMs can be deployed on resource-constrained infrastructure without significant performance degradation, making them crucial for production AI systems.

Distillation, Quantization, Speculative Decoding

LLM Inference Optimization
AI & GenAI Engineering
Pro

LLM inference optimization focuses on techniques that improve the speed and efficiency of deploying large language models in production, addressing key constraints like latency, cost, and hardware utilization. These optimizations are crucial for creating responsive and scalable AI applications.

Inference optimization, Knowledge distillation, Quantization

Long-Term Memory for AI Agents
AI & GenAI Engineering
Pro

Long-Term Memory (LTM) is the ability of AI agents to persist and recall information across multiple interactions, overcoming the stateless nature of LLMs. It's crucial for building production-grade AI applications that offer personalized, context-aware experiences and handle complex tasks requiring historical data.

Vector Database, Knowledge Graph, Semantic Caching

Dependency Injection for LLM Reliability
AI & GenAI Engineering
Pro

Dependency Injection (DI) is a software design pattern crucial for developing reliable and testable Large Language Model (LLM) applications. It promotes modularity and simplifies testing by decoupling components and allowing for easy substitution of dependencies with mock implementations, vital for handling LLM non-determinism and evolving models.

Dependency Injection, LLM Chains, Mock Implementations