
How to Scale Like a Senior Engineer

A Comprehensive Guide to Scaling Systems: Servers, Databases, Load Balancers, and SPOFs

What We Need to Know

  • How to scale the system
  • How to increase the reliability of a system
  • How to increase its availability
  • What are SLO / SLA

Understanding SLO and SLA

SLA (Service Level Agreement)

A formal agreement between a service provider and customer that defines the expected level of service, including:

  • Uptime guarantees (e.g., 99.9% availability)
  • Response time commitments
  • Consequences if service levels are not met
  • Legal and financial implications

SLA Examples:

Example 1: Availability SLA
SLA: 99.9% uptime (Three 9’s)
Allowed downtime per year: 8.76 hours
Allowed downtime per month: 43.2 minutes

Example 2: Response Time SLA
SLA: 95% of requests must respond within 200ms
SLA: 99% of requests must respond within 500ms

Example 3: Error Rate SLA
SLA: Error rate must be less than 0.1%
Out of 10,000 requests, maximum 10 can fail

Availability Percentage Examples:

Availability | Downtime/Year | Downtime/Month | Use Case
99% (Two 9’s) | 87.6 hours | 7.2 hours | Basic websites
99.9% (Three 9’s) | 8.76 hours | 43.2 minutes | E-commerce, SaaS
99.99% (Four 9’s) | 52.56 minutes | 4.32 minutes | Financial services
99.999% (Five 9’s) | 5.26 minutes | 25.9 seconds | Mission-critical systems
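
Each row follows directly from the availability percentage; a small TypeScript helper (a sketch, assuming a 365-day year and a 30-day month) makes the arithmetic explicit:

  function downtimeBudget(availabilityPercent: number) {
    const unavailable = 1 - availabilityPercent / 100;
    return {
      perYearHours: unavailable * 365 * 24,        // 99.9% -> 8.76 hours
      perMonthMinutes: unavailable * 30 * 24 * 60, // 99.9% -> 43.2 minutes
    };
  }

  console.log(downtimeBudget(99.9)); // ≈ { perYearHours: 8.76, perMonthMinutes: 43.2 }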

Real-World SLA Example:

E-commerce Platform SLA:
Availability: 99.9% uptime guarantee
Response Time: 95% of API calls respond within 200ms
Error Rate: Less than 0.1% error rate
Penalty: If SLA is breached, customer receives 10% service credit
Measurement: Monitored 24/7 with automated alerts

SLO (Service Level Objective)

Internal targets that teams use to measure the performance of their services:

  • Measurable goals for service performance
  • Used internally to ensure SLA compliance
  • More aggressive than SLA targets (buffer for SLA)
  • Helps teams stay ahead of SLA commitments

SLO Examples:

Example 1: Uptime SLO vs SLA
SLA Commitment: 99.9% uptime
Internal SLO: 99.95% uptime (more strict)
Why? The 0.05% buffer ensures we meet SLA even with unexpected issues

Example 2: Response Time SLO vs SLA
SLA: 95% of requests within 200ms
Internal SLO: 99% of requests within 150ms
Why? Internal target is faster to ensure SLA compliance

Example 3: Error Rate SLO vs SLA
SLA: Error rate less than 0.1%
Internal SLO: Error rate less than 0.05%
Why? Lower internal target provides safety margin

Real-World SLO Example:

API Service SLOs:
Availability SLO: 99.95% (SLA is 99.9%)
Latency SLO: P95 latency < 150ms (SLA is P95 < 200ms)
Error Rate SLO: < 0.05% (SLA is < 0.1%)
Throughput SLO: Handle 10,000 requests/second

These internal targets ensure the team always meets customer-facing SLAs.

Key Relationship:

SLOs are stricter than SLAs: for availability, the SLO percentage is higher than the SLA; for latency and error rate, the SLO target is tighter.

Formula: SLO = SLA + Safety Buffer

Example:
If your SLA is 99.9% availability, set your SLO to 99.95%
This gives you a 0.05% buffer to handle unexpected issues
while still meeting your SLA commitments to customers.

For Seniors

Senior engineers need to know the foundations of system design: how to design a system or a feature from scratch.

Objectives & Learning Goals

Learning Objectives:

  • Setup single servers
  • Scale to multiple replicas
  • Understand databases
  • Vertical and Horizontal scaling
  • Load balancer implementation
  • Health check mechanisms
  • Single point of failure (SPOF) identification and mitigation

Designing Systems for Millions of Users

Every complex system starts with a simple foundation. Starting small allows us to understand each core component before adding complexity.

First, We Start Simple: One User


Step 1: Build a Single Server Setup

Setup for small user base where everything runs on a single server (web, db, cache).

Single Server Architecture: users on web and mobile send requests to a single server that runs the web app, business logic, database, and cache.

Step 2: Understanding the Request Flow

This server handles business logic, data storage, and presentation. The request flow below applies to both web and mobile applications.

GET /products/:id – retrieve product details (e.g., /products/456 for product ID 456)

Example response:
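
The response body itself is not shown in the original; an illustrative shape for GET /products/456 might look like the following (the field names are assumptions):

  {
    "id": 456,
    "name": "Wireless Mouse",
    "price": 29.99,
    "inStock": true
  }
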
Request Flow Diagram: the client (browser or app) sends an HTTP request to the server, which runs its business logic, queries the database, processes the retrieved data, and returns an HTTP response.

A single server might fall short as user demand increases.

Key Takeaways:

  • Start small – Begin with a straightforward single-server setup
  • Request flow – Understanding how requests flow through your system
  • Traffic sources – Web and mobile applications

AWS Deployment

  1. First, select and launch the server
  2. SSH into the server
  3. Install the dependencies your application requires
  4. Clone the repo from GitHub
  5. Run the application exactly as you would on your local machine
  6. Expose the port your server runs on in the AWS security group
  7. Open the server’s public IP address and port in a browser: public-ip:server-port

Example: http://172.23.34.5:3000

Now you can see the application running.

Here is the application you can use for your demo: https://github.com/m-saad-siddique/simple-app
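
If you prefer a self-contained example, a minimal Node.js server sketch in TypeScript (assuming Express is installed; not necessarily identical to the linked demo repo) looks like this:

  import express, { Request, Response } from "express";

  const app = express();
  const PORT = 3000; // open this port in the AWS security group

  app.get("/", (_req: Request, res: Response) => {
    res.send("Hello from the single-server setup");
  });

  app.listen(PORT, () => {
    console.log(`Server listening on port ${PORT}`);
  });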

Using Application Servers

You can also use app servers for different purposes. For Node.js applications, use PM2 and Nginx:

  • PM2 – Process manager for Node.js applications that keeps applications alive forever, reloads them without downtime, and facilitates common system admin tasks
  • Nginx – Web server that can also be used as a reverse proxy, load balancer, and HTTP cache

How to do this: Configure PM2 to manage your Node.js processes and use Nginx as a reverse proxy to route traffic to your application.
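
As a rough illustration, a minimal PM2 ecosystem file might look like the sketch below (the app name, script path, and instance count are assumptions); Nginx then sits in front as a reverse proxy, typically forwarding traffic to the port PM2 serves (e.g., proxy_pass http://localhost:3000;).

  // ecosystem.config.js – minimal PM2 sketch
  module.exports = {
    apps: [
      {
        name: "simple-app",        // assumed app name
        script: "dist/index.js",   // assumed entry point
        instances: "max",          // one process per CPU core
        exec_mode: "cluster",      // PM2 built-in clustering
        env: { NODE_ENV: "production", PORT: 3000 },
      },
    ],
  };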

Scaling Beyond Single Server

As the user base grows, a single server is not enough to handle the requests. To accommodate growth, we separate:

  • Web Tier – For web and mobile requests
  • Data Tier – Handles data storage and retrieval requests from the web tier

This separation is crucial for managing the load effectively.

Multi-Tier Architecture: users (web and mobile clients) hit a load balancer (round-robin traffic distribution with health checks), which spreads requests across multiple web-tier servers running the business logic and API layer; those servers read from and write to the data tier, a database cluster with a primary and replicas.

Caching Strategies

Caching is a critical technique for improving system performance and reducing database load. By storing frequently accessed data in fast, temporary storage, we can dramatically reduce response times and system load.

What is Caching?

Caching involves storing copies of frequently accessed data in a faster storage layer (memory) to reduce the need to access slower storage layers (disk/database). This significantly improves response times and reduces load on backend systems.

Cache Layers

L1 Cache (Application Cache)

  • In-memory cache within the application
  • Fastest access (nanoseconds)
  • Limited by application memory
  • Examples: Local variables, in-process cache

L2 Cache (Distributed Cache)

  • External cache service shared across servers
  • Very fast access (microseconds)
  • Can be scaled independently
  • Examples: Redis, Memcached, Hazelcast

L3 Cache (CDN/Edge Cache)

  • Geographically distributed cache
  • Fast access from edge locations (milliseconds)
  • Best for static content and global distribution
  • Examples: CloudFront, Cloudflare, Fastly
Multi-Layer Caching Architecture with Cache Hit/Miss Flow: a request first hits the CDN (L3 – edge locations serving static assets via CloudFront/Cloudflare); on a miss it goes through the load balancer to an application server, which checks its in-memory L1 cache, then the shared L2 distributed cache (Redis, Memcached, or Hazelcast), and only queries the database layer (primary plus read replicas – the slowest, disk-bound path) when every cache layer misses. Results are stored back into L2, L1, and L3 on the way out. Approximate access times: L1 (application) ~1–10 μs, L2 (distributed) ~1–5 ms, L3 (CDN) ~10–50 ms, database ~10–100 ms. A cache hit means the data was found at that layer (fast response); a cache miss means the next layer must be queried.

Cache Patterns

1. Cache-Aside (Lazy Loading)

How it works:

  1. Application checks cache first
  2. If cache miss, fetch from database
  3. Store result in cache for future requests
  4. Return data to client

Pros: Simple, cache only contains requested data

Cons: Cache miss penalty, potential for stale data
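
Below is a minimal cache-aside sketch in TypeScript (assuming an ioredis-style client and a placeholder database query; the names are illustrative):

  import Redis from "ioredis";

  const redis = new Redis();

  async function getProduct(id: string): Promise<unknown> {
    const cacheKey = `product:${id}`;

    // 1. Check the cache first
    const cached = await redis.get(cacheKey);
    if (cached !== null) return JSON.parse(cached); // cache hit

    // 2. Cache miss: fetch from the database
    const product = await getProductFromDb(id);

    // 3. Store the result with a TTL for future requests
    await redis.set(cacheKey, JSON.stringify(product), "EX", 300); // 5-minute TTL

    // 4. Return the data to the caller
    return product;
  }

  // Placeholder for the real database query (an assumption for this sketch)
  async function getProductFromDb(id: string): Promise<unknown> {
    return { id, name: "example" };
  }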

2. Write-Through

How it works:

  1. Write data to cache and database simultaneously
  2. Both are always in sync
  3. Reads are always from cache

Pros: Data consistency, no stale data

Cons: Higher write latency, more writes to database

3. Write-Behind (Write-Back)

How it works:

  1. Write to cache immediately
  2. Write to database asynchronously later
  3. Better write performance

Pros: Fast writes, reduced database load

Cons: Risk of data loss if cache fails, eventual consistency

4. Refresh-Ahead

How it works:

  1. Cache proactively refreshes before expiration
  2. Reduces cache miss rate
  3. Predicts future access patterns

Pros: Lower cache miss rate, better user experience

Cons: More complex, may refresh unused data

Cache Invalidation Strategies

  • TTL (Time To Live) – Cache expires after a set time period
  • Event-based invalidation – Invalidate cache when data changes
  • Manual invalidation – Explicitly clear cache when needed
  • Version-based – Use version numbers to invalidate stale data
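
For example, event-based invalidation can be as simple as deleting the cached key whenever the underlying record changes (a sketch reusing the same assumed Redis client and key naming as the cache-aside example above):

  async function updateProduct(id: string, changes: Record<string, unknown>): Promise<void> {
    await updateProductInDb(id, changes); // write to the source of truth first
    await redis.del(`product:${id}`);     // then invalidate; the next read repopulates the cache
  }

  // Placeholder for the real database write (an assumption for this sketch)
  async function updateProductInDb(id: string, changes: Record<string, unknown>): Promise<void> {}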

When to Use Caching

Good Candidates for Caching:

  • Frequently accessed data
  • Expensive database queries
  • Static or semi-static content
  • Computed results that don’t change often
  • Session data
  • User preferences and settings

Not Good for Caching:

  • Frequently changing data
  • Real-time data requirements
  • Large objects that don’t fit in memory
  • Data that requires strong consistency
  • Sensitive data (unless encrypted)

Popular Caching Solutions

  • Redis – In-memory data structure store, supports persistence, pub/sub, and complex data types
  • Memcached – Simple, high-performance distributed memory caching system
  • Hazelcast – In-memory data grid with distributed computing capabilities
  • Amazon ElastiCache – Managed Redis/Memcached service on AWS
  • CDN Services – CloudFront, Cloudflare, Fastly for edge caching

Cache Eviction Policies

When cache is full, these policies determine what to remove:

  • LRU (Least Recently Used) – Remove least recently accessed items
  • LFU (Least Frequently Used) – Remove least frequently accessed items
  • FIFO (First In First Out) – Remove oldest items first
  • Random – Randomly select items to evict
  • TTL-based – Remove expired items first
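
As an illustration of LRU in particular, here is a tiny in-process LRU cache sketch in TypeScript (capacity handling only; not a production implementation):

  class LruCache<K, V> {
    constructor(private capacity: number, private map: Map<K, V> = new Map()) {}

    get(key: K): V | undefined {
      if (!this.map.has(key)) return undefined;
      const value = this.map.get(key)!;
      this.map.delete(key); // re-insert so the key becomes the most recently used
      this.map.set(key, value);
      return value;
    }

    set(key: K, value: V): void {
      if (this.map.has(key)) this.map.delete(key);
      this.map.set(key, value);
      if (this.map.size > this.capacity) {
        const leastRecentlyUsed = this.map.keys().next().value as K;
        this.map.delete(leastRecentlyUsed); // evict the least recently used entry
      }
    }
  }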

Choosing the Right Database

Two main options: relational databases (RDBMS) and NoSQL databases.

Relational Databases

  • Data consistency and integrity
  • Especially for transactions (ACID)
  • Data is well-structured with clear relations
  • Strong consistency and transactional integrity

Advantages of Relational Databases:

  • ACID compliance ensures data integrity
  • Well-established and mature technology
  • Strong consistency guarantees
  • Excellent for complex queries and joins
  • Standard SQL language

Disadvantages of Relational Databases:

  • Vertical scaling limitations
  • Can be slower for large-scale reads
  • Schema changes can be complex
  • May not handle unstructured data well

NoSQL Databases

  • Super low latency for rapid response
  • Data in unstructured or semi-structured format
  • Scalable storage for massive data volumes

Advantages of NoSQL Databases:

  • Horizontal scaling capabilities
  • Flexible schema design
  • High performance for large volumes
  • Better suited for distributed systems
  • Can handle various data types

Disadvantages of NoSQL Databases:

  • Weaker consistency guarantees
  • Limited query capabilities compared to SQL
  • Less mature ecosystem
  • May require more application-level logic

Questions to Ask When Choosing a Database:

  • What is the data structure? (Structured, semi-structured, or unstructured?)
  • What are the consistency requirements? (Strong consistency vs eventual consistency)
  • What is the expected read/write ratio? (Read-heavy vs write-heavy workloads)
  • What is the scale requirement? (Expected data volume and growth)
  • What are the transaction requirements? (Do you need ACID compliance?)
  • What is the query pattern? (Simple lookups vs complex joins)
  • What is the latency requirement? (Real-time vs batch processing)

Vertical (Scale Up) vs Horizontal Scaling (Scale Out)

Vertical Scaling (Scale Up)

Adding more power (CPU, RAM, storage) to existing servers.

Pros of Vertical Scaling:

  • Simpler to implement – no architectural changes needed
  • No code changes required
  • Easier to manage – single server
  • No need for load balancing
  • Better for applications that can’t be distributed

Cons of Vertical Scaling:

  • Limited by hardware constraints
  • Higher costs for powerful hardware
  • Single point of failure
  • Downtime required for upgrades
  • Cannot scale beyond maximum hardware capacity

Horizontal Scaling (Scale Out)

Adding more servers to handle increased load.

Pros of Horizontal Scaling:

  • Nearly unlimited scaling potential
  • Better fault tolerance – if one server fails, others continue
  • Cost-effective – use commodity hardware
  • No downtime for scaling
  • Better performance distribution

Cons of Horizontal Scaling:

  • More complex to manage multiple servers
  • Requires load balancing infrastructure
  • Potential data consistency challenges
  • May require application redesign
  • Network complexity increases

In Horizontal Scaling: How to Handle Client Requests

When using horizontal scaling, we need to determine which server should serve each client request. This is handled by load balancers that distribute incoming requests across multiple servers based on various algorithms.

Application Suitability

Which Applications Are NOT Suitable for Horizontal Scaling?

Applications with strong state requirements, complex inter-server dependencies, or applications that require shared memory or file systems.

Examples of Applications NOT Suitable for Horizontal Scaling:

1. Legacy Monolithic Applications with In-Memory State

  • Example: Old Java applications storing user sessions in server memory
  • Why: Session data is tied to specific server instance
  • Solution: Refactor to use distributed session storage (Redis) or stateless design

2. Applications Using Shared File Systems

  • Example: Image processing service that reads/writes to shared NFS mount
  • Why: File system becomes bottleneck, single point of failure
  • Solution: Use object storage (S3) or distributed file systems (HDFS)

3. Real-Time Gaming Servers

  • Example: Multiplayer game server maintaining game state in memory
  • Why: Game state must be consistent across all players, low latency required
  • Solution: Use vertical scaling or specialized game server architecture

4. Applications with Complex Inter-Server Communication

  • Example: Distributed computing framework requiring frequent server-to-server communication
  • Why: Network latency between servers degrades performance
  • Solution: Optimize communication patterns or use vertical scaling

5. Single-Threaded Applications

  • Example: Legacy Python application constrained by the GIL (Global Interpreter Lock)
  • Why: Cannot utilize multiple cores effectively
  • Solution: Vertical scaling with more powerful CPU or refactor to multi-process

6. Applications Requiring Strong Consistency Across All Instances

  • Example: Financial transaction processing system
  • Why: Need immediate consistency, distributed systems add complexity
  • Solution: Vertical scaling with strong ACID guarantees or specialized distributed transaction system

7. Applications with Tight Coupling to Hardware

  • Example: GPU-intensive machine learning inference server
  • Why: Requires specific hardware, cannot easily distribute
  • Solution: Vertical scaling with powerful GPUs or specialized ML infrastructure

Which Applications Are NOT Suitable for Vertical Scaling?

Applications that need to scale beyond single machine limits, require high availability, or need to handle massive concurrent users.

Examples of Applications NOT Suitable for Vertical Scaling:

1. High-Traffic Web Applications

  • Example: E-commerce site handling 10 million requests per day
  • Why: Single server cannot handle the load, even with maximum hardware
  • Solution: Horizontal scaling with load balancer and multiple servers

2. Social Media Platforms

  • Example: Twitter/X handling 500 million tweets per day
  • Why: Massive concurrent users, global distribution needed
  • Solution: Horizontal scaling across multiple regions

3. Content Delivery Networks (CDN)

  • Example: Video streaming service serving content globally
  • Why: Need edge locations worldwide, single server insufficient
  • Solution: Horizontal scaling with edge servers in multiple locations

4. Microservices Architecture

  • Example: Application with 50+ microservices
  • Why: Each service needs independent scaling, fault isolation
  • Solution: Horizontal scaling per service, container orchestration

5. High Availability Requirements

  • Example: Banking application requiring 99.99% uptime
  • Why: Single server = single point of failure
  • Solution: Horizontal scaling with redundancy across multiple availability zones

6. Big Data Processing

  • Example: Hadoop cluster processing petabytes of data
  • Why: Data too large for single machine, parallel processing needed
  • Solution: Horizontal scaling with distributed computing framework

7. Real-Time Analytics Platforms

  • Example: Real-time dashboard processing millions of events per second
  • Why: Need distributed processing for high throughput
  • Solution: Horizontal scaling with stream processing (Kafka, Flink)

8. API Gateway Services

  • Example: API gateway handling 100K requests/second
  • Why: Single server cannot handle the load
  • Solution: Horizontal scaling with multiple gateway instances

9. Search Engines

  • Example: Elasticsearch cluster indexing billions of documents
  • Why: Index too large for single machine, need distributed search
  • Solution: Horizontal scaling with sharded indices across nodes

10. Chat/Messaging Applications

  • Example: WhatsApp handling 1 billion users
  • Why: Massive concurrent connections, global distribution
  • Solution: Horizontal scaling with WebSocket servers across regions

Decision Framework: When to Choose What?

Choose Vertical Scaling When:

  • Application has stateful components that can’t be easily distributed
  • Low to moderate traffic (can be handled by single powerful server)
  • Cost-effective for small scale
  • Application requires specific hardware (GPU, high memory)
  • Simpler architecture preferred

Choose Horizontal Scaling When:

  • High traffic or expected traffic growth
  • Need high availability (99.9%+)
  • Global user base requiring low latency
  • Stateless or can be made stateless
  • Cost-effective at scale (commodity hardware)
  • Need fault tolerance
Vertical Scaling vs Horizontal Scaling: vertical scaling (scale up) upgrades a single server as load grows, e.g., from 2 CPU / 4 GB RAM / 100 GB storage to 8 CPU / 32 GB RAM / 1 TB storage; horizontal scaling (scale out) keeps the same server spec (e.g., 2 CPU / 4 GB RAM) and adds more instances, with a load balancer distributing traffic across them.

Load Balancers

In horizontal scaling, we need to handle client requests and determine which server should serve them. This is where load balancers come into play.

What is a Load Balancer?

A load balancer distributes incoming network traffic across multiple servers to ensure no single server becomes overwhelmed, improving responsiveness and availability.

7 Strategies and Algorithms Used in Load Balancing:

  1. Round Robin – Distributes requests sequentially across servers (see the sketch after this list)
  2. Least Connections – Routes to the server with the fewest active connections
  3. Least Response Time – Sends requests to the server with the fastest response time
  4. IP Hash – Uses client IP to determine server assignment (ensures session persistence)
  5. Weighted Algorithms – Assigns requests based on server capacity/weight
  6. Geographical Algorithms – Routes based on geographic location
  7. Consistent Hashing – Distributes load using hash functions, minimizing redistribution when servers are added/removed
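
A round-robin picker is only a few lines; here is a minimal TypeScript sketch (the upstream addresses are assumptions):

  const servers = ["10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"];
  let next = 0;

  function pickServer(): string {
    const server = servers[next];
    next = (next + 1) % servers.length; // cycle through servers sequentially
    return server;
  }

  // A weighted variant would repeat entries (or track weights) to reflect server capacity.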

Health Checks

Load balancers continuously monitor server health to ensure traffic is only routed to healthy servers. Unhealthy servers are automatically removed from the rotation.
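
A health check is usually just an endpoint the load balancer polls; a minimal Express sketch in TypeScript (the path and the dependency check are assumptions) could look like this:

  import express, { Request, Response } from "express";

  const app = express();

  // Placeholder for a real dependency check, e.g. a cheap "SELECT 1" against the database
  async function isDatabaseReachable(): Promise<boolean> {
    return true;
  }

  // The load balancer polls this endpoint and removes the instance from rotation
  // when it stops returning 200.
  app.get("/health", async (_req: Request, res: Response) => {
    const dbOk = await isDatabaseReachable();
    res.status(dbOk ? 200 : 503).json({ status: dbOk ? "healthy" : "unhealthy" });
  });

  app.listen(3000);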

Load Balancer Architecture: web and mobile clients send traffic to the load balancer (round robin with health checks); the load balancer forwards requests only to healthy servers (e.g., Server 1 and Server 2) and removes an unhealthy server (Server 3) from rotation when its health check fails.

How to Implement Load Balancers

Software Load Balancers

  • Nginx – High-performance web server and reverse proxy
  • HAProxy – Reliable, high-performance TCP/HTTP load balancer

Hardware Load Balancers

  • F5 Load Balancer – Enterprise-grade hardware solution
  • Citrix ADC (NetScaler) – Network appliances for load balancing

Cloud-Based Load Balancers

  • AWS Elastic Load Balancer – Managed load balancing service
  • Azure Load Balancer – Microsoft’s cloud load balancing solution
  • GCP Load Balancer – Google Cloud Platform’s load balancing service

SPOF (Single Point of Failure)

A Single Point of Failure is any component that could cause the whole system to fail if it stops working.

Mitigation Strategies:

  • Redundancy – Deploy multiple instances of critical components
    • Load Balancer Redundancy – Deploy multiple load balancers in active-passive or active-active configuration to ensure high availability
    • Database replication – Use master-slave or master-master replication
    • Multiple server instances – Deploy multiple web and application servers
    • Multiple database instances – Use database clusters and replicas
    • Redundant network paths – Multiple network connections and routes
  • Health Checks & Monitoring – Continuously monitor system health and automatically detect failures
    • Implement health check endpoints
    • Monitor server metrics (CPU, memory, disk, network)
    • Set up alerting for critical failures
    • Use monitoring tools like Prometheus, Grafana, or cloud monitoring services
  • Self-Healing Systems – Automatically recover from failures without manual intervention
    • Automatic failover mechanisms
    • Auto-scaling groups that replace failed instances
    • Container orchestration with automatic restarts
    • Circuit breakers to prevent cascade failures
SPOF Mitigation Strategies: a single load balancer, a single server, or a single database is each a SPOF that takes the whole system down if it fails. The mitigated architecture replaces each one with redundancy: multiple load balancers (active primary plus standby backup), multiple server instances, and database replication (a primary for reads/writes plus read-only replicas), all backed by health checks, continuous monitoring, automatic failover, and self-healing.

Scenario-Based Technical Questions

Real-world scenarios that senior engineers face in interviews and production systems. Each scenario includes problem analysis, solution approach, and architectural considerations.

Scenario 1: Design a URL Shortener (like bit.ly)

Problem Statement:

Design a URL shortening service that converts long URLs into short, shareable links. The system should handle millions of URLs and redirect users efficiently.

Requirements:

  • Generate unique short URLs (e.g., bit.ly/abc123)
  • Handle 100 million URLs per day
  • Store URLs for 5 years
  • 99.9% uptime
  • Redirect latency < 100ms
  • Support custom short URLs

Solution Approach:

  1. URL Encoding: Use base62 encoding (a–z, A–Z, 0–9) to generate 7-character short codes; 62^7 ≈ 3.5 trillion unique URLs (see the sketch after this list)
  2. Hash Function: Use MD5 or SHA-256 hash of long URL, take first 7 characters
  3. Database Design:
    • Short URL (primary key)
    • Long URL
    • Created timestamp
    • Expiration date
    • Click count
  4. Caching: Cache frequently accessed URLs in Redis (LRU eviction)
  5. Load Balancing: Distribute requests across multiple servers
  6. Database Sharding: Shard by short URL hash to distribute load
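
One way to combine the hashing and base62 steps above is sketched below in TypeScript (the hash choice, code length, and collision handling are simplified assumptions):

  import { createHash } from "crypto";

  const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; // 62 characters

  function toBase62(n: bigint): string {
    let code = "";
    while (n > 0n) {
      code = ALPHABET[Number(n % 62n)] + code;
      n = n / 62n;
    }
    return code || "a";
  }

  function shortCode(longUrl: string): string {
    // Hash the long URL, take a slice of the digest, and map it into base62
    const hex = createHash("sha256").update(longUrl).digest("hex").slice(0, 12);
    return toBase62(BigInt("0x" + hex)).slice(0, 7); // 7 chars; 62^7 ≈ 3.5 trillion codes
  }

  console.log(shortCode("https://example.com/some/very/long/path"));

In production you would also detect collisions and retry (for example by salting the input), which this sketch omits.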

Key Technologies:

  • Hash Algorithm: MD5/SHA-256 for URL encoding
  • Database: NoSQL (Cassandra/DynamoDB) for high write throughput
  • Cache: Redis for hot URLs (99% cache hit rate target)
  • Load Balancer: Round-robin or consistent hashing
  • CDN: For static assets and frequently accessed redirects

Trade-offs:

Pros: Simple design, high scalability, fast redirects with caching

Cons: Hash collisions need handling, database becomes bottleneck at scale

Alternatives: Use auto-incrementing counter with base62 encoding (requires distributed counter)

URL Shortener Architecture: user requests hit a load balancer (round robin, health checks) and are routed to app servers running the shortener logic (hash generation and redirect handling); servers check a Redis cache of hot URLs first (LRU eviction, ~99% hit-rate target) and fall back to a sharded NoSQL database (Cassandra/DynamoDB) on a cache miss.

Scenario 2: Handle Sudden Traffic Spike (10x Traffic)

Problem Statement:

Your e-commerce site experiences 10x traffic during Black Friday sale. Current infrastructure can’t handle the load. How do you prepare and handle this spike?

Requirements:

  • Handle 10x normal traffic (e.g., 1M to 10M requests/minute)
  • Maintain < 200ms response time
  • Zero downtime
  • Cost-effective solution
  • Graceful degradation if needed

Solution Approach:

  1. Auto-scaling: Configure auto-scaling groups to add servers automatically when CPU/memory exceeds 70%
  2. Caching Layer:
    • Cache product catalogs in Redis (TTL: 5 minutes)
    • Cache user sessions
    • CDN for static assets (images, CSS, JS)
  3. Database Optimization:
    • Add read replicas (5-10 replicas for read-heavy traffic)
    • Connection pooling
    • Query optimization and indexing
  4. Load Balancing: Use multiple load balancers with health checks
  5. Queue System: Use message queues for non-critical operations (emails, notifications)
  6. Rate Limiting: Implement rate limiting to prevent abuse
  7. Graceful Degradation: Disable non-essential features (recommendations, reviews) if needed
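
For the rate-limiting step, a simple fixed-window limiter backed by Redis is often enough; here is a TypeScript sketch (the limits and key naming are assumptions):

  import Redis from "ioredis";

  const redis = new Redis();
  const LIMIT = 100;         // max requests per client per window
  const WINDOW_SECONDS = 60;

  async function allowRequest(clientId: string): Promise<boolean> {
    const window = Math.floor(Date.now() / 1000 / WINDOW_SECONDS);
    const key = `ratelimit:${clientId}:${window}`;
    const count = await redis.incr(key);                        // count this request
    if (count === 1) await redis.expire(key, WINDOW_SECONDS);   // expire the window key
    return count <= LIMIT;                                      // reject once the limit is exceeded
  }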

Key Technologies:

  • Auto-scaling: AWS Auto Scaling, Kubernetes HPA
  • Cache: Redis Cluster, Memcached
  • CDN: CloudFront, Cloudflare
  • Database: Read replicas, connection pooling
  • Load Balancer: ELB, ALB with health checks
  • Message Queue: SQS, RabbitMQ for async processing

Trade-offs:

Pros: Handles traffic spikes, cost-effective (pay for what you use), automatic scaling

Cons: Cold start latency for new instances, potential cost increase during spikes

Best Practice: Pre-warm instances before expected traffic spikes, use reserved instances for baseline

Scenario 3: Database Bottleneck (100K Reads/Second)

Problem Statement:

Your database is becoming a bottleneck with 100,000 reads per second. Response times are increasing, and users are experiencing slow page loads. How do you optimize?

Requirements:

  • Handle 100K reads/second
  • Reduce database load by 80%
  • Maintain data consistency
  • Response time < 50ms

Solution Approach:

  1. Read Replicas:
    • Create 5-10 read replicas
    • Route read queries to replicas
    • Keep writes on primary database
  2. Caching Strategy:
    • Cache frequently accessed data in Redis (cache-aside pattern)
    • Cache query results (TTL: 1-5 minutes)
    • Cache user sessions and preferences
    • Target 90%+ cache hit rate
  3. Database Optimization:
    • Add indexes on frequently queried columns
    • Optimize slow queries
    • Use connection pooling (limit connections per server)
    • Partition large tables
  4. Query Optimization:
    • Use SELECT specific columns (not SELECT *)
    • Implement pagination
    • Use database query result caching
  5. Database Sharding: If single database can’t scale, shard by user_id or region

Key Technologies:

  • Read Replicas: MySQL/PostgreSQL read replicas, AWS RDS read replicas
  • Cache: Redis, Memcached
  • Connection Pooling: PgBouncer, HikariCP
  • Query Optimization: Database indexes, query analyzers
  • Sharding: Database partitioning, consistent hashing

Trade-offs:

Pros: Significant load reduction, improved performance, scalable solution

Cons: Eventual consistency with read replicas, cache invalidation complexity, increased infrastructure cost

Consideration: Monitor replication lag, implement cache warming strategies

Scenario 4: Design a Real-Time Chat System (like WhatsApp)

Problem Statement:

Design a real-time chat system supporting 1 billion users with instant messaging, message delivery guarantees, and online status.

Requirements:

  • 1 billion users
  • Real-time messaging (latency < 100ms)
  • Message delivery guarantee
  • Online/offline status
  • Group chats (up to 256 members)
  • Message history (1 year retention)

Solution Approach:

  1. WebSocket Connection:
    • Persistent WebSocket connections for real-time communication
    • Connection pooling and load balancing
    • Heartbeat mechanism to detect disconnections
  2. Message Flow:
    • User A sends message → App Server
    • Store message in database
    • Push to message queue (Kafka/RabbitMQ)
    • If User B is online: Push via WebSocket
    • If User B is offline: Store for later delivery
  3. Database Design:
    • Messages table: message_id, sender_id, receiver_id, content, timestamp
    • Users table: user_id, status, last_seen
    • Shard by user_id for scalability
  4. Message Queue: Use Kafka for message buffering and delivery
  5. Presence System: Redis to track online users (user_id → server_id mapping)
  6. Caching: Cache recent messages in Redis
  7. Push Notifications: FCM/APNS for offline users
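
The presence and routing steps above can be sketched with Redis in TypeScript (the key names, TTLs, and pub/sub channel scheme are assumptions):

  import Redis from "ioredis";

  const redis = new Redis();
  const SERVER_ID = "ws-server-1"; // identifier of this WebSocket server

  async function markOnline(userId: string): Promise<void> {
    // refreshed by the connection heartbeat; expires if this server stops renewing it
    await redis.set(`presence:${userId}`, SERVER_ID, "EX", 60);
  }

  async function markOffline(userId: string): Promise<void> {
    await redis.del(`presence:${userId}`);
  }

  async function routeMessage(toUserId: string, message: string): Promise<void> {
    const serverId = await redis.get(`presence:${toUserId}`);
    if (serverId) {
      // user online: publish to the server holding their WebSocket connection
      await redis.publish(`server:${serverId}`, JSON.stringify({ toUserId, message }));
    } else {
      // user offline: persist for later delivery / push notification
      await storeForOfflineDelivery(toUserId, message);
    }
  }

  // Placeholder for the database / push-notification path (an assumption for this sketch)
  async function storeForOfflineDelivery(toUserId: string, message: string): Promise<void> {}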

Key Technologies:

  • WebSocket: Socket.io, WebSocket API
  • Message Queue: Kafka, RabbitMQ, SQS
  • Database: NoSQL (Cassandra) for messages, SQL for user data
  • Cache: Redis for presence and recent messages
  • Push Notifications: FCM, APNS
  • Load Balancer: Sticky sessions for WebSocket connections

Trade-offs:

Pros: Real-time communication, scalable architecture, message delivery guarantee

Cons: WebSocket connection management complexity, high memory usage, message ordering challenges

Consideration: Use message IDs for ordering, implement message deduplication

Real-Time Chat System Architecture: Users A and B each connect through a load balancer with sticky sessions to an app server holding their WebSocket connection; messages are stored in a database sharded by user ID, buffered through a message queue (Kafka/RabbitMQ) for delivery, and a Redis cache tracks presence (online status and the user-to-server mapping).

Scenario 5: Prevent Single Point of Failure

Problem Statement:

Your load balancer fails and takes down the entire system. How do you design a system with zero single points of failure?

Requirements:

  • 99.99% uptime (4 nines)
  • Automatic failover
  • Zero data loss
  • Minimal downtime during failures

Solution Approach:

  1. Load Balancer Redundancy:
    • Deploy multiple load balancers (active-passive or active-active)
    • Use DNS failover or floating IP
    • Health checks between load balancers
  2. Application Server Redundancy:
    • Deploy minimum 3+ servers across multiple availability zones
    • Auto-scaling groups with health checks
    • Automatic replacement of failed instances
  3. Database Redundancy:
    • Master-slave replication (automatic failover)
    • Multi-region replication for disaster recovery
    • Regular automated backups
  4. Cache Redundancy:
    • Redis cluster with replication
    • Multiple cache nodes
  5. Network Redundancy:
    • Multiple network paths
    • Multi-region deployment
  6. Monitoring and Alerting:
    • 24/7 monitoring of all components
    • Automated alerts for failures
    • Runbooks for common failure scenarios

Key Technologies:

  • Load Balancer: Multiple ELBs, HAProxy with keepalived
  • Auto-scaling: AWS Auto Scaling, Kubernetes
  • Database: RDS Multi-AZ, PostgreSQL streaming replication
  • Cache: Redis Cluster, ElastiCache with replication
  • Monitoring: CloudWatch, Datadog, Prometheus
  • DNS: Route53 health checks and failover

Trade-offs:

Pros: High availability, automatic recovery, minimal downtime

Cons: Increased infrastructure cost (2-3x), complexity in managing redundancy

Best Practice: Test failover scenarios regularly, implement chaos engineering

Scenario 6: Optimize Slow API (2s to <200ms)

Problem Statement:

Your API takes 2 seconds to respond. Users are complaining. How do you optimize it to respond in under 200ms?

Requirements:

  • Reduce response time from 2s to < 200ms
  • Maintain data accuracy
  • Handle current traffic load
  • Cost-effective solution

Solution Approach:

  1. Identify Bottlenecks:
    • Profile API endpoints (APM tools)
    • Identify slow database queries
    • Check network latency
    • Analyze external API calls
  2. Database Optimization:
    • Add indexes on frequently queried columns
    • Optimize slow queries (use EXPLAIN)
    • Use read replicas for read-heavy endpoints
    • Implement connection pooling
    • Cache query results
  3. Caching Strategy:
    • Cache API responses (TTL: 1-5 minutes)
    • Cache database query results
    • Use Redis for hot data
    • Implement cache-aside pattern
  4. Code Optimization:
    • Remove N+1 queries (use eager loading)
    • Implement pagination
    • Use async processing for non-critical operations
    • Optimize serialization (use efficient formats)
  5. Network Optimization:
    • Use CDN for static responses
    • Compress responses (gzip)
    • Minimize external API calls
    • Use HTTP/2 for multiplexing

Key Technologies:

  • APM Tools: New Relic, Datadog, AppDynamics
  • Cache: Redis, Memcached
  • Database: Query optimization, indexes, read replicas
  • CDN: CloudFront, Cloudflare
  • Compression: gzip, brotli

Trade-offs:

Pros: Significant performance improvement, better user experience

Cons: Cache invalidation complexity, potential stale data, increased infrastructure

Best Practice: Start with database optimization (biggest impact), then add caching

Scenario 7: Reduce Global Latency (2s to <200ms)

Problem Statement:

Users in Asia experience 2-second latency when accessing your US-based application. How do you reduce it to under 200ms?

Requirements:

  • Reduce latency from 2s to < 200ms for Asian users
  • Maintain data consistency
  • Cost-effective solution

Solution Approach:

  1. CDN Deployment:
    • Deploy CDN with edge locations in Asia
    • Cache static assets (images, CSS, JS)
    • Cache API responses where possible
  2. Regional Data Centers:
    • Deploy application servers in Asia region
    • Use geo-routing to route users to nearest region
    • Multi-region deployment
  3. Database Replication:
    • Deploy read replicas in Asia
    • Route read queries to local replicas
    • Writes go to primary (with async replication)
  4. Cache Strategy:
    • Deploy Redis clusters in each region
    • Cache frequently accessed data locally
  5. DNS Optimization:
    • Use Route53 geo-routing
    • Route users to nearest region based on location

Key Technologies:

  • CDN: CloudFront, Cloudflare (with Asian edge locations)
  • Multi-Region: AWS regions (ap-southeast-1, ap-northeast-1)
  • Database: Cross-region read replicas
  • DNS: Route53 geo-routing
  • Load Balancer: Regional load balancers

Trade-offs:

Pros: Significant latency reduction, better user experience globally

Cons: Increased infrastructure cost, data replication complexity, eventual consistency

Consideration: Monitor replication lag, implement conflict resolution for writes
