Big Picture Overview
As your application grows, you face a fundamental problem: requests pour in from thousands of users simultaneously, and a single server—no matter how powerful—has limits. It can only accept a finite number of connections, process a finite number of requests per second, and store a finite amount of data in memory.
A load balancer is the component that solves this problem. It sits between your clients and your backend servers, acting as a traffic controller. When a request arrives, instead of going directly to one server, it goes to the load balancer, which then decides which backend server should handle it. This allows you to scale horizontally—adding more servers to handle more traffic—rather than just buying bigger and bigger machines.
The load balancer doesn’t just distribute traffic randomly. It’s intelligent: it monitors the health of each backend server, ensures no single server is overwhelmed, and can gracefully remove failed servers from the rotation. It’s foundational to building systems that are both scalable and resilient.
In the context of modern system architecture, the load balancer sits at a critical position in your infrastructure. It’s typically one of the first components clients interact with, and its decisions ripple through the entire system.
Core Concepts Explained Simply
What Is a Load Balancer?
A load balancer is a network device or software service that distributes incoming requests across multiple backend servers. Think of it like a receptionist at a busy office: instead of all visitors going to one person, the receptionist directs them to whoever is available.
The Two Main Types of Load Balancers
Load balancers operate at different layers of the network, each with different trade-offs:
Application Load Balancer (Layer 7 / HTTP(S))
An ALB understands HTTP and HTTPS protocols. It can read the content of HTTP requests—including headers, paths, hostnames, and even query parameters—to make routing decisions. For example, it can route requests to /api/* to one set of servers and requests to /images/* to another set optimized for serving static files.
- Advantage: Intelligent routing based on application data
- Disadvantage: Slightly higher latency because it must examine request content
Network Load Balancer (Layer 4 / TCP/UDP)
An NLB operates at the transport layer and only looks at IP addresses and port numbers. It doesn’t understand HTTP at all. It simply forwards TCP or UDP packets to a selected backend server.
- Advantage: Extremely fast with minimal latency; can handle millions of connections per second
- Disadvantage: No application-level intelligence; cannot make routing decisions based on HTTP headers or paths
Practical reality: For most web applications (HTTP/HTTPS), use an ALB. For high-throughput, low-latency scenarios (gaming, real-time messaging, video streaming), use an NLB.
Load Balancing Algorithms
The algorithm determines which backend server receives each request. Common algorithms include:
Round Robin
Requests are sent to servers in sequence: Server 1, Server 2, Server 3, Server 1, Server 2, etc. Simple and fair if all servers have equal capacity.
Weighted Round Robin
If servers have different capacities (one powerful machine and two smaller ones), you assign weights. A server with weight 2 gets twice as many requests as a server with weight 1.
Least Connections
The load balancer sends new requests to the server with the fewest active connections. Useful when requests have varying duration. A server handling 5 long-running requests should get fewer new requests than a server handling 20 short requests.
IP Hash (Source IP)
Requests from the same client IP always go to the same backend server. Ensures session affinity—useful if your backend servers maintain in-memory session state. However, it can create imbalance if traffic comes from a few large sources.
Resource-Based (Adaptive)
The load balancer periodically queries each server for its CPU, memory, and I/O metrics, then routes new requests to the healthiest servers. Requires agents running on backend servers but provides the most balanced distribution in heterogeneous environments.
Which to choose? For most cases, use Least Connections with weights. It handles real-world variability well.
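To make these concrete, here is a minimal, illustrative sketch of the selection logic in Python. The `Server` class and its fields are assumptions for the example, not part of any real load balancer's API:

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int = 1            # relative capacity
    active_connections: int = 0

def round_robin(servers, counter):
    # Cycle through servers in order; return the pick and the next counter.
    return servers[counter % len(servers)], counter + 1

def weighted_least_connections(servers):
    # Fewest connections per unit of weight: a weight-2 machine is
    # allowed roughly twice the load of a weight-1 machine.
    return min(servers, key=lambda s: s.active_connections / s.weight)

def ip_hash(servers, client_ip):
    # Same client IP maps to the same server (session affinity).
    # Use a stable hash (e.g. hashlib) in production; Python's hash()
    # is randomized per process.
    return servers[hash(client_ip) % len(servers)]

servers = [Server("s1", weight=2), Server("s2"), Server("s3")]
print(weighted_least_connections(servers).name)
```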
Health Checks and Failover
A load balancer must know which servers are alive and healthy. It does this by periodically sending health check probes to each backend server:
- An HTTP(S) load balancer might send a GET request to /health
- A TCP load balancer might attempt to establish a connection to a specific port
If a server fails consecutive health checks (e.g., 3 failures in a row), the load balancer stops routing new traffic to it. If it recovers (passes 3 consecutive checks), it’s brought back into rotation. This is automatic and requires no human intervention.
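A minimal sketch of that bookkeeping, assuming an HTTP /health endpoint and the 3-failure / 3-success thresholds mentioned above (the URL and thresholds are illustrative):

```python
import requests

FAIL_THRESHOLD = 3      # consecutive failures before removal from rotation
RECOVER_THRESHOLD = 3   # consecutive successes before re-adding

class BackendHealth:
    def __init__(self, base_url):
        self.base_url = base_url     # e.g. "http://10.0.0.5:8080"
        self.healthy = True
        self.fails = 0
        self.successes = 0

    def probe(self):
        try:
            ok = requests.get(self.base_url + "/health", timeout=2).status_code == 200
        except requests.RequestException:
            ok = False
        if ok:
            self.fails, self.successes = 0, self.successes + 1
            if not self.healthy and self.successes >= RECOVER_THRESHOLD:
                self.healthy = True      # bring back into rotation
        else:
            self.successes, self.fails = 0, self.fails + 1
            if self.healthy and self.fails >= FAIL_THRESHOLD:
                self.healthy = False     # stop routing new traffic here
```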
How Load Balancers Integrate into a System
┌─────────────────────────────────────────────────────────────┐
│ CLIENTS │
│ (Browsers, Mobile Apps, APIs, etc.) │
└─────────────────────────────────────────────────────────────┘
│
│ HTTP/HTTPS Request
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LOAD BALANCER │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Selects backend server based on algorithm │ │
│ │ Health checks backend servers regularly │ │
│ │ May terminate TLS encryption (see below) │ │
│ │ Routes request to healthy server │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
┌───────────┼───────────┐
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Server 1 │ │ Server 2 │ │ Server 3 │
│ (Active) │ │ (Active) │ │ (Standby) │
│ │ │ │ │ │
│ Application │ │ Application │ │ Application │
│ Process │ │ Process │ │ Process │
└──────────────┘ └──────────────┘ └──────────────┘
│ │ │
└───────────┼────────────────────┘
│
▼
┌─────────────────────────┐
│ Backend Datastore │
│ (Database, Cache) │
└─────────────────────────┘
Request Flow
- Client initiates request: A user’s browser sends an HTTP request to your service’s domain.
- DNS resolves to load balancer: The domain name resolves to the load balancer’s IP address.
- Load balancer receives request: The load balancer accepts the connection and examines the request (in the case of ALB) or simply checks the TCP connection (in the case of NLB).
- Load balancer selects backend: Using its configured algorithm, it chooses a healthy backend server.
- Load balancer forwards request: It opens a new connection to the backend server and sends the request.
- Backend processes request: The server handles the request, queries the database if needed, and sends a response back to the load balancer.
- Load balancer returns response: The load balancer forwards the response back to the client.
Key Integration Points
With clients: The load balancer is the single IP address clients connect to. From the client’s perspective, there’s only one service.
With backend services: The load balancer maintains persistent information about each backend—its IP, port, health status, and current connection count.
With infrastructure: The load balancer itself must be redundant. If the load balancer fails, the entire system becomes unreachable. This is solved by having a pair of load balancers (primary and standby) that share a floating IP address. If the primary fails, the standby automatically takes over.
With external systems: For global deployments, you might use a global load balancer (or DNS-based routing) that sits in front of regional load balancers, directing users to the geographically closest datacenter.
TLS Termination
The Problem: Encrypted Traffic
When a client connects to your service using HTTPS, the data is encrypted using TLS (Transport Layer Security). Encryption and decryption require computational work—for every request, the server must:
- Decrypt the incoming request
- Process it
- Encrypt the outgoing response
This CPU work adds latency and reduces the number of requests a server can handle per second.
The Solution: Termination at the Load Balancer
Modern load balancers can terminate TLS connections, meaning they decrypt the request before sending it to the backend server. Here’s what this looks like:
┌──────────────┐
│ CLIENT │
│ │
│ HTTPS │
│ (Encrypted) │
└──────────┬───┘
│
▼
┌──────────────────────────┐
│ LOAD BALANCER │
│ ┌──────────────────┐ │
│ │ TLS Certificate │ │
│ │ Decrypts HTTPS │ │
│ └──────────────────┘ │
└────────────┬─────────────┘
│
┌────────┴────────┬────────────┐
│ │ │
▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────┐
│Server 1│ │Server 2│ │Server 3│
│ │ │ │ │ │
│ HTTP │ │ HTTP │ │ HTTP │
│(Plain) │ │(Plain) │ │(Plain) │
└────────┘ └────────┘ └────────┘
Benefits
- Offloads CPU work: Backend servers focus on business logic, not encryption
- Centralized certificate management: Update one certificate on the load balancer instead of updating it on every backend server
- Simplified security policies: Configure TLS version and cipher suites in one place
- Better performance: Faster request handling with less latency
Trade-off: End-to-End Encryption
If you terminate TLS at the load balancer, traffic between the load balancer and backend servers is unencrypted. This is usually fine because:
- This traffic stays within your private network (or VPC)
- The load balancer and backends are under your control
However, if your backends are geographically distributed or traffic crosses untrusted networks, you might keep TLS encryption end-to-end (terminate TLS at the backend instead).
Scaling Load Balancers
The Single Load Balancer Problem
A load balancer solves the “too much traffic” problem for backend servers. But the load balancer itself can become a bottleneck and a single point of failure.
What if the load balancer fails? All traffic stops. No client can reach any backend server. This is unacceptable for production systems.
Solution: Redundant Load Balancers
The standard approach is to deploy load balancers in pairs:
┌───────────────────────────────────────────────────────────┐
│ DNS Record │
│ Points to: 203.0.113.50 (Floating IP) │
└───────────────────────────────────────────────────────────┘
│
┌───────────────────┴───────────────────┐
│ │
▼ ▼
┌────────────────────┐ ┌────────────────────┐
│ PRIMARY LB │ │ SECONDARY LB │
│ 203.0.113.50 │ │ 203.0.113.51 │
│ (ACTIVE) │ │ (STANDBY) │
│ │ │ │
│ Health: PASSING │ │ Health: PASSING │
└────────────────────┘ └────────────────────┘
│ │
│ │
└──────────────┬─────────────────────┘
│
┌───────────┴───────────┐
│ │
▼ ▼
Backend Servers
(Database, Cache, etc.)
Floating IP Failover Mechanism:
If PRIMARY fails, the floating IP 203.0.113.50
automatically switches to SECONDARY.
How Failover Works
- The load balancers continuously monitor each other’s health using a heartbeat protocol
- If the primary load balancer stops responding, the secondary detects this (usually within seconds)
- The secondary takes over the floating IP address
- New client connections route to the secondary (now active)
- Existing connections may be lost, but the service recovers quickly
- When the primary recovers, it can fail back (or stay in standby, depending on configuration); a sketch of the detection loop follows below
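The detection logic behind the first few steps is simple bookkeeping. A sketch under the assumptions above (1-second heartbeats, 3 misses), where `claim_floating_ip()` is a hypothetical stand-in for the VRRP/keepalived machinery that actually moves the address:

```python
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeats from the primary
MISS_THRESHOLD = 3         # missed heartbeats before taking over

def claim_floating_ip():
    # Hypothetical placeholder: reassign the VIP and announce it
    # (e.g. gratuitous ARP) so new connections reach this node.
    print("standby claimed 203.0.113.50")

def standby_loop(last_heartbeat):
    """last_heartbeat() returns the monotonic time of the last heartbeat received."""
    missed = 0
    while missed < MISS_THRESHOLD:
        time.sleep(HEARTBEAT_INTERVAL)
        if time.monotonic() - last_heartbeat() > HEARTBEAT_INTERVAL:
            missed += 1      # no heartbeat arrived in this interval
        else:
            missed = 0       # primary is alive; reset the counter
    claim_floating_ip()      # primary presumed dead: standby takes over
```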
Scaling Horizontally
If your load balancer is handling so much traffic that it’s becoming a bottleneck, you can deploy multiple load balancers:
- Regional scale: Multiple load balancers within a region, using a network-level load balancer or anycast routing to distribute traffic among them
- Global scale: Load balancers in different regions, with DNS or a global load balancer directing users to the nearest region
This is less common for the load balancer layer itself (since load balancers are quite efficient) but necessary for truly massive traffic volumes.
Global Availability
The Problem: Single Datacenter Limitations
A single datacenter has inherent risks:
- Geographic latency: Users far from the datacenter experience high latency
- Single point of failure: A datacenter outage (natural disaster, power failure) takes down your entire system
- Compliance: Some regulations require data to be stored in specific regions
Solution: Geographically Distributed Load Balancers
Deploy load balancers and backends in multiple regions worldwide:
┌─────────────────────────────────────────────────────────────┐
│ GLOBAL DNS / GSLB │
│ (Geographic Load Balancer or DNS-based routing) │
└─────────────────────────────────────────────────────────────┘
│ │ │
│ Route to closest │ Route to closest │ Route to closest
│ healthy region │ healthy region │ healthy region
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│US East (N.V) │ │EU (London) │ │APAC (Tokyo) │
│ │ │ │ │ │
│ LB ──┐ │ │ LB ──┐ │ │ LB ──┐ │
│ │ │ │ │ │ │ │ │
│ Servers │ │ Servers │ │ Servers │
│ │ │ │ │ │ │ │ │
│ Database │ │ Database │ │ Database │
└──────────────┘ └──────────────┘ └──────────────┘
   (Primary            (Read-only           (Read-only
    or Replica)          Replica)             Replica)
How Global Load Balancing Selects a Region
When a user makes a request, the global load balancer (or DNS system) decides which regional load balancer to route them to based on:
- Geographic proximity: Route users to the nearest datacenter (lowest latency)
- Health checks: Only route to regions with healthy services
- User preferences: Route based on IP geolocation, latency measurement, or explicit user choice
- Load distribution: Distribute traffic evenly across regions if capacity allows
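A simplified sketch of that decision, assuming a static health map and pre-measured latencies (all values are made up for illustration):

```python
REGIONS = {
    "us-east": {"healthy": True,  "latency_ms": {"US": 20,  "EU": 90,  "APAC": 180}},
    "eu-west": {"healthy": True,  "latency_ms": {"US": 90,  "EU": 15,  "APAC": 220}},
    "apac":    {"healthy": False, "latency_ms": {"US": 170, "EU": 220, "APAC": 25}},
}

def pick_region(user_geo):
    # Consider only healthy regions, then choose the lowest-latency one.
    healthy = {name: r for name, r in REGIONS.items() if r["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda name: healthy[name]["latency_ms"][user_geo])

print(pick_region("APAC"))  # APAC itself is unhealthy, so traffic fails over to us-east
```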
Data Replication Considerations
Geographic distribution introduces data consistency challenges:
- Primary-replica setup: One region has the primary database (accepts writes), others have replicas (read-only). Writes are slower because they must replicate to other regions.
- Multi-master replication: All regions can accept writes, but this introduces complexity (conflict resolution, eventual consistency).
- Read-local, write-primary: Users read from their nearest region (fast) but writes go to the primary region (slower but consistent).
Common Use Cases
Use Case 1: High-Traffic E-Commerce Platform
An e-commerce site experiences 10,000 requests per second during peak shopping days. A single server can handle ~100 requests per second, so you need ~100 backend servers.
How load balancing solves this:
- Deploy a load balancer in front of 100 servers
- Use a Least Connections algorithm to distribute traffic based on actual server load
- Configure TLS termination at the load balancer to reduce backend CPU usage
- Deploy load balancers in pairs for high availability
- Use an ALB to route static content (images, CSS) to optimized servers and dynamic content to application servers
Outcome: The platform scales horizontally. As traffic grows, add more backend servers without changing client code.
Use Case 2: Real-Time Gaming Server
A multiplayer game needs to handle millions of concurrent players with extremely low latency (players expect <100ms response times).
How load balancing solves this:
- Use Network Load Balancers (NLB) for ultra-low latency (no HTTP parsing overhead)
- Use IP Hash or Source IP algorithm so players always connect to the same game server (maintains game state consistency)
- Deploy NLBs globally so players connect to their nearest regional server
- Avoid TLS termination at the load balancer; instead, use TLS directly on game servers (or use alternative protocols like QUIC)
Outcome: Players experience consistent, low-latency gameplay regardless of player count.
Use Case 3: Mobile Banking App (High Availability + Compliance)
A banking app must maintain 99.99% uptime and comply with regulations requiring data residency in specific countries.
How load balancing solves this:
- Deploy redundant load balancers in each country/region
- Use TLS termination at the load balancer with strict security policies (only modern TLS versions, strong ciphers)
- Configure health checks to immediately detect server failures and reroute traffic
- Use a primary-replica database setup: all writes go to the primary region, reads can come from local replicas
- Periodically test failover to ensure it works
Outcome: The system is highly available and resilient to individual server failures. Data stays in compliant jurisdictions.
Trade-offs and Limitations
What Load Balancers Are Good At
✅ Distributing traffic: Evenly spreads load across servers, preventing overload
✅ Detecting failures: Automatically removes unhealthy servers and reroutes traffic
✅ Horizontal scaling: Enables systems to grow by adding more servers, not bigger servers
✅ Geographic distribution: Can direct users to nearby datacenters for low latency
✅ Offloading TLS: Removes encryption overhead from backend servers
What Load Balancers Are Bad At
❌ Stateful requests without planning: If your backend servers store session state in memory, sessions are lost when a server fails or a user is routed to a different server. Solution: Use sticky sessions (route same client to same server) or external session storage (Redis, Memcached).
❌ Maintaining connection state during updates: Rolling out a new version of your backend code requires careful handling. Long-lived connections (WebSockets, streaming) may be interrupted. Solution: Use connection draining (gracefully close existing connections before removing the server).
❌ Solving database bottlenecks: A load balancer distributes requests to databases too, but the database itself becomes a bottleneck. Load balancers alone can’t fix this. Solution: Use database replication, caching, or sharding.
❌ Preventing DDoS attacks at the application level: Load balancers can distribute traffic, but they can’t distinguish between legitimate requests and attack traffic. Solution: Use DDoS mitigation services in front of the load balancer.
Common Mistakes
Mistake 1: Single Load Balancer in Production
A single load balancer becomes a single point of failure. Always deploy in pairs with automatic failover.
Mistake 2: Not Configuring Health Checks Properly
If health checks are too lenient, failed servers stay in rotation. If they’re too strict, healthy servers are removed. Tune based on your application’s actual response times.
Mistake 3: Assuming Load Balancer Solves Scalability
Load balancers distribute requests, but if your backend servers or database can’t handle the total load, adding a load balancer doesn’t help. Scale each layer independently.
Mistake 4: Forgetting About Load Balancer Limits
Modern load balancers can handle millions of connections, but they have limits. As you grow, monitor load balancer metrics (connections per second, bandwidth) and upgrade or scale horizontally if needed.
Mistake 5: Overcomplicating Routing Logic
Simple algorithms (Round Robin, Least Connections) work well for most cases. Complex adaptive algorithms introduce operational complexity. Keep it simple unless you have a specific need.
How This Fits into an Architect’s Mental Model
As a system architect, think about load balancers as a scaling abstraction layer. They solve a specific problem: distributing load across multiple servers. But they don’t solve all scaling problems.
Three Key Mental Models
1. The Scaling Hierarchy
When your system gets overloaded, you scale different layers:
Overloaded Request Handling?
↓
Use Load Balancer → Distribute across more servers
↓
Overloaded Database?
↓
Use Database Replication → Read replicas in different regions
↓
Overloaded Cache?
↓
Distribute Cache Layer → Multiple cache nodes with sharding
Load balancers address the first level, but you’ll encounter other bottlenecks as you scale.
2. Availability Through Redundancy
Every critical component needs redundancy:
Critical Path:
Client → Load Balancer → Server → Database
Redundant Path:
Client → [LB Primary + LB Secondary] → [Servers 1,2,3...] → [DB Primary + Replicas]
The load balancer is part of the critical path. Its failure must be mitigated with a backup.
3. Trade-off Between Latency, Consistency, and Complexity
When designing a global system:
- Low latency: Route users to nearby datacenters (geographic distribution)
- Strong consistency: All writes go to one primary region (slower for distant users)
- Simplicity: Single region with replicated load balancers (easier to operate)
You can’t have all three. Choose based on your requirements.
Design Decisions
When to introduce a load balancer:
- When a single server can’t handle your peak traffic
- When you need automatic failover and high availability
- When you’re deploying across multiple zones or regions
When to avoid complexity:
- Don’t use geographic load balancing if you don’t have data residency requirements or latency constraints
- Don’t over-engineer health checks; simple HTTP checks usually suffice
- Don’t implement complex sticky-session logic if external session storage (Redis) is available
Monitoring and Observability:
Once you introduce a load balancer, monitor:
- Request distribution: Are all servers receiving roughly equal traffic?
- Health check failures: Are servers being incorrectly marked unhealthy?
- Load balancer performance: Is the load balancer itself becoming a bottleneck?
- Regional failover: Test and verify geographic failover periodically
Key Takeaway
A load balancer is one of the first scalability tools you’ll use, but it’s not a silver bullet. It solves the problem of distributing traffic across servers, but each layer of your system (backend servers, database, cache) needs its own scaling strategy. Think of the load balancer as the entry point to that scaling journey, not the endpoint.
Summary
Load balancers distribute incoming traffic across multiple backend servers, enabling horizontal scaling and improving availability. They operate at different network layers (Application Layer 7 for HTTP(S) or Network Layer 4 for TCP/UDP), use various algorithms to select backend servers, and automatically detect and isolate failed servers.
Key architectural considerations: deploy load balancers in redundant pairs to avoid single points of failure, use TLS termination to offload encryption work, configure appropriate health checks, and remember that load balancers are just one layer in a scalable system. As your system grows, you’ll need to scale your database, cache, and other components independently.
The most common mistake is treating a load balancer as a complete solution to scalability—it’s not. It’s a necessary foundation for building systems that can grow horizontally, but it works best when combined with thoughtful design of backend services, database architecture, and global deployment strategies.
System Design Interview Questions
Overview
This comprehensive guide contains 8 in-depth system design interview questions on load balancers, ranging from foundational concepts to advanced architectural challenges. Each question includes a detailed ideal answer that demonstrates deep technical understanding, practical architectural thinking, and awareness of trade-offs—the hallmarks of senior-level system design interviews.
Question 1: Design a Load Balancer from Scratch
Question: Design a load balancer that can handle millions of requests per second from clients to a pool of backend servers. Explain the key components, algorithms you’d use, and trade-offs you’d consider.
Ideal Answer:
A load balancer is a network device or software component that distributes incoming client requests across multiple backend servers to optimize resource utilization, reduce latency, and provide fault tolerance.
Architecture Overview:
       Clients (Millions)
              │
              ▼
┌──────────────────────┐
│    Load Balancer     │
│ ┌──────────────────┐ │
│ │ Listener         │ │
│ │ (Port 443 HTTPS) │ │
│ └──────────────────┘ │
│          │           │
│ ┌──────────────────┐ │
│ │ TLS Termination  │ │
│ │ Decrypts HTTPS   │ │
│ └──────────────────┘ │
│          │           │
│ ┌──────────────────┐ │
│ │ Routing Logic    │ │
│ │ (Algorithm)      │ │
│ └──────────────────┘ │
│          │           │
│ ┌──────────────────┐ │
│ │ Health Checks    │ │
│ │ Monitor backends │ │
│ └──────────────────┘ │
└──────────────────────┘
           │
     ┌───┬─┼─────────┐
     ▼   ▼ ▼         ▼
 Server1 Server2 ... ServerN
        (Healthy)
Key Components:
1. Listener – Accepts incoming connections on a specific port (e.g., 443 for HTTPS), maintains connection state and tracks active connections per backend server
2. TLS Termination Module – Decrypts HTTPS traffic at the load balancer, encrypts responses, offloads CPU work from backends, and centralizes certificate management
3. Routing Algorithm – Determines which backend server handles each request:
| Algorithm | Mechanism | Best For | Trade-off |
|---|---|---|---|
| Round Robin | Sequential selection | Equal capacity servers | No load awareness |
| Least Connections | Routes to server with fewest active connections | Long-lived requests | Requires connection tracking |
| Weighted Round Robin | Accounts for server capacity differences | Heterogeneous servers | Manual weight configuration |
| IP Hash | Hashes client IP to consistent server | Session affinity | Can create imbalances |
| Least Response Time | Combines latency + connection count | Variable request durations | Higher LB overhead |
Recommendation: For millions of QPS, use Least Connections with weights—it balances real-world variability and scales well.
4. Health Check Module – Periodically probes servers to verify health:
- Failure threshold: typically 3-5 consecutive failures before removal
- Recovery threshold: 3-5 successes before re-adding
- Check interval: 5-10 seconds (balance detection speed vs. overhead)
Connection Pooling and Reuse:
Client Request 1 → LB → Backend Server 1 (Connection Pool)
Client Request 2 → LB → Backend Server 2 (Connection Pool)
Client Request 3 → LB → Backend Server 1 (Reuses connection)

Benefits: Reduces TCP handshake overhead, reduces backend CPU usage, improves latency for high-QPS scenarios.
Trade-offs and Design Decisions:
| Decision | Why | Trade-off |
|---|---|---|
| Terminate TLS at LB | Offload CPU, centralize certificates | Unencrypted traffic LB↔Backend (acceptable in private network) |
| Use Least Connections | Adapts to real load, handles variable times | Requires connection state tracking |
| Health checks every 5s | Balance detection latency vs. overhead | Takes ~15s to detect failure (3 failures × 5s) |
| Connection pooling | Reduces TCP handshake overhead | Complexity in connection lifecycle management |
Handling Load Balancer Bottlenecks:
For millions of QPS, a single load balancer will eventually saturate. Solutions include:
- Redundant LBs: Deploy 2-3 load balancers in active-active or active-passive mode
- Layer load balancers: Use L4 LB (fast) in front of multiple L7 LBs (intelligent)
- Distributed LB: Use hash-based client assignment to route clients to different LBs via DNS or Anycast
Question 2: Application Load Balancer vs Network Load Balancer
Question: Your company is building three services: (1) a web e-commerce platform, (2) a low-latency gaming backend, and (3) a financial transaction system. Which load balancer type would you recommend for each, and why?
Ideal Answer:
Understanding the Layers:
- Network Load Balancer (NLB, L4): Operates at the transport layer (TCP/UDP). Routes based on IP address and port only. Extremely fast, can handle millions of connections.
- Application Load Balancer (ALB, L7): Operates at the application layer (HTTP/HTTPS). Can read HTTP headers, paths, hostnames. Slower but more intelligent.
1. E-Commerce Platform → Application Load Balancer
Why:
- Need intelligent routing: /api/* → API servers, /images/* → image servers
- Can route based on HTTP headers and hostnames
- TLS termination simplifies backend architecture
- Latency less critical (users tolerate 50-100ms for page loads)
Configuration Example:
Routing Rules:
  Host: www.ecommerce.com /api/*    → API Server Pool
  Host: www.ecommerce.com /images/* → Image Server Pool
  Header: X-Admin = true            → Admin Server Pool
Trade-off: Slightly higher latency (ALB must parse HTTP), but acceptable for e-commerce.
2. Low-Latency Gaming Backend → Network Load Balancer
Why:
- Games require sub-100ms latency (ideally <50ms). Every millisecond matters.
- UDP-based protocols (common in games) require L4 load balancing
- NLB processes traffic with minimal overhead → lower latency
- Can handle millions of concurrent connections
Implementation Detail:
NLB Configuration:
  Protocol: UDP
  Algorithm: IP Hash (ensure same player always goes to same server)

Why IP Hash for games?
- Game state stored on server
- If a player connects to a different server, character position is lost
- Sticky routing ensures consistency
Trade-off: No application-level intelligence; if you need complex routing, use a secondary routing layer.
3. Financial Transaction System → Network Load Balancer with Sticky Sessions
Why:
- Financial transactions often require strict ordering and consistency
- NLB provides low latency for time-sensitive operations
- Sticky sessions: Route all transactions from a client to the same server to maintain ordering invariants
- Extremely high throughput capable
Configuration:
NLB Configuration:
  Protocol: TCP (TLS pass-through, not termination)
  Sticky Sessions: IP-based routing

Why not terminate TLS at NLB?
- Unencrypted traffic LB↔Backend is risky for financial data
- Instead: pass TLS through; backends handle encryption
Comparison Table:
| Criteria | ALB | NLB |
|---|---|---|
| OSI Layer | L7 (Application) | L4 (Transport) |
| Protocols | HTTP, HTTPS | TCP, UDP |
| Latency | 10-100ms overhead | <10ms overhead |
| Throughput | ~100,000 RPS per instance | >1M RPS per instance |
| Intelligent Routing | Yes (headers, paths, hostnames) | No (only IP/port) |
| Best For | Web apps, microservices, APIs | Gaming, real-time data, high-throughput |
Question 3: Handle Session State with Load Balancers
Question: Your application stores user session data in memory on each backend server. When you add a load balancer and scale to 3 servers, requests from the same user can go to different servers. How would you solve session loss?
Ideal Answer:
The Problem:
Request 1: User logs in → Server 1
  Server 1 stores: {user_id: 123, cart: [item1], logged_in: true}

Request 2: User adds item → Server 2 (different server!)
  Server 2 has no session info
  Result: "Please log in" error
This is the classic session state problem in load-balanced systems.
Solution 1: Sticky Sessions (Session Affinity)
The load balancer remembers which server each client connects to and always routes them there.
Implementation A: IP-Based Sticky Sessions
Hash(client_IP) → Server 1
All requests from 192.168.1.100 → Server 1

Pros: Simple, no additional data structures
Cons: Mobile users change IPs (sessions lost), unbalanced load
Implementation B: Cookie-Based Sticky Sessions
Response from Server 1:
  Set-Cookie: SERVERID=server-1; Path=/

Subsequent requests:
  Cookie: SERVERID=server-1
  LB reads cookie, routes to that server

Pros: Works even if client IP changes
Cons: Still loses session if server fails
Overall: simpler to adopt, but causes uneven load distribution and failover complexity.
Solution 2: External Session Storage (Recommended for Scale)
Move session data out of backend servers into a separate, shared store:
┌──────────────────────────────────────┐
│            Load Balancer             │
└──────────────┬───────────────────────┘
               │
       ┌───────┴────────┐
       ▼                ▼
┌────────────────┐ ┌────────────────┐
│    Server 1    │ │    Server 2    │
│  (Stateless)   │ │  (Stateless)   │
└────────────────┘ └────────────────┘
        │                  │
        └────────┬─────────┘
                 ▼
      ┌──────────────────┐
      │   Redis Cache    │
      │  Session Store   │
      │  {user_123:      │
      │   {cart: [..],   │
      │    logged_in:T}} │
      └──────────────────┘
Implementation Example:
# Assumes a Flask `app`, `request`, a `redis` client, and `json` imported elsewhere.
@app.route('/add-to-cart', methods=['POST'])
def add_to_cart():
    item = request.json['item']
    user_id = request.headers.get('User-ID')

    # Read session from Redis (any server can access it)
    session = json.loads(redis.get(f"session:{user_id}") or "{}")
    session.setdefault('cart', []).append(item)

    # Write back to Redis (shared across servers, 1-hour expiry)
    redis.set(f"session:{user_id}", json.dumps(session), ex=3600)
    return {"status": "added", "cart": session['cart']}
Pros: Perfect scalability, fault tolerance with replication, handles migrations easily
Cons: External dependency, slightly higher latency, cache becomes potential single point of failure.
Solution 3: Stateless Applications (Best Practice)
Store session data in the client (signed JWT token) or use event sourcing:
Request 1: User logs in → Server 1
  Server 1 creates JWT: eyJhbGc...payload...signature
  Returns: {token: "eyJhbGc...", user_id: 123}

Request 2: User adds item → Server 2
  Client sends JWT: Authorization: Bearer eyJhbGc...
  Server 2 verifies JWT signature (no state lookup)
  Reads user info from JWT
  Result: Works seamlessly!
Pros: Perfect scalability, no external dependencies, great for microservices
Cons: Tokens add to request size, can't be revoked instantly, and client-held data can only be trusted after signature verification.
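A minimal stateless-session sketch using the PyJWT library; the secret, claims, and expiry are illustrative, and in practice the key comes from secure configuration shared by all backends:

```python
import time
import jwt  # PyJWT

SECRET = "change-me"  # illustrative only; load from secrets management in practice

def issue_token(user_id):
    # Any server can issue a token; the session state lives in the signed payload.
    return jwt.encode({"sub": str(user_id), "exp": int(time.time()) + 3600},
                      SECRET, algorithm="HS256")

def verify_token(token):
    # Any server can verify it with the shared secret: no session lookup needed.
    return jwt.decode(token, SECRET, algorithms=["HS256"])

token = issue_token(123)
print(verify_token(token)["sub"])  # "123", no matter which server handles the request
```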
Comparison Table:
| Approach | Complexity | Scalability | Fault Tolerance | Best For |
|---|---|---|---|---|
| Sticky Sessions | Low | Medium (uneven load) | Poor (server failure) | Small systems |
| External Cache | Medium | Excellent | Good (with replication) | Web apps, shopping carts |
| Stateless (JWT) | Low | Excellent | Excellent | Mobile apps, microservices |
Recommendation: For small apps, use sticky sessions; for growing apps, use Redis; for distributed systems, use stateless JWT.
Question 4: Load Balancer as Single Point of Failure
Question: Your system has a load balancer in front of 10 backend servers. What happens if the load balancer fails? How would you design for high availability?
Ideal Answer:
The Problem:
A single load balancer is a single point of failure. If it crashes:
Client → [DEAD LB] → No backend reachable
Result: Entire service down, even though backends are healthy
This is unacceptable for production.
Solution: Redundant Load Balancers with Failover
┌────────────────────────────────────────────────────────┐
│              Floating IP: 203.0.113.50                 │
│     (Shared by both load balancers via failover)       │
└────────────────────────────────────────────────────────┘
                          │
              ┌───────────┴───────────┐
              │                       │
              ▼                       ▼
   ┌──────────────────┐    ┌──────────────────┐
   │  LB-1 (Primary)  │    │  LB-2 (Standby)  │
   │  203.0.113.50    │    │  203.0.113.51    │
   │  Status: ACTIVE  │    │  Status: PASSIVE │
   └──────────────────┘    └──────────────────┘
              │   Heartbeat (every 1 second)
              └──────────────────────────────→
How Failover Works:
Timeline:
  t=0s:    LB-1 crashes
  t=1s:    LB-2 expects a heartbeat, doesn't receive one
  t=2s:    LB-2 expects a heartbeat, still nothing
  t=3s:    3 heartbeats missed in a row
           → LB-2 claims the floating IP 203.0.113.50
  t=3-5s:  Existing connections to LB-1 get reset
           Clients retry, now go to LB-2
  t=5-10s: DNS propagation (if using DNS failover)
           New clients query DNS → get LB-2's IP

Result: Service back online within 3-10 seconds
Implementation Approaches:
Approach 1: Virtual IP with Keepalived (On-Premise)
Uses VRRP protocol for automatic failover:
LB-1: 192.168.1.10
LB-2: 192.168.1.11
Floating VIP: 192.168.1.50 (shared)

Keepalived continuously exchanges heartbeats
On failure, it automatically updates the ARP table
Pros: Open-source, widely supported
Cons: Self-managed, requires operational expertise.
Approach 2: Cloud-Native (AWS, GCP, Azure)
Modern cloud providers handle this automatically:
AWS ALB:
- Deployed across 2+ availability zones
- Each AZ has a separate LB instance
- If one AZ goes down, traffic reroutes to another
- No manual failover needed
Pros: Transparent, highly available, managed by provider
Cons: Vendor lock-in, less control.
Approach 3: Global Load Balancer with DNS
For multi-region deployment:
┌──────────────────────────────────┐
│   Global Load Balancer / DNS     │
│   (Route 53, Cloudflare, etc.)   │
└──────────────────────────────────┘
               │
    ┌─────┬────┼─────────┐
    ▼     ▼    ▼         ▼
   LB-   LB-  LB-       LB-
   US    EU   APAC      ...
   East  London Tokyo

Each regional LB is itself redundant
Key Metrics:
| Parameter | Value | Why |
|---|---|---|
| Heartbeat Interval | 1 second | Balance detection speed vs. network overhead |
| Failure Threshold | 3 missed heartbeats | Tolerate brief glitches, detect real failures |
| Total Failover Time | 3-10 seconds | Depends on VIP vs DNS |
| Primary Priority | 200 | Primary always becomes active first |
| Standby Priority | 150 | Becomes active only if the primary dies |
Handling Stateful Connections During Failover:
Challenge: If LB-1 fails mid-request, what happens?

Options:
1. Accept the loss (simplest)
   - Client retries
   - OK for idempotent operations (GET, POST with idempotency key)
2. Connection Draining (graceful shutdown)
   - Mark LB as "draining"
   - Stop accepting new connections
   - Wait for existing connections to finish
   - Then remove from service
   - Pro: Zero connection loss; Con: Takes time
3. Connection Replication (complex)
   - Primary and standby replicate connection state
   - Transparent failover
   - Con: Very complex, high overhead
For most systems, Accept the loss + retry is acceptable; most client libraries already implement exponential backoff.
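In client terms, "accept the loss + retry" usually looks like retry with exponential backoff around idempotent calls. A hedged sketch (URL, timeouts, and attempt counts are placeholders):

```python
import random
import time
import requests

def get_with_retries(url, attempts=4):
    for attempt in range(attempts):
        try:
            return requests.get(url, timeout=3)
        except requests.RequestException:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with a little jitter: ~0.5s, 1s, 2s ...
            time.sleep(0.5 * (2 ** attempt) + random.random() * 0.1)

# During failover the first call may fail; a later retry reaches the standby LB.
# response = get_with_retries("https://api.example.com/orders/42")
```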
Question 5: Geographically Distributed Load Balancing
Question: Your company operates in 3 regions: US East, EU West, and APAC. You want to serve users with low latency and provide regional failover. Design a load balancing strategy.
Ideal Answer:
Architecture Overview:
┌──────────────────────────────────────────────────────────┐
│              Global Load Balancer (GLB)                  │
│  - Route 53 (AWS), Cloudflare, or custom                 │
│  - Periodically health checks each region                │
│  - Routes users to nearest healthy region                │
└──────────────────────────────────────────────────────────┘
        │                 │                 │
        ▼                 ▼                 ▼
 ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
 │  US-EAST    │   │  EU-WEST    │   │   APAC      │
 │  Region     │   │  Region     │   │  Region     │
 │             │   │             │   │             │
 │ ┌─────────┐ │   │ ┌─────────┐ │   │ ┌─────────┐ │
 │ │LB Pair  │ │   │ │LB Pair  │ │   │ │LB Pair  │ │
 │ └─────────┘ │   │ └─────────┘ │   │ └─────────┘ │
 │      │      │   │      │      │   │      │      │
 │  [Servers]  │   │  [Servers]  │   │  [Servers]  │
 │  [DB Pri]   │   │  [DB Rep]   │   │  [DB Rep]   │
 └─────────────┘   └─────────────┘   └─────────────┘
Three Layers of Load Balancing:
Layer 1: Global Load Balancer (Geographic Routing)
Route users to the nearest region based on DNS geolocation:
User in San Francisco
  → GLB checks: US East health = HEALTHY
  → GLB returns: IP of LB-1 (US East)

User in London
  → GLB checks: EU West health = HEALTHY
  → GLB returns: IP of LB-2 (EU West)
Implementation (AWS Route 53):
1. Create health checks for each region
   - Health check on US-EAST: GET /health → 200 OK
   - Health check on EU-WEST: GET /health → 200 OK
2. Create geolocation-based routing
   - Location: North America → US-EAST LB IP
   - Location: Europe → EU-WEST LB IP
   - Location: Asia → APAC LB IP
3. Health check interval: 30 seconds
   - If a region fails 3 checks: remove it from routing
Alternative: Latency-based routing measures actual latency to each region instead of geography.
Layer 2: Regional Load Balancer (Within-Region Failover)
Each region has a redundant pair (identical to Question 4).
Layer 3: Backend Load Balancing (Server Selection)
Within each region, use algorithms (Round Robin, Least Connections, etc.).
Data Consistency Across Regions:
Challenge: If database is in US-EAST, how do EU and APAC regions access it?
Option 1: Primary-Replica Setup (Recommended)
Write Path:
  User in US → LB-US → Server → DB-Primary (US)

Replication:
  DB-Primary (US) → Replicates to → DB-Replica (EU)
  DB-Primary (US) → Replicates to → DB-Replica (APAC)

Read Path:
  User in US → Reads from DB-Primary (low latency, latest)
  User in EU → Reads from DB-Replica (higher latency, slightly stale)
Pros: Strong consistency, scalable reads
Cons: Replication lag; if the US primary goes down, writes are blocked until a replica is promoted.
Handling Region Failure:
Scenario: US-EAST region becomes unavailable

Timeline:
  t=0s:  US-EAST datacenter loses power
  t=10s: GLB health check times out (failure #1)
  t=40s: GLB fails health check 3x, removes US-EAST

US users:
  - Try to reach US LB: connection timeout
  - Browser/client retries
  - New DNS query gets EU-WEST or APAC IP
  - Users rerouted (100-200ms higher latency)
  - Service continues

Database:
  - US-EAST DB primary is down
  - Replicas in EU/APAC have the most recent replicated writes
  - Manually promote the EU replica to primary
Optimizations:
1. DNS TTL – Set short TTL (30-60 seconds) for fast failover
2. Database Replication Lag – Monitor and minimize; route writes/reads to same region
3. Anycast Routing – Same IP from multiple regions; packets route to closest (advanced)
Question 6: Load Balancer Performance Optimization
Question: Your load balancer handles 1 million requests per second. CPU is at 80% and approaching saturation. What strategies would you use to optimize performance without upgrading hardware?
Ideal Answer:
Diagnosis: Where is CPU Usage?
Typical LB CPU breakdown:
- TLS handshake/decryption: 40-50%
- Connection state tracking: 20-30%
- Routing algorithm: 10-15%
- Health checks: 5-10%
- Logging/monitoring: 10-15%
Optimization Strategies (In Priority Order):
1. Optimize TLS Handling (20-30% reduction)
Strategy A: Enable TLS 1.3 + Session Resumption
TLS Session Resumption (Session Tickets):

Connection 1:
  CLIENT: ClientHello
  SERVER: ServerHello, [Session Ticket]
  (Full handshake, CPU intensive)

Connection 2 (minutes later):
  CLIENT: ClientHello + Session Ticket
  SERVER: Resumes session (no full handshake)
Benefits: TLS 1.3 needs only one round trip for the handshake (vs. two for TLS 1.2) and supports only modern, fast cipher suites.
2. Enable HTTP/2 Multiplexing (15-20% reduction)
HTTP/1.1:
  Client 1 → Dedicated TCP connection(s) → Backend
  Client 2 → Dedicated TCP connection(s) → Backend
  (One or more TCP connections per client, high connection count)

HTTP/2 (Multiplexing):
  Each client → Single TCP connection carrying many concurrent streams
  LB → Backends: requests multiplexed over a small pool of reused connections
  (Fewer total connections)

Result:
- Fewer TCP connections to track
- Fewer TLS handshakes
- 30-50% CPU reduction
3. Reduce Health Check Frequency (10% reduction)
Current (inefficient):
  Health check interval: 5 seconds
  Failure threshold: 3 failures

Optimized:
  Health check interval: 10 seconds
  Failure threshold: 2 failures
  Use TCP-only checks instead of HTTP (faster)

Example:
  TCP check: connect to port (≈50 microseconds)
  HTTP check: GET /health, 200 OK (5-10 milliseconds)

Trade-off: ~90% reduction in health-check CPU, but less visibility into application health
4. Switch Routing Algorithm (10-15% reduction)
Algorithm CPU cost:
- Round Robin: O(1) (just increment a counter)
- Least Connections: O(n) (check every server's connection count)
- IP Hash: O(1) (just hash the IP)

Current: Least Connections (O(n))
New: Weighted Round Robin (O(1))
With 1000 servers: 1000x faster per-request lookup

Hybrid: "Power of d Choices"
- Randomly sample d servers (e.g., d=2)
- Pick the one with the fewest connections
- O(d) lookup, d << n
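A tiny sketch of the "power of d choices" hybrid for d=2 (server names and connection counts are made up):

```python
import random

def power_of_two_choices(servers, connections):
    # Sample two servers at random and keep the one with fewer active connections.
    a, b = random.sample(servers, 2)
    return a if connections[a] <= connections[b] else b

servers = [f"server-{i}" for i in range(1000)]
connections = {s: random.randint(0, 50) for s in servers}
print(power_of_two_choices(servers, connections))  # near-least-loaded pick in O(1)
```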
5. Offload Logging/Monitoring (10-15% reduction)
Current: Log every request (timestamp, source IP, backend, response time)

Options:
A. Sampling: log only 1% of requests
   - 10x CPU reduction
   - Still get statistical insights
B. Offload to backend: backends handle detailed logging
   - LB logs minimal (only errors)
   - 15% CPU reduction
C. Async logging: buffer locally, send asynchronously
   - 5% CPU reduction
6. Vertical Scaling (2-3x improvement)
If optimizations still hit 80% CPU:
Upgrade to newer CPUs with:
- AES-NI: hardware-accelerated encryption (40% faster TLS)
- AVX-512: SIMD operations (faster hashing)
- More cores: distribute load across threads

Result: 2-3x CPU capacity
Cost: $1,000-10,000 per instance
Optimization Priority Checklist:
- ✅ Enable TLS 1.3 + session resumption (20-30%, easy)
- ✅ Enable HTTP/2 multiplexing (15-20%, easy)
- ✅ Reduce health check frequency (10%, easy)
- ✅ Switch to weighted round robin (10-15%, easy)
- ✅ Offload logging/sampling (10-15%, moderate)
- ✅ Upgrade CPU if needed (2-3x, expensive)
Typical Result: 50-80% CPU reduction with minimal effort.
Question 7: Load Balancer and Database Connections
Question: Your app uses a database connection pool (max 100 connections). You have 10 backend servers, each with its own pool. How many connections should each pool have? What issues might arise?
Ideal Answer:
Understanding Connection Pools:
Without connection pool:
  Open connection → Handshake → Execute query → Close connection
  (Time: 100-500ms including handshake)

With connection pool:
  Get connection from pool → Execute query → Return to pool
  (Time: 1-10ms, connection already open)
The Problem: Oversubscription
Naive approach:
  10 backend servers × 100 connections per server = 1,000 connections
  Database max connections: 100 (default)

Result: PROBLEM! 900 connection attempts fail
Error: "too many connections"
Correct Design:
Step 1: Determine Database Connection Limit
Check database configuration:
  PostgreSQL: SHOW max_connections;
  MySQL: SHOW VARIABLES LIKE 'max_connections';

Typical default: 100
Typical enterprise: 500-5000
Step 2: Account for Connection Overhead
Database reserved connections:
- Replication connections: 2-5
- Admin connection: 1
- Monitoring agents: 1-3

Available for application: Total - Reserved

Example (max_connections=100):
  Total: 100
  Reserved: 10
  Available for app: 90
Step 3: Divide Among Backend Servers
Formula:
  Pool size per server = (Available connections) / (Number of servers)

Example:
  90 available / 10 servers = 9 connections per server

Configuration per backend:
  database:
    pool:
      maxConnections: 9
      minConnections: 3
      maxIdleTime: 300   # Close idle connections after 5 minutes
Accounting for Peak Load:
If not all servers are healthy (e.g., 80% expected uptime):

Safer formula:
  Pool size = Available connections / (Expected minimum healthy servers)

Example:
  90 connections / 8 servers (80% of 10) = 11 connections per server

Caveat: if all 10 servers come back healthy, 10 × 11 = 110 exceeds the 90 available connections, so either leave headroom in max_connections or use a pooler that enforces a global cap.
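The same arithmetic as a small helper, so the numbers can be recomputed when limits or server counts change (values mirror the example above):

```python
def pool_size_per_server(max_connections, reserved, servers, min_healthy_ratio=1.0):
    available = max_connections - reserved
    expected_healthy = max(1, int(servers * min_healthy_ratio))
    return available // expected_healthy

print(pool_size_per_server(100, 10, 10))                          # 9  (all 10 healthy)
print(pool_size_per_server(100, 10, 10, min_healthy_ratio=0.8))   # 11 (plan for 8 healthy)
```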
Common Issues:
Issue 1: Connection Leaks
Problem: Application opens a connection but forgets to close it

Timeline:
  t=0:    App opens connection 1
  t=5s:   Connection never closed (bug)
  t=10s:  App opens connection 2
  ...
  t=100s: All 9 pool connections exhausted
  t=101s: App request fails: "No available connections"

Solution:
- Use connection timeouts
- Monitor pool utilization
- Code review
Issue 2: Long-Running Queries
Problem: A query takes 30 seconds, so its connection is held for 30 seconds

Scenario (pool size 9):
  Request 1: Uses connection 1 (t=0-30s)
  Request 2: Uses connection 2 (t=1-31s)
  ...
  Request 9: Uses connection 9 (t=8-38s)
  Request 10 (t=9s): Wants a connection, all 9 in use
                     → Request queues and waits

Solution:
- Optimize slow queries
- Set a query timeout (e.g., 5 seconds)
- Increase pool size (but this increases DB load)
Monitoring and Alerting:
Key metrics:
1. Database active connections
   Alert if: > 80% of max_connections
2. Per-server pool utilization
   Alert if: > 90% for 1 minute
3. Connection wait time
   Alert if: > 1 second
4. Connection lifecycle
   Alert if: opened but never closed (leak detection)
Database-Side Tuning (If Needed):
Option A: Increase database limits (if possible)
  max_connections = 500
  shared_buffers = 256MB
  Cost: More RAM needed
  Benefit: Each server can have a larger pool

Option B: Use a database connection pooler (PgBouncer, ProxySQL)
  Backends → PgBouncer → Database
  Multiplexes: 1000 client connections → 50 actual DB connections
  Benefit: Support more concurrent users without overloading the DB
  Trade-off: Additional component, slight latency
Summary: Connection Pool Sizing
| Component | Count | Reasoning |
|---|---|---|
| DB Max Connections | 100 | Default, check your DB |
| Reserved | 10 | Replication, admin, monitoring |
| Available | 90 | 100 – 10 |
| Backend Servers | 10 | Your LB routes to 10 |
| Pool per Server | 9 | 90 / 10 |
| Total at Peak | 90 | 9 × 10 (all healthy) |
Question 8: Load Balancer and Cache Coherence
Question: Your application uses distributed caching (Redis cluster). With a load balancer, the same user can hit different backend servers. How do you ensure cache coherence? What issues arise?
Ideal Answer:
The Problem: Race Conditions
Scenario: Concurrent requests

User Request 1 (t=0): Update profile name → Server 1
  Server 1: GET from Redis {name: "John", email: "john@..."}

User Request 2 (t=1): Update profile email → Server 2
  Server 2: GET from Redis {name: "John", email: "john@..."}

Server 1 (t=2): Modifies to {name: "Jane", email: "john@..."}
  Server 1: SET to Redis (overwrites previous state)

Server 2 (t=3): Modifies to {name: "John", email: "jane@..."}
  Server 2: SET to Redis (overwrites Server 1's changes!)

Result: The name update to "Jane" is lost! (Lost update problem)
This is the fundamental cache coherence challenge.
Solution 1: Sticky Sessions
If same user always goes to same server:
- All user’s requests go to Server 1
- No race condition (only one server touches user data)
Pros: Simpler code
Cons: Uneven load, failover complexity.
Solution 2: Versioning with CAS (Compare-and-Swap)
Use atomic operations provided by Redis:
Pseudocode:

Server 1:
  1. GET user:123:version            (current = 5)
  2. GET user:123:profile:v5
  3. Update the value locally
  4. Write user:123:profile:v6 and bump the version to 6
     (atomically, only if the version is still 5)

Server 2 (concurrent):
  1. GET user:123:version            (gets 5)
  2. GET user:123:profile:v5
  3. Update the value locally
  4. Tries the same conditional write, but the version is now 6, not 5
  5. Detects the conflict: version changed
  6. Retries: re-reads v6, applies its update, writes v7

Result: No lost updates, but retry logic is needed
Redis Implementation (WATCH/MULTI/EXEC):
WATCH user:123:version
GET user:123:profile          # read and compute the update outside MULTI
MULTI
  SET user:123:profile {updated}
  INCR user:123:version
EXEC

If another client changed user:123:version after WATCH:
  WATCH detects the change
  EXEC fails (returns nil)
  Client retries from WATCH
Pros: Handles concurrent updates without locking
Cons: Retry logic needed, possible retry storms.
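With the redis-py client, that retry loop looks roughly like this (key names and the update shape are illustrative, not a fixed schema):

```python
import json
import redis

r = redis.Redis()

def update_profile(user_id, changes, max_retries=5):
    key = f"user:{user_id}:profile"
    for _ in range(max_retries):
        with r.pipeline() as pipe:
            try:
                pipe.watch(key)                       # abort the EXEC if key changes
                profile = json.loads(pipe.get(key) or "{}")
                profile.update(changes)               # read-modify-write done locally
                pipe.multi()
                pipe.set(key, json.dumps(profile))
                pipe.execute()                        # raises WatchError on conflict
                return profile
            except redis.WatchError:
                continue                              # another server wrote first: retry
    raise RuntimeError("too many concurrent updates")
```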
Solution 3: Distributed Locks
Use locks to ensure only one server updates at a time:
User request: Update profile

Server 1:
  1. ACQUIRE lock on user:123:profile
  2. GET user:123:profile
  3. Update
  4. SET user:123:profile {updated}
  5. RELEASE lock

Server 2 (concurrent):
  1. ACQUIRE lock -- BLOCKS
  2. Waits for Server 1
  3. Once released, acquires the lock
  4. Proceeds with its update

Result: Serialized updates, no race condition
Redis Implementation (single-instance lock; Redlock extends this pattern across multiple Redis nodes):
ACQUIRE lock:
  SET user:123:lock "server-2-uuid" EX 10 NX
  (Set if not exists, expire in 10 seconds)

If someone else has the lock:
  SET fails, server waits/retries

RELEASE lock:
  IF lock_value == "server-2-uuid":
    DEL user:123:lock
  (Compare before delete)
Pros: Strong consistency
Cons: Blocking, potential deadlocks, performance hit.
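A redis-py sketch of that single-instance lock (the key, TTL, and compare-and-delete release are illustrative; production code would likely use a maintained lock helper):

```python
import uuid
import redis

r = redis.Redis()

# Lua compare-and-delete: only the holder of the matching token may release.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""

def acquire_lock(key, ttl=10):
    token = str(uuid.uuid4())
    # SET ... NX EX: acquire only if the lock key does not already exist.
    if r.set(key, token, nx=True, ex=ttl):
        return token
    return None      # someone else holds the lock

def release_lock(key, token):
    r.eval(RELEASE_SCRIPT, 1, key, token)

token = acquire_lock("user:123:lock")
if token:
    try:
        pass  # ... read, modify, and write user:123:profile safely ...
    finally:
        release_lock("user:123:lock", token)
```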
Solution 4: Event Sourcing
Instead of storing current state, store immutable events:
User Request 1: Change name to "Jane"
  Event: {"type": "NameChanged", "name": "Jane", ts: t1}
  APPEND to user:123:events

User Request 2: Change email
  Event: {"type": "EmailChanged", "email": "jane@...", ts: t2}
  APPEND to user:123:events

Rebuilding state (any server):
  Read all events: [NameChanged, EmailChanged]
  Replay: start with {}, apply NameChanged, apply EmailChanged
  Result: {name: "Jane", email: "jane@..."}
Pros: No race conditions, full audit trail
Cons: Complex, state reconstruction overhead.
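A minimal replay sketch showing why appends avoid the lost-update problem (the event list stands in for an append-only log such as a Redis stream or Kafka topic):

```python
events = []  # stand-in for an append-only log

def append(event):
    events.append(event)   # concurrent appends never overwrite each other

def rebuild_state():
    state = {}
    for e in events:       # replay in order to reconstruct the current state
        if e["type"] == "NameChanged":
            state["name"] = e["name"]
        elif e["type"] == "EmailChanged":
            state["email"] = e["email"]
    return state

append({"type": "NameChanged", "name": "Jane"})
append({"type": "EmailChanged", "email": "jane@example.com"})
print(rebuild_state())  # {'name': 'Jane', 'email': 'jane@example.com'}
```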
Comparison Table:
| Solution | Complexity | Performance | Consistency |
|---|---|---|---|
| Sticky Sessions | Low | High | Strong |
| Versioning + CAS | Medium | Good | Eventual (with retries) |
| Distributed Locks | Medium | Low (blocking) | Strong |
| Event Sourcing | High | Medium | Eventual (by replay) |
| Stateless Design | High | High | N/A (no state) |
Best Practice Recommendation:
- Use sticky sessions (simplest) if app tolerates server failures
- Use stateless design (best practice) if possible—store state in DB, cache read-only
- Use CAS/versioning if cache updates necessary, conflict rate low
- Use locks only if serialization required (financial transactions)
Tips and Key Takeaways
When discussing load balancers in system design interviews:
1. Always Design for Redundancy
- No single point of failure
- Deploy load balancers in pairs with automatic failover
- Monitor health continuously
2. Choose the Right Layer
- ALB for web applications, intelligent routing
- NLB for low-latency, high-throughput scenarios
3. Handle State Properly
- Sticky sessions for simple cases
- External cache (Redis) for scale
- Stateless design (JWT) for distributed systems
4. Account for Regional Growth
- Global load balancer with health checks
- Regional failover documented and tested
- Database replication strategy defined
5. Measure Before Optimizing
- Identify actual bottlenecks (TLS, connections, etc.)
- Optimize in priority order
6. Understand Trade-offs
- Latency vs. intelligent routing
- Scalability vs. consistency
- Simplicity vs. resilience
7. Don’t Over-Engineer
- Simple algorithms (Round Robin, Least Connections) work for most cases
- Complex adaptive algorithms introduce operational complexity
- Keep it simple unless you have specific needs
The best load balancer design is one that’s appropriate for your specific use case, not the most complex or feature-rich design.