Load balancing is the practice of distributing incoming traffic across multiple instances of a service to improve availability, performance, and elasticity. A load balancer is the single entry point that decides which backend should handle each request/connection.
Where it sits (layers)
- Layer 4 (transport): routes on connection data (IP, port, protocol, e.g. TCP). Very fast; limited request awareness. Examples: NLB, HAProxy in TCP mode (see the sketch after this list).
- Layer 7 (application): routes based on HTTP/gRPC metadata (path, headers, methods), can do retries, timeouts, circuit breaking. Examples: ALB, Envoy, Nginx, service mesh.
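To make the distinction concrete, here is a minimal sketch of a pure L4 listener in Envoy: a TCP proxy that only forwards bytes to a cluster (the listener port and the cluster name backend_cluster are assumptions). Contrast it with the L7 http_connection_manager example later in this section, which parses requests and routes per path.
# language-yaml
# L4 in Envoy: a TCP proxy that forwards connections without parsing requests.
listeners:
- name: tcp_passthrough
  address:
    socket_address: { address: 0.0.0.0, port_value: 8443 }
  filter_chains:
  - filters:
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: tcp_passthrough
        cluster: backend_cluster   # assumed cluster name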
Core algorithms
- Round‑robin: simple and fair when instances are homogeneous.
- Least‑connections/latency: better for heterogeneous workloads or long‑lived requests.
- Consistent hashing: stable request routing; great with caches and sharding (ring-hash sketch after this list).
- Weighted: steer more traffic to bigger/faster instances.
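As a sketch of consistent hashing in Envoy (the cluster name and the x-user-id header used as the hash key are hypothetical): the cluster builds a hash ring over its endpoints, and the route decides what to hash on, so the same key keeps landing on the same backend.
# language-yaml
# Cluster side: hash requests onto a ring of endpoints.
clusters:
- name: cache_cluster              # illustrative name
  type: STRICT_DNS
  lb_policy: RING_HASH
  ring_hash_lb_config:
    minimum_ring_size: 1024
# Route side: choose the hash key (here a hypothetical x-user-id header).
routes:
- match: { prefix: "/" }
  route:
    cluster: cache_cluster
    hash_policy:
    - header: { header_name: x-user-id }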
Production concerns
- Health checks (active + passive) with fast ejection and slow reintroduction (example after this list).
- Prefer stateless services; if you must, use sticky sessions (cookie/IP hash) sparingly.
- TLS termination at the LB for uniform policy and observability; use mTLS behind if needed.
- Anycast + global DNS (GSLB) to get users to the closest healthy region.
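A sketch of active plus passive checking on an Envoy cluster (the /healthz path and all thresholds are illustrative): active probes gate reintroduction, while outlier detection ejects backends that keep failing real traffic.
# language-yaml
health_checks:
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3        # eject after 3 failed probes
  healthy_threshold: 2          # reintroduce only after 2 consecutive passes
  http_health_check:
    path: /healthz              # assumed health endpoint
outlier_detection:              # passive: watch real responses
  consecutive_5xx: 5
  base_ejection_time: 30s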
Configuration tips
- Enable slow‑start so new instances warm caches before receiving full traffic (sketch after this list).
- Use connection draining on shutdowns and deployments.
- Separate pools for long‑lived connections (WebSockets) vs short HTTP requests.
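A sketch of the first and third tips in Envoy terms (recent Envoy versions; the cluster names, route prefix, and durations are illustrative): a slow-start window on the cluster, and a dedicated route and cluster for WebSocket upgrades.
# language-yaml
# Slow start: ramp new endpoints from a small traffic share up to full weight.
clusters:
- name: backend_cluster
  lb_policy: ROUND_ROBIN
  round_robin_lb_config:
    slow_start_config:
      slow_start_window: 60s     # illustrative ramp-up window
# Keep long-lived WebSockets in their own pool, away from short HTTP requests.
routes:
- match: { prefix: "/ws" }
  route:
    cluster: websocket_cluster   # illustrative dedicated pool
    upgrade_configs:
    - upgrade_type: websocket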
Tooling
Nginx/HAProxy/Envoy, cloud LBs (ALB/NLB), and service meshes (Istio/Linkerd) for L7 routing, retries, timeouts, and circuit breaking.
Failure modes and protection
- No backpressure → cascading failures. Enforce concurrency limits and queue caps (sketch after this list).
- Missing timeouts/circuit breakers → retry storms and amplified outages.
- Slow start and connection draining avoid thundering herds during deploys.
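One way to enforce those limits is per-cluster circuit-breaker thresholds, sketched here for Envoy with illustrative numbers: once a cap fills, requests fail fast instead of queueing without bound.
# language-yaml
circuit_breakers:
  thresholds:
  - priority: DEFAULT
    max_connections: 1000
    max_pending_requests: 100    # queue cap: beyond this, fail fast
    max_requests: 1000           # concurrent-request cap
    max_retries: 3               # bounds concurrent retries cluster-wide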
Handy checklist
- Health checks tuned per endpoint, not one‑size‑fits‑all.
- Per‑route timeouts and retry budgets; circuit breakers with half‑open probes (sketch after this list).
- Observability: per‑backend success rate, EWMA latency, error classification.
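A per-route sketch of the second checklist item in Envoy (the /api prefix, cluster name, and all values are illustrative): a hard timeout plus a small, backed-off retry policy so retries stay bounded.
# language-yaml
- match: { prefix: "/api" }
  route:
    cluster: backend_cluster
    timeout: 2s                  # end-to-end budget for the request
    retry_policy:
      retry_on: "5xx,reset,connect-failure"
      num_retries: 2
      per_try_timeout: 500ms
      retry_back_off:            # spreads retries out instead of bursting
        base_interval: 25ms
        max_interval: 250ms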
Code: Envoy weighted routing and outlier detection
# language-yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: backend_cluster
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    load_assignment:
      cluster_name: backend_cluster
      endpoints:
      - lb_endpoints:
        # Weighted endpoints: roughly 80% of requests go to backend-a, 20% to backend-b.
        - endpoint: { address: { socket_address: { address: backend-a, port_value: 80 } } }
          load_balancing_weight: 80
        - endpoint: { address: { socket_address: { address: backend-b, port_value: 80 } } }
          load_balancing_weight: 20
    outlier_detection:           # passive ejection of misbehaving backends
      consecutive_5xx: 5
      interval: 5s
      base_ejection_time: 30s
      max_ejection_percent: 50
Analogy
Think of a busy restaurant with a host. The host (load balancer) seats each new party at a table (backend). If a waiter is overloaded (high latency), the host assigns the next party elsewhere. If a table is broken (unhealthy), the host stops using it until it’s fixed.
Real‑world example
- A traffic surge during a product launch pushed p95 latency up. Enabling slow‑start and moving WebSocket traffic into its own pool stabilized tail latency and avoided connection starvation for regular HTTP requests.
FAQ
- When do I need sticky sessions? Only if state lives on the instance (avoid if possible). Prefer externalizing sessions (Redis) or JWTs; if you must, see the cookie-hash sketch below.
- L4 vs L7? L4 is simpler/faster; L7 understands requests and can retry/time out per route.
- Can retries make things worse? Yes. Use budgets and jitter to avoid retry storms.
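If stickiness is unavoidable, a cookie-based hash policy (sketched below for Envoy, paired with a ring-hash or Maglev cluster; the cookie name and TTL are illustrative) is usually safer than IP hashing, which breaks behind NAT and shared proxies.
# language-yaml
hash_policy:
- cookie:
    name: lb-affinity            # illustrative cookie name
    ttl: 3600s                   # Envoy sets the cookie if absent, then hashes on it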
Try it (quick check)
# language-bash
# Fire 100 parallel requests; print the 10 slowest along with the IP each request connected to
for i in {1..100}; do curl -s -w "%{time_total} %{remote_ip}\n" -o /dev/null https://your-domain/ & done | sort -n | tail -n 10