Load Balancing

September 15, 2025

Load balancing is the practice of distributing incoming traffic across multiple instances of a service to improve availability, performance, and elasticity. A load balancer is the single entry point that decides which backend should handle each request or connection.

Where it sits (layers)

  • Layer 4 (transport): routes on IP, port, and connection state (e.g., TCP). Very fast, but with limited request awareness; a minimal L4 sketch follows this list. Examples: NLB, HAProxy in TCP mode.
  • Layer 7 (application): routes based on HTTP/gRPC metadata (path, headers, methods), can do retries, timeouts, circuit breaking. Examples: ALB, Envoy, Nginx, service mesh.
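
To make the split concrete, here is a minimal sketch of an L4 listener in Envoy (the same tool used in the full example below). The listener name, port, and backend_cluster are illustrative assumptions; the point is that a TCP proxy forwards bytes and never sees paths or headers.

# language-yaml
listeners:
- name: tcp_l4                      # illustrative name
  address:
    socket_address: { address: 0.0.0.0, port_value: 9000 }
  filter_chains:
  - filters:
    # L4: forward raw TCP to a cluster; no per-route retries or timeouts
    - name: envoy.filters.network.tcp_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
        stat_prefix: tcp_l4
        cluster: backend_cluster    # assumed to be defined as in the full example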

Core algorithms

  • Round‑robin: simple and fair when instances are homogeneous.
  • Least‑connections/latency: better for heterogeneous workloads or long‑lived requests.
  • Consistent hashing: stable routing with minimal remapping when instances come and go; great for caches and sharding (sketched after this list).
  • Weighted: steer more traffic to bigger/faster instances.
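
For instance, consistent hashing can be expressed in Envoy as a ring‑hash cluster plus a per‑route hash policy. A hedged sketch; the cluster name and the x-session-id header are illustrative assumptions:

# language-yaml
# Route: choose what to hash per request (here, a header).
routes:
- match: { prefix: "/" }
  route:
    cluster: cache_cluster
    hash_policy:
    - header: { header_name: "x-session-id" }
# Cluster: endpoints are hashed onto a ring, so removing one instance
# only remaps the keys that landed on it.
clusters:
- name: cache_cluster
  type: STRICT_DNS
  connect_timeout: 0.25s
  lb_policy: RING_HASH
  ring_hash_lb_config:
    minimum_ring_size: 1024   # more ring entries -> smoother key spread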

Production concerns

  • Health checks (active + passive) with fast ejection and slow reintroduction; an active‑check sketch follows this list.
  • Prefer stateless services; if you must, use sticky sessions (cookie/IP hash) sparingly.
  • TLS termination at the LB for uniform policy and observability; use mTLS behind if needed.
  • Anycast + global DNS (GSLB) to get users to the closest healthy region.
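
A sketch of active checking on the cluster from the full example below; the /healthz path, intervals, and thresholds are assumptions to tune per service. Passive checking is the outlier_detection block shown later.

# language-yaml
clusters:
- name: backend_cluster
  # ... type, connect_timeout, load_assignment as in the full example ...
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 3    # fast ejection: 3 consecutive failures
    healthy_threshold: 2      # slow reintroduction: 2 passes required
    http_health_check:
      path: /healthz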

Configuration tips

  • Enable slow‑start so new instances warm caches before receiving full traffic (sketched after this list).
  • Use connection draining on shutdowns and deployments.
  • Separate pools for long‑lived connections (WebSockets) vs short HTTP requests.
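
A hedged slow‑start sketch in Envoy (supported for round‑robin and least‑request clusters in recent versions); the 30s window is an assumption:

# language-yaml
clusters:
- name: backend_cluster
  lb_policy: ROUND_ROBIN
  round_robin_lb_config:
    slow_start_config:
      slow_start_window: 30s   # ramp a new endpoint's effective weight over 30s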

Tooling

Nginx/HAProxy/Envoy, cloud LBs (ALB/NLB), and service meshes (Istio/Linkerd) for L7 routing, retries, timeouts, and circuit breaking.

Failure modes and protection

  • No backpressure → cascading failures. Enforce concurrency limits and queue caps (sketched after this list).
  • Missing timeouts/circuit breakers → retry storms and amplified outages.
  • Slow start and connection draining avoid thundering herds during deploys.
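
In Envoy, concurrency limits, queue caps, and retry budgets live on the cluster's circuit breakers. The thresholds below are illustrative starting points, not recommendations:

# language-yaml
clusters:
- name: backend_cluster
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 1000       # concurrency limit on connections
      max_requests: 500           # cap on in-flight requests (backpressure)
      max_pending_requests: 100   # queue cap: excess requests fail fast
      retry_budget:
        budget_percent: { value: 20.0 }   # retries may add at most 20% extra load
        min_retry_concurrency: 3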

Handy checklist

  • Health checks tuned per endpoint, not one‑size‑fits‑all.
  • Per‑route timeouts and retry budgets; circuit breakers with half‑open probes (a per‑route sketch follows this list).
  • Observability: per‑backend success rate, EWMA latency, error classification.
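
A per‑route sketch; the /api prefix, timeout values, and retry conditions are assumptions. Envoy applies jittered exponential backoff between the configured base and max intervals, which is what keeps retries from synchronizing into a storm.

# language-yaml
routes:
- match: { prefix: "/api" }
  route:
    cluster: backend_cluster
    timeout: 3s              # total ceiling for this route
    retry_policy:
      retry_on: "5xx,reset"
      num_retries: 2
      per_try_timeout: 1s    # each attempt gets its own deadline
      retry_back_off:
        base_interval: 0.05s
        max_interval: 0.5s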

Code: Envoy weighted routing and outlier detection

# language-yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: backend_cluster
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    load_assignment:
      cluster_name: backend_cluster
      endpoints:
      - lb_endpoints:
        - endpoint: { address: { socket_address: { address: backend-a, port_value: 80 } } }
          load_balancing_weight: 80   # backend-a receives ~80% of traffic
        - endpoint: { address: { socket_address: { address: backend-b, port_value: 80 } } }
          load_balancing_weight: 20   # backend-b receives ~20%
    outlier_detection:                # passive health checking
      consecutive_5xx: 5              # eject after 5 consecutive 5xx responses
      interval: 5s                    # how often ejection decisions are evaluated
      base_ejection_time: 30s         # ejected hosts sit out at least 30s
      max_ejection_percent: 50        # never eject more than half the pool

Analogy

Think of a busy restaurant with a host. The host (load balancer) seats each new party at a table (backend). If a waiter is overloaded (high latency), the host assigns the next party elsewhere. If a table is broken (unhealthy), the host stops using it until it’s fixed.

Real‑world example

  • Spiky traffic during product launch caused p95 latency to spike. Enabling slow‑start and separating WebSocket traffic into its own pool stabilized tail latency and avoided connection starvation for regular HTTP requests.

FAQ

  • When do I need sticky sessions? Only if state lives on the instance (avoid if possible). Prefer externalizing sessions (Redis) or JWTs; if you must stick, see the cookie‑hash sketch after this FAQ.
  • L4 vs L7? L4 is simpler/faster; L7 understands requests and can retry/time out per route.
  • Can retries make things worse? Yes. Use budgets and jitter to avoid retry storms.
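
If you do need stickiness, cookie‑based hashing keeps it at the LB rather than in app code. A hedged Envoy sketch; the cookie name is illustrative, and this only takes effect on a hashing lb_policy such as RING_HASH or MAGLEV:

# language-yaml
routes:
- match: { prefix: "/" }
  route:
    cluster: backend_cluster
    hash_policy:
    - cookie:
        name: lb-session   # Envoy generates this cookie if absent...
        ttl: 3600s         # ...because a ttl is configured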

Try it (quick check)

# language-bash
# Fire 100 concurrent requests, then print the 10 slowest along with the
# backend IP that served each one (a rough view of distribution and latency).
for i in {1..100}; do
  curl -s -o /dev/null -w "%{time_total} %{remote_ip}\n" https://your-domain/ &
done | sort -n | tail -n 10
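
A wide spread of remote IPs suggests traffic is actually being distributed; if a single IP dominates, look for DNS caching on the client or sticky sessions at the LB.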