Under a network partition you cannot have both strong Consistency and Availability. You must choose which to prefer while tolerating partitions.
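- Consistency (C): every read returns the most recent acknowledged write, as if there were a single copy of the data (linearizability).
- Availability (A): every request received by a non‑failing node eventually gets a non‑error response.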
- Partition tolerance (P): the system continues operating despite message loss or delay.
Because partitions can always happen, practical systems are either CP or AP during a partition.
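To make the CP/AP split concrete, here is a minimal Python sketch of a single replica that has been cut off from the majority. The `Replica` class, the `mode` labels, and the `can_reach_majority` flag are hypothetical names for illustration only; real systems detect loss of quorum through their own membership or consensus protocols.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised when a CP replica refuses a request it cannot serve consistently."""


@dataclass
class Replica:
    mode: str                        # "CP" or "AP" (labels assumed for this sketch)
    value: str = "v0"                # last value this replica has seen
    can_reach_majority: bool = True  # False while the replica is partitioned away

    def read(self) -> str:
        if self.mode == "CP" and not self.can_reach_majority:
            # CP: give up availability rather than risk returning stale data.
            raise Unavailable("no majority reachable; read refused")
        # AP: always answer from local state, which may be stale.
        return self.value


def demo(mode: str) -> str:
    """Read from a replica that missed a newer write ("v1") made on the majority side."""
    minority = Replica(mode=mode)
    minority.can_reach_majority = False  # the partition begins
    try:
        return minority.read()
    except Unavailable:
        return "error"
```

In this sketch `demo("CP")` yields an error, while `demo("AP")` returns the old value `"v0"` even though a newer write exists elsewhere.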
Examples
- CP: ZooKeeper, etcd (elect a leader; may refuse service to preserve consistency).
- AP: Dynamo, Cassandra at its default consistency levels (keep serving reads/writes and reconcile later, accepting staleness).
Beyond the slogan
- CAP says nothing about latency, durability, or SLAs. Many systems offer adjustable consistency via quorums (e.g., R/W quorums with N replicas).
- Client‑centric guarantees (read‑your‑writes, monotonic reads) can give a better UX even in AP systems.
- Multi‑region SQL deployments often mix modes: synchronous replication (CP) for critical tables and asynchronous replication (AP) for the rest.
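As an illustration of the client‑centric guarantees mentioned above, the following Python sketch shows one common way to get read‑your‑writes and monotonic reads: the client session remembers the highest version it has seen and rejects replicas that are older. The `Replica` and `Session` classes and the single integer version are assumptions of this sketch, not the API of any particular database.

```python
from typing import List, Optional


class Replica:
    """A replica holding one versioned value (hypothetical single-key store)."""

    def __init__(self) -> None:
        self.version = 0
        self.value: Optional[str] = None

    def apply(self, version: int, value: str) -> None:
        # Keep only the newest version seen so far.
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Tracks the highest version this client has written or read."""

    def __init__(self) -> None:
        self.min_version = 0

    def write(self, replica: Replica, value: str) -> None:
        # Simplification: the replica's next version number is assigned locally.
        new_version = replica.version + 1
        replica.apply(new_version, value)
        self.min_version = max(self.min_version, new_version)

    def read(self, replicas: List[Replica]) -> Optional[str]:
        # Accept the first replica that is at least as fresh as this session.
        for replica in replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version  # enforces monotonic reads
                return replica.value
        raise RuntimeError("no sufficiently fresh replica reachable")
```

A session that has just written through one replica will skip a lagging replica and read from a fresh one; a brand‑new session has no such floor and may be served the stale copy.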
Tuning with quorums
- With N replicas, choose read quorum R and write quorum W such that R + W > N. Every read quorum then overlaps every write quorum in at least one replica, which gives read‑after‑write consistency.
- Trade‑off: larger quorums increase latency and reduce availability during failures.
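The quorum rule can be checked mechanically. The Python sketch below (function names are illustrative) brute‑forces every possible read and write quorum to confirm they always share a replica exactly when R + W > N, and reports how many replica failures each operation tolerates.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every R-sized read set shares a replica with every W-sized write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


def quorum_properties(n: int, r: int, w: int) -> dict:
    """Summarize the consistency/availability trade-off for a given (N, R, W)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("require 1 <= R <= N and 1 <= W <= N")
    return {
        "read_after_write": r + w > n,
        # An operation still succeeds while at least R (or W) replicas respond.
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - w,
    }
```

For example, N = 3 with R = W = 2 gives read‑after‑write while tolerating one failed replica for both reads and writes; R = 1, W = 3 also gives read‑after‑write, but a single failed replica blocks all writes.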
Practical guidance
Decide what to do during a partition: fail fast (CP) or serve possibly stale data (AP). For payments/orders, prefer CP; for feeds/caches, AP is usually fine. Measure and simulate partitions to validate behavior.
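One way to make that decision explicit is to record it per data class, so the behavior under partition is a reviewed setting rather than an accident. The Python sketch below is a hypothetical mapping; the data‑class names and the conservative default are assumptions drawn from the guidance above.

```python
from enum import Enum


class Policy(Enum):
    FAIL_FAST = "CP"    # reject the request rather than risk inconsistency
    SERVE_STALE = "AP"  # answer from local state, possibly stale


# Hypothetical mapping; adjust the data classes to your own system.
PARTITION_POLICY = {
    "payments": Policy.FAIL_FAST,
    "orders": Policy.FAIL_FAST,
    "feeds": Policy.SERVE_STALE,
    "caches": Policy.SERVE_STALE,
}


def policy_for(data_class: str) -> Policy:
    # Assumption of this sketch: anything unclassified gets the conservative choice.
    return PARTITION_POLICY.get(data_class, Policy.FAIL_FAST)
```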
Runbook snippet
- Detect a partition via increased inter‑AZ RTT and growing replication lag.
- Enter degraded policy: block writes to CP tables; allow reads with a staleness banner; queue non‑critical writes.
- Exit policy: once quorum is restored, reconcile via write‑ahead log replay or vector‑clock merge.
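For the vector‑clock path of the exit policy, reconciliation starts by deciding whether two versions are ordered or concurrent. The Python sketch below shows the standard comparison and element‑wise‑maximum merge; representing a clock as a dict of node name to counter is an assumption of this sketch, and resolving the conflicting values themselves remains application‑specific.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of events seen from that node


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for a relative to b."""
    nodes = set(a) | set(b)
    # Missing entries count as zero.
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b supersedes a; keep b's value
    if b_le_a:
        return "after"       # a supersedes b; keep a's value
    return "concurrent"      # true conflict; the application must resolve it


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum: the smallest clock that dominates both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

Only the "concurrent" case needs a conflict‑resolution rule (for example, last‑writer‑wins or an application‑level merge); ordered versions can simply be replaced by the newer one.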
Code: tunable consistency (Cassandra CQL)
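-- language-sql
-- Assumes a table users (id uuid PRIMARY KEY, name text); the UUID below is an arbitrary example value.
-- CONSISTENCY is a cqlsh shell command, not CQL; it applies to later statements in the session.
-- Write with QUORUM, read with QUORUM: R + W > N, so read-after-write holds for a single key
CONSISTENCY QUORUM;
INSERT INTO users (id, name) VALUES (123e4567-e89b-12d3-a456-426614174000, 'alice');
-- Later (the level persists for the session; repeated here for clarity)
CONSISTENCY QUORUM;
SELECT * FROM users WHERE id = 123e4567-e89b-12d3-a456-426614174000;
-- Prefer availability (AP):
CONSISTENCY ONE;
SELECT * FROM users WHERE id = 123e4567-e89b-12d3-a456-426614174000; -- may be stale under partition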
Analogy
Imagine a group chatting during a storm. If the group insists everyone hears the latest message before replying (CP), conversations slow or pause when lines drop. If they allow replies even if someone missed the last message (AP), the chat continues but may be inconsistent until they reconcile later.
FAQ
- Is CAP about performance? Not directly; it’s about behavior during partitions. Latency concerns are separate but related.
- Can I be CA? Only in theory, when partitions cannot occur. In practice, assume partitions happen, so you must choose CP or AP under fault.
Try it (fault injection)
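# language-bash
# Requires root; eth0 is an assumed interface name (check with `ip link`).
# Degrade the link: 5% packet loss plus 200 ms added delay (use `loss 100%` to approximate a hard partition).
tc qdisc add dev eth0 root netem loss 5% delay 200ms
# Run consistency checks and observe behavior; then remove the rule
tc qdisc del dev eth0 root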