A network partition occurs when nodes in a distributed system can no longer communicate with each other. The nodes are not down; the network between them is broken. Every distributed system must choose what to do when this happens: keep serving requests and risk returning inconsistent data, or refuse to serve requests until the partition heals. There is no third option.
Analysis Briefing
- Topic: Network partitions, the CAP theorem, and Raft consensus
- Analyst: Mike D (@MrComputerScience)
- Context: A structured investigation kicked off by Claude Sonnet 4.6
- Source: Pithy Cyborg | AI News Made Simple
- Key Question: How does Raft prevent two nodes from both thinking they are the leader during a partition?
The CAP Theorem and the Only Real Choice
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (all nodes see the same data at the same time), Availability (every request receives a response), and Partition Tolerance (the system continues operating despite network partitions).
Partition tolerance is not optional. Networks partition. Hardware fails. Cables get cut. A system that assumes a reliable network will produce incorrect results when the network misbehaves. The real choice is between CP (consistency over availability during partitions) and AP (availability over consistency during partitions).
CP systems refuse to serve requests when they cannot confirm they have the latest data. A Raft-based system with 3 nodes and a partition isolating 1 node from the other 2 will continue operating on the 2-node side (quorum maintained) and stop serving writes on the isolated node (no quorum). No split-brain.
AP systems serve requests from every partition, accepting that different partitions may diverge. Cassandra at consistency level ONE serves reads and writes from both sides of a partition. Both sides accept writes. When the partition heals, the system must reconcile the divergence using last-write-wins or application-level merge logic.
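To make the reconciliation step concrete, here is a minimal sketch of last-write-wins merging in Python. The `Versioned` type and `lww_merge` function are illustrative names, not any real database's API; real systems like Cassandra do this per-cell with write timestamps.

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    timestamp: float  # wall-clock write time recorded by the accepting node

def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep the value with the later timestamp.
    The losing write is silently discarded -- the core AP trade-off."""
    return a if a.timestamp >= b.timestamp else b

# Two sides of a partition accepted conflicting writes for the same key.
side_a = Versioned("alice@old.example", timestamp=1000.0)
side_b = Versioned("alice@new.example", timestamp=1005.0)

merged = lww_merge(side_a, side_b)
print(merged.value)  # alice@new.example -- side_a's write is lost
```

Note that last-write-wins depends on clocks; skewed clocks can pick the "wrong" winner, which is why some systems prefer application-level merge logic instead.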
How Raft Prevents Split-Brain with Leader Election
Raft is a consensus algorithm designed to be more understandable than Paxos while providing identical safety guarantees. It elects a single leader who handles all writes. Followers replicate the leader’s log. Safety comes from the quorum requirement: the leader must receive acknowledgment from a majority of nodes before committing any log entry.
Leader election uses randomized timeouts. Each follower waits a random period (typically 150 to 300 ms) without hearing from a leader before starting an election. The first follower to time out increments its term, becomes a candidate, votes for itself, and sends RequestVote RPCs to all other nodes. A node grants its vote at most once per term, and only to a candidate whose log is at least as up-to-date as its own.
A candidate becomes leader only if it receives votes from a majority. With 5 nodes, this requires 3 votes. During a partition that splits the cluster 3/2, only the majority side can elect a leader. The minority side will keep attempting elections but never reach quorum, so it will never accept writes.
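The quorum arithmetic that makes this safe fits in a few lines. This is a sketch of the majority rule only (function names are mine, not from any Raft implementation); the key point is that a candidate needs votes from a majority of the *full* cluster, not just of the nodes it can reach.

```python
def majority(cluster_size: int) -> int:
    """Votes needed to win an election (or acks needed to commit)."""
    return cluster_size // 2 + 1

def can_elect_leader(reachable_nodes: int, cluster_size: int) -> bool:
    """A partition side can elect a leader only if it contains a
    majority of the whole cluster."""
    return reachable_nodes >= majority(cluster_size)

# 5-node cluster split 3/2 by a partition:
print(majority(5))             # 3
print(can_elect_leader(3, 5))  # True  -- majority side elects a leader
print(can_elect_leader(2, 5))  # False -- minority side never reaches quorum
```

Because any two majorities of the same cluster must overlap in at least one node, two partition sides can never both hold a quorum in the same term.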
Normal operation (5 nodes):
Leader → Follower 1 ✓
Leader → Follower 2 ✓
Leader → Follower 3 ✓
Leader → Follower 4 ✓
Quorum achieved, commit.

Partition (3/2 split):

Majority side (3 nodes):
Leader → Follower 1 ✓
Leader → Follower 2 ✓
Quorum achieved. Continues serving writes.

Minority side (2 nodes):
Candidate → Old Follower 3
Candidate → Old Follower 4
No quorum. Elections fail. Refuses writes. Serves stale reads.
The term number prevents stale leaders from causing problems after the partition heals. Every RPC carries the sender's term, and any leader or candidate that encounters a higher term immediately steps down to follower and adopts that term.
What Actually Happens to In-Flight Requests During a Partition
When a partition occurs mid-request, the client does not immediately know whether its request succeeded. A write that reached the leader before the partition may or may not have been committed and replicated to a quorum before the leader lost connectivity.
The client’s timeout fires and it must decide whether to retry. If the write was committed, retrying without idempotency handling produces a duplicate. If the write was not committed, not retrying loses the data.
The correct pattern is idempotent operations with client-generated identifiers. The client includes a unique request ID with every write. The server deduplicates on this ID. A retry of a committed write is recognized and returns success without re-applying the operation.
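Here is a minimal sketch of server-side deduplication, assuming an in-memory store (a real system would persist the dedup table through the same replicated log as the data). `DedupStore` and `credit` are illustrative names.

```python
import uuid

class DedupStore:
    """Write handler that deduplicates on client-generated request IDs,
    so a retry after a partition is acknowledged, not re-applied."""
    def __init__(self):
        self.applied = {}   # request_id -> result of the original write
        self.balance = 0

    def credit(self, request_id: str, amount: int) -> int:
        if request_id in self.applied:           # retry of a committed write
            return self.applied[request_id]      # return cached result
        self.balance += amount                   # first delivery: apply it
        self.applied[request_id] = self.balance
        return self.balance

store = DedupStore()
req = str(uuid.uuid4())   # client-generated ID, reused on every retry
store.credit(req, 100)    # original attempt (client's timeout fired)
store.credit(req, 100)    # retry after the partition -- not re-applied
print(store.balance)      # 100, not 200
```

The essential detail is that the ID comes from the client, because only the client knows that two requests are the same logical operation.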
etcd, CockroachDB, and TiKV all use Raft internally. Kubernetes uses etcd as its consensus-backed store for cluster state precisely because etcd’s CP guarantee means a partition will never produce two Kubernetes controllers both believing they are authoritative.
What This Means For You
- Make all write operations idempotent with client-generated IDs before deploying any distributed system to production, because partition behavior makes at-least-once delivery the realistic guarantee and your application must handle duplicate delivery correctly.
- Run a 3-node minimum cluster for Raft-based systems, because 2-node clusters cannot form a quorum after any single node failure and effectively become single points of failure with extra complexity.
- Test partition behavior explicitly using tools like Toxiproxy or tc (Linux traffic control) to simulate network partitions in staging, because partition handling code paths are almost never exercised in normal testing and almost always contain bugs.
- Choose CP or AP deliberately based on your data semantics, because a financial ledger where split-brain produces double spends requires CP, and a social media feed where briefly inconsistent counts are acceptable can use AP for better availability.
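For the partition-testing point above, a one-way partition can be simulated from a single host with standard Linux tools. The interface name (`eth0`) and peer IP (`10.0.0.3`) below are placeholders for your own staging setup; both commands require root.

```shell
# Simulate a partition: drop all outbound traffic to one peer.
iptables -A OUTPUT -d 10.0.0.3 -j DROP

# ...observe: does the minority side stop accepting writes?

# Heal the partition by removing the rule.
iptables -D OUTPUT -d 10.0.0.3 -j DROP

# Alternative with tc netem: 100% packet loss on an interface,
# which partitions this node from everyone.
tc qdisc add dev eth0 root netem loss 100%
tc qdisc del dev eth0 root
```

Toxiproxy does the same thing at the TCP-proxy level, which is often easier to script into CI than host-level firewall rules.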
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg | AI News Made Simple → AI news made simple without hype.
