The ABA Problem in Concurrent Programming Explained

The ABA problem occurs when a thread reads a value A from shared memory, gets preempted, another thread changes the value to B and then back to A, and the first thread resumes and incorrectly concludes nothing changed. Compare-and-swap operations see A and succeed, even though the underlying state has changed in ways that invalidate the first thread’s assumptions. It produces no crash, no error, and no detectable signal. Just silently wrong behavior.

Pithy Cyborg | AI FAQs – The Details

Question: What is the ABA problem in concurrent programming, and why is it so hard to detect?

Asked by: Claude Sonnet 4.6

Answered by: Mike D (MrComputerScience)

From Pithy Cyborg | AI News Made Simple

And Pithy Security | Cybersecurity News

How Compare-and-Swap Creates the ABA Vulnerability

Lock-free data structures use compare-and-swap (CAS) as their fundamental primitive. CAS takes three arguments: a memory address, an expected value, and a new value. It atomically checks whether the address holds the expected value and, if so, writes the new value. If the check fails, CAS returns false and the caller retries. This is the basis of lock-free stacks, queues, and linked lists used throughout high-performance concurrent software.

The ABA problem arises because CAS checks value equality, not identity. It cannot tell whether the value at an address has been changed and changed back since the last read.

The canonical example involves a lock-free stack. Thread 1 reads the stack head, which points to node A. Thread 1 prepares a CAS to pop A, replacing the head pointer with A->next, which points to B. Before Thread 1 executes the CAS, it is preempted.

Thread 2 runs, pops A, pops B, frees both, then pushes a new node that the allocator happens to place at the same memory address as A. The stack head now points to what looks like A but is a completely different node with a different next pointer. Thread 1 resumes, executes CAS(head, A, B), and the CAS succeeds because the head pointer still reads as A. Thread 1 has just set the stack head to B, which was freed by Thread 2. The stack is now corrupt. No assertion fired. No sanitizer flagged it. The memory addresses matched perfectly.

Why the ABA Problem Is Nearly Invisible in Testing

The ABA problem requires a specific interleaving of three thread actions: read, change-and-change-back by another thread, and CAS. In practice this interleaving is rare, which is exactly what makes it dangerous.

Under normal testing conditions, the bug may never manifest. Unit tests run in isolation. Integration tests rarely stress the precise timing window. Even load tests may run for hours without triggering the interleaving. When the bug does appear in production, it manifests as memory corruption or use-after-free many operations later, far from the original ABA event. The symptom has no obvious causal connection to the root cause.

ThreadSanitizer detects data races but the ABA problem is not a data race in the technical sense. All memory accesses are atomic. There is no unsynchronized concurrent access to the same variable. TSan will not flag it. AddressSanitizer detects use-after-free, but only if the freed memory is actually accessed after the ABA corruption, and only if the allocator has not recycled the address in a way that looks valid. The combination of atomic operations and address reuse makes the bug invisible to the standard tool chain that catches most concurrency errors.

Detecting DCOM lateral movement faces a structurally similar detection challenge: the malicious activity looks like legitimate behavior at each individual observation point, and only the sequence of events reveals the problem. The ABA problem is the concurrency equivalent. Each individual CAS operation is correct, only the sequence across threads reveals the corruption.

The Standard Solutions and Their Tradeoffs

Several techniques address the ABA problem, each with different costs and applicability.

Tagged pointers are the most widely used solution. Instead of storing just a pointer, you store a pointer plus a monotonically increasing version counter in the same atomic word. On 64-bit systems, the upper bits of a pointer are often unused, allowing a tag to be packed into the same 64-bit value. The CAS now checks both the pointer and the tag. Even if a node is allocated at the same address as a previously freed node, the tag will have incremented, causing the CAS to fail correctly. Java’s AtomicStampedReference and AtomicMarkableReference implement this pattern. The tradeoff is that the tag will eventually overflow for extremely long-running systems, though with a 16-bit tag and one increment per operation this takes billions of operations.

Hazard pointers prevent the ABA problem by deferring memory reclamation. Before a thread accesses a node, it publishes the node’s address in a per-thread hazard pointer record visible to all threads. Before any thread frees a node, it checks all hazard pointer records. If the node’s address appears in any record, it is deferred rather than freed. A node that cannot be freed cannot be reallocated at the same address, which eliminates the address-reuse precondition for ABA. Hazard pointers are used in production in Meta’s Folly library and in several high-performance queue implementations. The cost is scan overhead on every reclamation and memory usage proportional to the number of threads times the number of hazard pointer slots per thread.

Epoch-based reclamation, used in Rust’s crossbeam library, divides time into epochs and defers reclamation until all threads have passed through a grace period since the free. This achieves lower overhead than hazard pointers on typical workloads and is the preferred approach in the Rust lock-free ecosystem. The tradeoff is that a stalled thread can delay memory reclamation indefinitely, causing memory growth under certain failure modes.

What This Means For You

Never implement lock-free data structures from scratch without understanding ABA. Use battle-tested implementations from crossbeam (Rust), java.util.concurrent (Java), or Folly (C++) that already handle ABA correctly.
Use tagged pointers or AtomicStampedReference when implementing custom CAS-based algorithms that involve pointer manipulation. The overhead is negligible and it eliminates the most common ABA scenario.
Do not rely on ThreadSanitizer alone to validate lock-free code. TSan will not detect ABA and you need either formal reasoning, model checking with tools like TLA+, or extensive stress testing with address randomization to gain confidence.
Prefer epoch-based reclamation over manual memory management in lock-free Rust code. Crossbeam’s epoch implementation is correct, well-tested, and handles the reclamation problem without the overhead of per-access hazard pointer registration.
Test lock-free code under artificial preemption injection. Tools that randomly delay threads at specific points can increase the probability of triggering rare interleavings, making ABA bugs surface in testing rather than production.

Pithy Cyborg | AI News Made Simple

Subscribe (Free): https://pithycyborg.substack.com/subscribe

Read archives (Free): https://pithycyborg.substack.com/archive

Pithy Security | Cybersecurity News

Subscribe (Free): https://pithysecurity.substack.com/subscribe

Read archives (Free): https://pithysecurity.substack.com/archive