
C++ Memory Model and Atomics

Understand the C++ memory model, memory orderings, and how std::atomic enables correct lock-free programming across threads.

By Codeloom · Advanced

What you'll learn

  • Why a memory model exists
  • std::atomic basics
  • memory_order semantics
  • Release/acquire pattern
  • Common bugs

Prerequisites

  • Basic familiarity with C++ threads

What and Why

Before C++11, the language pretended threads didn’t exist. Compilers and CPUs were free to reorder memory operations as long as a single-threaded program behaved correctly. With multithreading, that liberty becomes hostile. C++11 introduced a formal memory model that defines what one thread can observe of another thread’s writes.

std::atomic<T> is the primary tool for cross-thread communication without locks. It guarantees that reads and writes happen as indivisible operations and lets you constrain reordering through memory orderings.
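To make "indivisible" concrete, here is a minimal sketch in which several threads increment one shared counter. The helper name count_with_atomic and its parameters are illustrative assumptions, not part of any library.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that each increment a shared
// atomic counter `per_thread` times, then returns the final value.
int count_with_atomic(int threads, int per_thread) {
  std::atomic<int> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&counter, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        ++counter;  // atomic read-modify-write; uses seq_cst by default
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter.load();
}
```

Because each increment is a single atomic read-modify-write, no update is lost: count_with_atomic(4, 10000) returns 40000. With a plain int the same loop would be a data race.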

Mental Model

Memory operations can be reordered at several layers: the compiler, the CPU's out-of-order execution, and the store buffers and caches that sit between a core and memory. A std::atomic operation limits which reorderings are legal around it, and its memory_order argument controls how strict that limit is:

  • relaxed: only atomicity, no ordering guarantees
  • acquire/release: pairwise synchronization
  • acq_rel: both acquire and release, for read-modify-write operations
  • seq_cst: acquire/release plus a single total order over all seq_cst operations (default and safest)

Thread A:                        Thread B:
data = 42;                       while (!ready.load(acquire)) {}
ready.store(true, release);      use(data);  // sees data == 42

Release-acquire synchronization

The release store on A “publishes” all of A’s prior writes to any thread whose acquire load on the same atomic reads the stored value.
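The two-thread exchange can be written out as a small program. This is a sketch under stated assumptions: the function name run_release_acquire and the local variable names are illustrative, and the consumer yields while spinning purely to be polite to the scheduler.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: a producer writes a plain int, then sets an atomic flag
// with release; the consumer spins on the flag with acquire and reads the int.
int run_release_acquire() {
  int data = 0;                    // plain, non-atomic payload
  std::atomic<bool> ready{false};  // flag that publishes the payload
  int seen = -1;

  std::thread consumer([&] {
    while (!ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();   // spin until the flag is set
    }
    seen = data;                   // ordered after the producer's write to data
  });
  std::thread producer([&] {
    data = 42;                                     // write the payload first
    ready.store(true, std::memory_order_release);  // then publish it
  });

  producer.join();
  consumer.join();
  return seen;  // join() makes the consumer's write to `seen` visible here
}
```

The consumer never touches data until its acquire load has observed true, so the non-atomic accesses to data do not race and the function returns 42.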

Hands-on Example

A simple spinlock built on std::atomic_flag:

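#include <atomic>
#include <thread>

class SpinLock {
  // Clear means unlocked, set means locked.
  std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
  void lock() {
    // test_and_set returns the previous value: spin while it was already set.
    while (flag.test_and_set(std::memory_order_acquire)) {
      // busy wait
    }
  }
  void unlock() {
    // Release makes writes from the critical section visible to the next locker.
    flag.clear(std::memory_order_release);
  }
};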
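Since the class exposes lock() and unlock(), it should work with std::lock_guard. The sketch below repeats the spinlock so that it compiles on its own and uses it to protect a plain counter; count_with_spinlock is a hypothetical helper name.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

class SpinLock {
  std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
  void lock() {
    while (flag.test_and_set(std::memory_order_acquire)) {
      // busy wait
    }
  }
  void unlock() { flag.clear(std::memory_order_release); }
};

// Hypothetical helper: increments a plain (non-atomic) counter from several
// threads, taking the spinlock around every increment.
long count_with_spinlock(int threads, int per_thread) {
  SpinLock lock;
  long total = 0;  // plain variable, only touched while `lock` is held
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&lock, &total, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        std::lock_guard<SpinLock> guard(lock);  // unlocks at end of scope
        ++total;
      }
    });
  }
  for (auto& w : workers) w.join();
  return total;
}
```

The acquire in lock() and the release in unlock() are what make the plain increment safe: each thread that takes the lock sees the writes made by the previous holder.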

Another common pattern: publish a pointer only after the object it points to is fully initialized.

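#include <atomic>

struct Config;  // application-defined type, assumed to be defined elsewhere

// Null until a fully built Config has been published.
std::atomic<Config*> g_config{nullptr};

void publish(Config* c) {
  // c must be fully built before this call
  g_config.store(c, std::memory_order_release);
}

const Config* current() {
  // may return nullptr if nothing has been published yet
  return g_config.load(std::memory_order_acquire);
}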

If the reader’s acquire load returns the pointer stored by the writer, the reader also sees every write the writer made before its release store.
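A complete version of the publication pattern might look like the following sketch. The Config fields (host, port), their values, and the read_published_port driver are assumptions made up for illustration.

```cpp
#include <atomic>
#include <string>
#include <thread>

struct Config {  // hypothetical fields, for illustration only
  std::string host;
  int port = 0;
};

std::atomic<Config*> g_config{nullptr};

void publish(Config* c) { g_config.store(c, std::memory_order_release); }

const Config* current() { return g_config.load(std::memory_order_acquire); }

// Hypothetical driver: one thread builds and publishes a Config while the
// calling thread waits for the pointer and then reads a field through it.
int read_published_port() {
  Config cfg;
  std::thread writer([&cfg] {
    cfg.host = "localhost";  // build the object fully...
    cfg.port = 8080;
    publish(&cfg);           // ...and only then publish the pointer
  });

  const Config* seen = nullptr;
  while ((seen = current()) == nullptr) {
    std::this_thread::yield();  // nothing published yet
  }
  int port = seen->port;  // safe: ordered after the writer's field writes

  writer.join();
  g_config.store(nullptr);  // cfg is about to go out of scope
  return port;
}
```

The reader dereferences the pointer only after a non-null acquire load, so it cannot observe a half-built Config.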

Common Pitfalls

Defaulting to relaxed for performance. relaxed only guarantees indivisibility. Without acquire/release, the reader may see the new pointer but stale fields it points to. Use it only for counters and statistics.

Assuming volatile is enough. volatile was designed for memory-mapped I/O. It prevents some compiler optimizations but guarantees neither atomicity nor cross-thread ordering. Never use volatile for thread sync in C++.

Double-checked locking without atomics. The classic broken pattern. If the singleton pointer isn’t atomic, another thread can observe a non-null pointer to a partially constructed object.
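One way to repair double-checked locking is to make the pointer atomic and pair a release store with an acquire load. The sketch below assumes a hypothetical Service type and global names, and it intentionally never frees the instance.

```cpp
#include <atomic>
#include <mutex>

struct Service {  // hypothetical singleton type
  int value = 7;
};

std::atomic<Service*> g_instance{nullptr};
std::mutex g_init_mutex;

Service* get_service() {
  // Fast path: acquire pairs with the release store below, so a non-null
  // pointer always refers to a fully constructed Service.
  Service* p = g_instance.load(std::memory_order_acquire);
  if (p == nullptr) {
    std::lock_guard<std::mutex> guard(g_init_mutex);
    // Second check under the lock; the mutex already orders this load.
    p = g_instance.load(std::memory_order_relaxed);
    if (p == nullptr) {
      p = new Service();  // construct first (intentionally never freed)
      g_instance.store(p, std::memory_order_release);  // then publish
    }
  }
  return p;
}
```

In many cases a function-local static is simpler still, because its initialization has been thread-safe since C++11.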

Mixing atomic and non-atomic access to the same variable without other synchronization produces a data race, which is undefined behavior, full stop.

Practical Tips

  • Default to memory_order_seq_cst. It’s the easiest to reason about. Only relax orderings after profiling shows real cost.
  • Pair release stores with acquire loads on the same atomic; that’s the dominant correctness pattern.
  • For shared counters with no dependent data, fetch_add(1, relaxed) is correct and fast.
  • Prefer std::mutex, or std::shared_ptr (its reference count is atomic, although the object it points to is not protected), for most use cases. Hand-rolled lock-free code is hard to get right.
  • Use tools: ThreadSanitizer (-fsanitize=thread) catches many data races you can’t find by inspection.
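The relaxed-counter tip can be sketched as a pair of independent statistics counters. The Stats struct and record_events helper are hypothetical names; the point is that nothing else depends on the counter values, so no ordering is needed.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

struct Stats {  // hypothetical statistics block
  std::atomic<std::uint64_t> hits{0};
  std::atomic<std::uint64_t> misses{0};
};

// Hypothetical helper: every worker records `per_thread` events, alternating
// between hits and misses, and the total number of events is returned.
std::uint64_t record_events(int threads, int per_thread) {
  Stats stats;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&stats, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        // Relaxed is enough: the increment is still atomic, and no other
        // data is published through these counters.
        if (i % 2 == 0) {
          stats.hits.fetch_add(1, std::memory_order_relaxed);
        } else {
          stats.misses.fetch_add(1, std::memory_order_relaxed);
        }
      }
    });
  }
  for (auto& w : workers) w.join();  // join() orders the final reads
  return stats.hits.load(std::memory_order_relaxed) +
         stats.misses.load(std::memory_order_relaxed);
}
```

The final totals are exact because the threads are joined before the counters are read; a reader that samples the counters while workers are still running would get a value that may already be out of date.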

Wrap-up

The C++ memory model is dense but learnable. Internalize three patterns: release/acquire for one-way publication, sequential consistency when you want total order, and relaxed only for independent counters. Pair every cross-thread variable with a synchronization mechanism, atomic or otherwise. Lock-free code is rewarding when you need it, but reach for mutexes first; correctness is more valuable than nanoseconds.