Flaky Tests and How to Fix Them

Intermediate 9 min read

What you'll learn

✓ The most common causes of flaky tests
✓ How to reproduce a flake reliably
✓ Quarantine vs fix decisions
✓ Patterns that prevent flakes
✓ How to track flakiness over time

Prerequisites

• Comfortable running and reading test suites

What and Why

A flaky test passes sometimes and fails sometimes for the same code. Flakes erode trust in your CI. People start ignoring red builds, then real regressions slip through. Killing flakes is one of the highest-leverage investments a team can make.

Mental Model

Most flakes share a root cause: hidden non-determinism. Time, ordering, parallelism, the network, and shared state are the usual suspects. The cure is to make the test deterministic or to remove the dependency that creates the variation.

See a red build
   |
   v
Reproduce? --no--> Add logs, run in loop, capture seed
   |yes
   v
Identify root cause
 /        |          \
Timing   Shared      Network/
race     state       external
 |        |           |
 v        v           v
Use poll  Isolate     Mock at
not sleep db/files    boundary

Flaky test triage
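
The "capture seed" step in the triage flow can be sketched in Python as follows. This is a minimal illustration, not a prescribed setup: the TEST_SEED environment variable and the make_rng helper are hypothetical names chosen for the example.

```python
import os
import random
import time


def make_rng(env_var="TEST_SEED"):
    """Return (seed, rng). Reuses the seed from the environment when set,
    so a failing run can be replayed with the same random sequence."""
    raw = os.environ.get(env_var)
    seed = int(raw) if raw is not None else time.time_ns() % 2**32
    print(f"{env_var}={seed}")  # logged so the CI output carries the seed
    return seed, random.Random(seed)


def test_shuffle_keeps_all_items():
    seed, rng = make_rng()
    items = list(range(10))
    shuffled = items[:]
    rng.shuffle(shuffled)
    # Include the seed in the failure message so the run can be reproduced.
    assert sorted(shuffled) == items, f"failed with seed {seed}"
```

To replay a failure, set the hypothetical TEST_SEED variable to the logged value before rerunning the test.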

Hands-on Example

A flaky timing test.

// flaky
test("debounce calls once", async () => {
  const fn = vi.fn();
  const d = debounce(fn, 50);
  d(); d(); d();
  await new Promise(r => setTimeout(r, 60));
  expect(fn).toHaveBeenCalledTimes(1);
});

The 60 ms wait might be too short on a slow CI runner. Fix it with fake timers.

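test("debounce calls once", () => {
  vi.useFakeTimers();
  try {
    const fn = vi.fn();
    const d = debounce(fn, 50);
    d(); d(); d();
    vi.advanceTimersByTime(50); // deterministic: no real waiting
    expect(fn).toHaveBeenCalledTimes(1);
  } finally {
    vi.useRealTimers(); // restore real timers even if the assertion fails
  }
});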
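
Where a test framework has no built-in fake timers, a similar effect can be had by injecting the clock. The sketch below is one possible shape in Python; Throttle and FakeClock are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Optional


class FakeClock:
    """A clock the test advances by hand instead of waiting."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class Throttle:
    """Allow a call at most once per `interval` seconds."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock  # production code uses the real monotonic clock
        self.last: Optional[float] = None

    def allow(self) -> bool:
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            self.last = now
            return True
        return False


def test_throttle_blocks_until_interval_passes():
    clock = FakeClock()
    throttle = Throttle(1.0, clock=clock)
    assert throttle.allow()
    clock.advance(0.5)
    assert not throttle.allow()  # only half the interval has elapsed
    clock.advance(0.5)
    assert throttle.allow()      # the full interval has elapsed
```

Because the test never sleeps, its result does not depend on how fast the runner is.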

A flake from shared state.

# flaky: writes to a fixed, shared path; parallel tests collide
def test_save():
    path = "/tmp/data.json"  # shared by every test and worker
    save(path, {"a": 1})
    assert load(path) == {"a": 1}

Fix with a per-test temp directory, such as pytest's built-in tmp_path fixture.

def test_save(tmp_path):
    path = tmp_path / "data.json"
    save(path, {"a": 1})
    assert load(path) == {"a": 1}

A flake from ordering. Two tests share a DB and the first leaves a row behind, so the second passes or fails depending on which runs first. Run each test inside a transaction that rolls back, or truncate tables in a fixture. Better still, give each worker its own ephemeral database.
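
The rollback approach can be sketched with Python's standard sqlite3 module. This is a minimal illustration under assumptions: the users table and the rolled_back helper are made up for the example, and a real suite would typically wrap the same idea in a fixture.

```python
import sqlite3
from contextlib import contextmanager


def make_db():
    """Create an in-memory database with a committed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.commit()
    return conn


@contextmanager
def rolled_back(conn):
    """Yield the connection, then discard whatever the test wrote."""
    try:
        yield conn
    finally:
        conn.rollback()


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_insert_leaves_nothing_behind():
    conn = make_db()
    with rolled_back(conn) as db:
        db.execute("INSERT INTO users VALUES ('ada')")
        assert count_users(db) == 1
    # The row is gone, so the next test starts from a clean table.
    assert count_users(conn) == 0
```

The helper only works if the code under test does not commit on its own; when it does, truncating tables or using a per-worker database is the safer choice.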

A flake from sleeping instead of polling.

// flaky: a fixed sleep guesses how long the job takes
await sleep(500);
expect(await client.get("/job/1")).toMatchObject({ status: "done" });

// stable: poll with timeout
await waitFor(async () => {
  const r = await client.get("/job/1");
  expect(r.status).toBe("done");
}, { timeout: 5000, interval: 100 });
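
For suites without a ready-made polling helper, the pattern is small enough to write by hand. The Python sketch below is one possible version; wait_for is a hypothetical helper, not part of any library.

```python
import time


def wait_for(check, timeout=5.0, interval=0.1):
    """Call `check` until it stops raising AssertionError or `timeout`
    seconds have passed. Returns the value of the first successful call;
    re-raises the last AssertionError on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return check()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


def test_job_eventually_finishes():
    polls = {"count": 0}

    def job_is_done():
        # Stand-in for a real status request: "done" on the third poll.
        polls["count"] += 1
        assert polls["count"] >= 3, "job not done yet"
        return "done"

    assert wait_for(job_is_done, timeout=1.0, interval=0.01) == "done"
```

The test returns as soon as the condition holds, so the timeout can be generous without slowing down passing runs.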

Common Pitfalls

Solving flakes by adding sleep. It hides the problem and slows down CI.
Retrying flaky tests automatically without measuring. The bug stays, masked by retries.
Mocking time only sometimes. Either control time everywhere a test uses it, or not at all.
Sharing fixtures that mutate state. Hidden coupling between tests is a top cause of order-dependent flakes.
Skipping the “reproduce locally” step. If you cannot reproduce it, you cannot prove you fixed it.

Practical Tips

Run the suspect test in a loop to confirm a fix, for example with Vitest's repeats test option or pytest --count=100 (provided by the pytest-repeat plugin).
Tag known flakes with a label like @flaky and track their resolution like bugs.
Snapshot the seed for randomized tests so failures can be replayed.
Mock at the network boundary so tests do not depend on external services.
Track flake rate per file in CI. The trend matters more than any single failure.

Wrap-up

Flaky tests are a signal that something in the system has hidden non-determinism. Resist the urge to retry or skip. Track flakes, reproduce them, and fix the root cause with deterministic time, isolated state, and boundary-level mocking. A trustworthy test suite pays back the effort every single day for the rest of the project.