Flaky Tests and How to Fix Them
Diagnose and eliminate flaky tests caused by timing, ordering, shared state, and the network. Build a culture that stops flakes at the source.
What you'll learn
- The most common causes of flaky tests
- How to reproduce a flake reliably
- Quarantine vs fix decisions
- Patterns that prevent flakes
- How to track flakiness over time
Prerequisites
- Comfortable running and reading test suites
What and Why
A flaky test passes sometimes and fails sometimes for the same code. Flakes erode trust in your CI. People start ignoring red builds, then real regressions slip through. Killing flakes is one of the highest-leverage investments a team can make.
Mental Model
Most flakes share a root cause: hidden non-determinism. Time, ordering, parallelism, the network, and shared state are the usual suspects. The cure is to make the test deterministic or to remove the dependency that creates the variation.
        See a red build
               |
               v
          Reproduce? --no--> Add logs, run in loop, capture seed
               |yes
               v
       Identify root cause
        /      |       \
   Timing    Shared   Network/
   race      state    external
     |         |         |
     v         v         v
  Use poll   Isolate   Mock at
  not sleep  db/files  boundary
Hands-on Example
A flaky timing test.
// flaky
test("debounce calls once", async () => {
  const fn = vi.fn();
  const d = debounce(fn, 50);
  d(); d(); d();
  await new Promise(r => setTimeout(r, 60));
  expect(fn).toHaveBeenCalledTimes(1);
});
The 60 ms wait might be too short on a slow CI runner. Fix it with fake timers.
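test("debounce calls once", () => {
  vi.useFakeTimers();
  try {
    const fn = vi.fn();
    const d = debounce(fn, 50);
    d(); d(); d();
    vi.advanceTimersByTime(50); // move the fake clock to the end of the debounce window
    expect(fn).toHaveBeenCalledTimes(1);
  } finally {
    vi.useRealTimers(); // restore real timers even if the assertion fails
  }
});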
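The same idea works where no fake-timer helper is available: pass the clock in as a dependency and let the test control it. The Python sketch below is illustrative only. FakeClock and Token are hypothetical names invented for this example, not part of any library.

```python
import time


class FakeClock:
    """A hand-rolled clock for tests; time moves only when advance() is called."""

    def __init__(self, start=0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class Token:
    """Hypothetical code under test: expires `ttl` seconds after creation."""

    def __init__(self, ttl, clock=time.monotonic):
        self._clock = clock  # production code uses the real monotonic clock
        self._expires_at = clock() + ttl

    def expired(self):
        return self._clock() >= self._expires_at


def test_token_expires_after_ttl():
    clock = FakeClock()
    token = Token(ttl=30, clock=clock)
    clock.advance(29)
    assert not token.expired()
    clock.advance(1)  # exactly at the deadline, no real waiting involved
    assert token.expired()
```

Because the test never sleeps, it takes the same path on a fast laptop and on a slow CI runner.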
A flake from shared state.
# flaky: writes to a fixed, shared path; parallel tests collide
def test_save():
    path = "/tmp/data.json"  # shared by every test and every worker
    save(path, {"a": 1})
    assert load(path) == {"a": 1}
Fix with a per-test temp directory.
def test_save(tmp_path):
    path = tmp_path / "data.json"  # tmp_path is unique to this test
    save(path, {"a": 1})
    assert load(path) == {"a": 1}
A flake from ordering. Two tests share a DB and the first leaves a row behind. Run tests inside a transaction that rolls back, or truncate tables in a fixture. Better, scope tests to ephemeral databases per worker.
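A minimal sketch of the rollback approach, assuming an in-memory SQLite database and code under test that never commits on its own. The helpers make_db, rolled_back, and count_users are hypothetical names for this illustration. In a pytest suite, the context manager would typically be wrapped in a fixture.

```python
import sqlite3
from contextlib import contextmanager


def make_db():
    """Create a throwaway in-memory database with a committed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.commit()
    return conn


@contextmanager
def rolled_back(conn):
    """Undo every uncommitted write made inside the block, pass or fail."""
    try:
        yield conn
    finally:
        conn.rollback()


def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_insert_user_leaves_no_rows_behind():
    conn = make_db()
    with rolled_back(conn) as db:
        db.execute("INSERT INTO users VALUES ('ada')")
        assert count_users(db) == 1  # visible inside the open transaction
    assert count_users(conn) == 0  # gone once the block exits
```

If the code under test commits its own transactions, a rollback cannot undo them. In that case truncating tables or using one database per worker is the safer choice.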
A flake from sleeping instead of polling.
// flaky
await sleep(500);
expect(await client.get("/job/1")).toMatchObject({ status: "done" });

// stable: poll with timeout
await waitFor(async () => {
  const r = await client.get("/job/1");
  expect(r.status).toBe("done");
}, { timeout: 5000, interval: 100 });
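For a Python suite, a polling helper of the same shape can be written in a few lines. This is a sketch, not a library API: wait_for is a hypothetical helper that retries a check until it stops raising AssertionError or the timeout runs out.

```python
import time


def wait_for(check, timeout=5.0, interval=0.1):
    """Call check() until it passes; re-raise its last AssertionError on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            check()
            return
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # surface the real assertion message, not a generic timeout
            time.sleep(interval)


def test_job_eventually_done():
    # Stand-in for a real job API: reports "done" on the third poll.
    polls = {"count": 0}

    def job_status():
        polls["count"] += 1
        return "done" if polls["count"] >= 3 else "running"

    def check():
        assert job_status() == "done"

    wait_for(check, timeout=1.0, interval=0.01)
    assert polls["count"] == 3
```

The test finishes as soon as the condition holds, so the generous timeout only costs time when something is actually wrong.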
Common Pitfalls
- Solving flakes by adding sleep. It hides the problem and slows down CI.
- Retrying flaky tests automatically without measuring. The bug stays, masked by retries.
- Mocking time only sometimes. Either control time everywhere a test uses it, or not at all.
- Sharing fixtures that mutate state. Hidden coupling between tests is a top cause of order-dependent flakes.
- Skipping the “reproduce locally” step. If you cannot reproduce it, you cannot prove you fixed it.
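To keep retries from hiding a bug, record every attempt and measure how often a test disagrees with itself. The sketch below assumes a hypothetical record format of (file, test, commit, passed) tuples exported from CI. It counts a test as flaky on a commit when it both passed and failed there.

```python
from collections import defaultdict


def flake_rate_by_file(runs):
    """Return {file: share of (test, commit) pairs with mixed pass/fail results}."""
    outcomes = defaultdict(set)
    for file, test, commit, passed in runs:
        outcomes[(file, test, commit)].add(passed)

    totals = defaultdict(int)
    flaky = defaultdict(int)
    for (file, _test, _commit), seen in outcomes.items():
        totals[file] += 1
        if seen == {True, False}:  # same code, different results
            flaky[file] += 1
    return {file: flaky[file] / totals[file] for file in totals}


# Made-up records for illustration only.
runs = [
    ("test_jobs.py", "test_done", "abc123", True),
    ("test_jobs.py", "test_done", "abc123", False),  # retry disagreed: flaky
    ("test_jobs.py", "test_create", "abc123", True),
    ("test_save.py", "test_save", "abc123", False),
    ("test_save.py", "test_save", "abc123", False),  # consistent failure: not flaky
]
rates = flake_rate_by_file(runs)
```

A consistently failing test is a plain failure, not a flake, which is why the sketch looks for mixed outcomes rather than failures alone.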
Practical Tips
- Run the suspect test in a loop to confirm a fix, for example with pytest --count=100 (provided by the pytest-repeat plugin) or the repeats test option in Vitest.
- Tag known flakes with a label like @flaky and track their resolution like bugs.
- Snapshot the seed for randomized tests so failures can be replayed.
- Mock at the network boundary so tests do not depend on external services.
- Track flake rate per file in CI. The trend matters more than any single failure.
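The seed tip can be sketched as follows. The TEST_SEED environment variable and the make_rng helper are assumptions made for this example, not an existing convention. The seed is printed on every run so that a failing CI log contains what is needed to replay it.

```python
import os
import random


def make_rng(env=None):
    """Build a seeded RNG; reuse TEST_SEED when set, otherwise pick a fresh seed."""
    env = os.environ if env is None else env
    raw = env.get("TEST_SEED")
    seed = int(raw) if raw is not None else random.SystemRandom().randrange(2**32)
    print(f"TEST_SEED={seed}")  # shows up in the CI log next to any failure
    return seed, random.Random(seed)


def sample_inputs(rng, n=5):
    """Generate the randomized inputs a test would use."""
    return [rng.randint(0, 1000) for _ in range(n)]
```

To replay a failure, set TEST_SEED to the value printed in the failing run. The test should draw all of its randomness from the returned generator rather than from the global random module.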
Wrap-up
Flaky tests are a signal that something in the system has hidden non-determinism. Resist the urge to retry or skip. Track flakes, reproduce them, and fix the root cause with deterministic time, isolated state, and boundary-level mocking. A trustworthy test suite pays back the effort every single day for the rest of the project.