Test Coverage Metrics and Their Pitfalls

Intermediate 7 min read

What you'll learn

✓ What line, branch, and mutation coverage measure
✓ Why high coverage can still hide bugs
✓ How mutation testing fills the gap
✓ Setting reasonable coverage targets
✓ Avoiding coverage-driven gaming

Prerequisites

• Familiarity with basic testing concepts

What and Why

Coverage measures how much of your code your tests execute. It is appealing because it produces a single number, and numbers are easy to put in dashboards. But coverage is a proxy for test quality, not the quality itself. A codebase with 95% coverage and useless assertions is worse than one with 60% coverage and sharp tests, because the first gives you false confidence about safety.

The point of this article is not to abandon coverage. It is to understand what each metric actually says, so you can use it as a signal rather than a target.

Mental Model

Three kinds of coverage matter:

Line coverage: did any test execute this line? Cheap, easy, misleading.
Branch coverage: did the tests take both outcomes of every conditional? Stronger.
Mutation coverage: if I change > to >=, does any test fail? The strongest, because it measures whether tests would catch a bug, not just whether they executed code.

A line can be covered by tests with zero assertions about its behavior. Branch coverage helps. Mutation coverage is the gold standard but is slow and noisy.
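
To make the "covered but unchecked" case concrete, here is a minimal sketch. The applyTax function, its deliberately broken variant, and both check functions are hypothetical names invented for this illustration; plain functions stand in for a real test runner.

```typescript
type TaxFn = (amount: number, ratePercent: number) => number;

// Hypothetical function under test.
const applyTax: TaxFn = (amount, ratePercent) =>
  amount + (amount * ratePercent) / 100;

// Deliberately broken variant: subtracts the tax instead of adding it.
const applyTaxBuggy: TaxFn = (amount, ratePercent) =>
  amount - (amount * ratePercent) / 100;

// Executes the code (so its lines count as covered) but checks nothing.
function smokeOnly(fn: TaxFn): boolean {
  fn(100, 20);
  return true;
}

// Executes the same call and actually checks the result.
function checksResult(fn: TaxFn): boolean {
  return fn(100, 20) === 120;
}

console.log(smokeOnly(applyTax), smokeOnly(applyTaxBuggy));       // true true
console.log(checksResult(applyTax), checksResult(applyTaxBuggy)); // true false
```

Both checks execute exactly the same lines, so a coverage report cannot tell them apart. Only the second one notices the bug.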

Hands-on Example

Consider this function:

function discount(price: number, isMember: boolean) {
  if (isMember && price > 100) return price * 0.9;
  return price;
}

A test like expect(discount(50, false)).toBe(50) executes every line of this function, because the if and its return share a line, so a line-based tool can report 100% line coverage even though the discount is never applied. Branch coverage would force you to test both outcomes of the if. Mutation testing goes further: change > to >= and see whether any test fails when price === 100.
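
As a rough sketch of what a fuller set of checks for discount might look like, the block below covers both outcomes of the if and pins the boundary at exactly 100. The expectEqual helper is a hypothetical stand-in for a test framework's assertion, so the example runs on its own.

```typescript
function discount(price: number, isMember: boolean): number {
  if (isMember && price > 100) return price * 0.9;
  return price;
}

// Hypothetical minimal assertion helper; a real suite would use its framework's expect.
function expectEqual(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// Condition false: the price is returned unchanged.
expectEqual(discount(50, false), 50, "non-member, low price");
expectEqual(discount(200, false), 200, "non-member, high price");

// Condition true: the discount is applied.
expectEqual(discount(200, true), 180, "member above threshold");

// Boundary: a member at exactly 100 gets no discount.
// This is the check that would fail if > were changed to >=.
expectEqual(discount(100, true), 100, "member exactly at threshold");
```

The first three checks are enough for full branch coverage. Only the last one defends the boundary.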

      Mutation coverage
            ^
            |  catches logic flaws
      Branch coverage
            ^
            |  forces both paths
      Line coverage
            ^
            |  just touched it
      No coverage

Coverage strength hierarchy

Tools like Stryker (JS), PIT (Java), and mutmut (Python) automate mutation testing. They take longer to run, so most teams use them on critical modules rather than the whole codebase.
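
The core idea behind these tools can be sketched by hand. The block below is only an illustration of the concept, not how Stryker, PIT, or mutmut work internally: the mutant is written manually, and the names original, mutant, weakSuite, strongSuite, and kills are all made up for this example.

```typescript
type DiscountFn = (price: number, isMember: boolean) => number;

// Same logic as discount above, written as an expression.
const original: DiscountFn = (price, isMember) =>
  isMember && price > 100 ? price * 0.9 : price;

// Hand-written mutant: > changed to >=.
const mutant: DiscountFn = (price, isMember) =>
  isMember && price >= 100 ? price * 0.9 : price;

// A suite returns true when all of its checks pass for the given function.
type Suite = (fn: DiscountFn) => boolean;

// Takes both outcomes of the conditional, but never tests the boundary.
const weakSuite: Suite = (fn) =>
  fn(50, false) === 50 && fn(200, true) === 180;

// Adds the boundary check at exactly 100.
const strongSuite: Suite = (fn) =>
  weakSuite(fn) && fn(100, true) === 100;

// A mutant is "killed" when a suite that passes on the original fails on the mutant.
function kills(suite: Suite): boolean {
  return suite(original) && !suite(mutant);
}

console.log(kills(weakSuite));   // false: the mutant survives
console.log(kills(strongSuite)); // true: the boundary check kills it
```

Note that weakSuite already achieves full branch coverage and still lets the mutant survive. Real tools generate many such mutants automatically, which is where the extra run time comes from.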

Common Pitfalls

Treating coverage as a quality bar: 100% line coverage with expect(true).toBe(true) is worthless and demoralizing.
Enforcing identical targets everywhere: a payment module deserves higher coverage than a logging helper. Use per-package thresholds.
Counting generated code: serializers, migrations, and config dumps inflate the denominator and skew the percentage. Exclude them via config.
Letting coverage drive what gets tested: tests should follow risk, not the diff in the coverage report.
Ignoring assertion quality: a test that runs the code but never expects anything passes silently and counts as coverage.
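
Per-package thresholds and exclusions usually live in the test runner's configuration. The sketch below assumes Jest as the runner (the article does not name one) and uses its coverageThreshold and coveragePathIgnorePatterns options. Every path and number is hypothetical.

```typescript
// Sketch of the body of a jest.config.ts; assumes Jest is the test runner.
// All directory names and percentages are hypothetical.
const config = {
  collectCoverage: true,

  // Keep generated code out of the denominator.
  coveragePathIgnorePatterns: ["/node_modules/", "/src/generated/", "/migrations/"],

  coverageThreshold: {
    // Baseline floor for the project as a whole.
    global: { lines: 70, branches: 60 },
    // Stricter floor for a high-risk directory.
    "./src/payments/": { lines: 90, branches: 85 },
  },
};

// In a real jest.config.ts this object would be the default export:
// export default config;
```

Other runners and coverage tools offer similar settings under different names, so treat the option names here as specific to Jest.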

Practical Tips

Pick a realistic threshold (70 to 80% line coverage is plenty for most apps) and enforce it as a CI floor that cannot drop, rather than a goal to climb. Track coverage of new code on PRs; that is where bugs enter. Run mutation testing on the riskiest modules (auth, billing, data integrity) on a nightly schedule, not on every commit. Review tests as carefully as production code; a covered line with a weak assertion is a future bug. And periodically break a line of production code on purpose and run the suite; if everything still passes, you have a problem.
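
One way to read "a floor that cannot drop" is as a ratchet: CI compares the current percentage against a stored baseline and fails on any meaningful decrease. The function below is a minimal sketch of that comparison. The name checkCoverageFloor, the result shape, and the default tolerance are assumptions for illustration; how the two percentages are obtained depends on your coverage tool.

```typescript
interface CoverageCheck {
  ok: boolean;
  message: string;
}

// Compares the current coverage percentage against a stored baseline.
// tolerancePct absorbs tiny fluctuations so trivial changes do not fail the build.
function checkCoverageFloor(
  baselinePct: number,
  currentPct: number,
  tolerancePct: number = 0.1
): CoverageCheck {
  const drop = baselinePct - currentPct;
  if (drop > tolerancePct) {
    return {
      ok: false,
      message: `Coverage fell from ${baselinePct}% to ${currentPct}%`,
    };
  }
  return {
    ok: true,
    message: `Coverage OK at ${currentPct}% (baseline ${baselinePct}%)`,
  };
}

console.log(checkCoverageFloor(78, 78.5).message);
console.log(checkCoverageFloor(78, 75).message);
```

A CI step would exit with a non-zero status when ok is false, and could raise the stored baseline whenever coverage improves.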

Wrap-up

Coverage is a flashlight, not a verdict. It shows you where tests do not look, which is useful. It cannot tell you whether the tests that exist are any good. Combine line and branch coverage as a baseline, layer mutation testing on critical paths, and review tests for assertion quality. Treat the number as a conversation starter — not the conversation itself.