CI/CD Pipeline Caching Techniques

Intermediate 9 min read

What you'll learn

✓Which CI steps benefit from caching
✓How cache keys and restore keys work
✓How to cache Docker layers in CI
✓Why remote build caches beat local ones
✓How to avoid cache poisoning

Prerequisites

•Familiar with the shell
•Used a CI provider

What and Why

A CI pipeline does the same things over and over: install dependencies, compile code, build container images, run tests. Most of those inputs do not change between runs. Caching is how you teach the pipeline to recognize that and reuse previous outputs.

Done right, caching can turn a 10-minute pipeline into a 90-second one. Done wrong, it ships stale code to production. The difference is in how you compute cache keys.

Mental Model

A cache is a key-value store. The key is a hash of the inputs that, if unchanged, guarantee the output is unchanged. The value is whatever artifact you do not want to recompute: npm's ~/.npm download cache, Maven's ~/.m2 repository, a Bazel action cache, a Docker layer.

If the key is too strict (e.g. the commit SHA), you almost never get a hit. If it is too loose (e.g. just the branch name), you get hits that restore stale results.
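To make the trade-off concrete, here is a minimal Python sketch of key computation. It is similar in spirit to GitHub Actions' hashFiles() but does not reproduce its exact algorithm; `digest`, `hash_files`, and `cache_key` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def digest(blobs: list[bytes]) -> str:
    """SHA-256 over a sequence of input blobs, in the order given.

    Each blob is hashed on its own first, so b"ab" + b"c" and
    b"a" + b"bc" produce different digests.
    """
    outer = hashlib.sha256()
    for blob in blobs:
        outer.update(hashlib.sha256(blob).digest())
    return outer.hexdigest()


def hash_files(paths: list[str]) -> str:
    """Digest the contents of the files that exist, in sorted path order."""
    existing = sorted(p for p in map(Path, paths) if p.is_file())
    return digest([p.read_bytes() for p in existing])


def cache_key(prefix: str, os_name: str, blobs: list[bytes]) -> str:
    """Build a key like 'npm-Linux-<64 hex chars>' from the key's inputs."""
    return f"{prefix}-{os_name}-{digest(blobs)}"
```

Called as `cache_key("npm", "Linux", [Path("package-lock.json").read_bytes()])`, the key stays the same across commits that only touch source files. A key built from the commit SHA would change on every push, and one built from the branch name alone would never notice a dependency change.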


 [step starts]
      |
      v
 compute key = hash(inputs)
      |
      v
 +-----------+
 | exact hit |---> use artifact
 +-----------+
      | miss
      v
 +--------------+
 | restore-keys |---> partial hit, warm start
 +--------------+
      | miss
      v
 run from scratch, save under key

Cache lookup with primary and restore keys
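The lookup flow above can be sketched in Python. The prefix matching and the "newest match wins" rule follow what the actions/cache documentation describes for restore-keys; the `Entry` type and the in-memory list standing in for the cache service are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    key: str
    created: int  # larger means newer; a timestamp in a real store
    artifact: str


def lookup(
    entries: list[Entry], primary: str, restore_keys: list[str]
) -> tuple[Optional[Entry], str]:
    """Return (entry, outcome), where outcome is 'exact', 'partial', or 'miss'."""
    # 1. Exact match on the primary key.
    for entry in entries:
        if entry.key == primary:
            return entry, "exact"
    # 2. Each restore key is a prefix, tried in order; the newest match wins.
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created), "partial"
    # 3. Nothing usable: the step runs from scratch.
    return None, "miss"


def should_save(outcome: str) -> bool:
    """Save under the primary key unless the primary key already hit."""
    return outcome != "exact"
```

Skipping the save on an exact hit keeps the entry immutable: a given key always maps to the artifact first stored under it.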

Hands-on Example

In GitHub Actions, caching npm's download cache (~/.npm) looks like this:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

The primary key is bound to the lockfile hash. If the lockfile changes, the primary key misses, but the restore-keys prefix matches the most recent npm cache for the same OS, so npm starts warm and downloads only the packages it has not cached yet.
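Why a partial hit helps: npm's download cache is content-addressed, so tarballs already present can be reused. The Python sketch below models that by comparing the lockfile's `integrity` values against a set of cached ones. npm's real cache (cacache) is more involved, and `packages_to_fetch` is a hypothetical helper.

```python
import json


def packages_to_fetch(lockfile_text: str, cached_integrity: set[str]) -> list[str]:
    """List lockfile entries whose tarballs are not in the restored cache.

    Simplified model: the cache is treated as a set of integrity strings
    (the `integrity` field in package-lock.json). Entries without an
    `integrity` field, such as the root project or linked workspaces,
    are skipped.
    """
    lock = json.loads(lockfile_text)
    missing = []
    for path, meta in lock.get("packages", {}).items():
        integrity = meta.get("integrity")
        if integrity and integrity not in cached_integrity:
            missing.append(path)
    return sorted(missing)
```

After a lockfile bump that adds one package, a restored cache holding everything else leaves a single entry to download.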

For Docker, use BuildKit's registry cache backend, which stores the build cache under its own image reference:

docker buildx build \
  --cache-to type=registry,ref=ghcr.io/me/app:cache,mode=max \
  --cache-from type=registry,ref=ghcr.io/me/app:cache \
  -t ghcr.io/me/app:$(git rev-parse --short HEAD) \
  --push .

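mode=max exports the layers of every build stage, including the intermediate stages of a multi-stage build; the default, mode=min, exports only the layers of the final image. On the next build, BuildKit pulls the cache manifest and skips steps whose inputs match.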
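The difference matters most for multi-stage builds. This Python sketch is a deliberately simplified model, not BuildKit's implementation: it treats each stage as a flat list of named layers, and `Stage` and `exported_layers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    layers: list[str]


def exported_layers(stages: list[Stage], final: str, mode: str) -> list[str]:
    """Simplified model of which layers a registry cache export contains.

    mode="min": only layers of the final (exported) stage.
    mode="max": layers of every stage, including intermediate builders.
    """
    if mode == "min":
        return [layer for s in stages if s.name == final for layer in s.layers]
    if mode == "max":
        return [layer for s in stages for layer in s.layers]
    raise ValueError(f"unknown cache mode: {mode}")
```

Under this model, a fresh runner importing a mode=min cache would still rerun the builder stage's dependency and compile steps, while a mode=max cache lets it skip them.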

For language toolchains, cache the package downloads and the build output directory. With Rust and Cargo:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    key: cargo-${{ runner.os }}-${{ hashFiles('Cargo.lock', 'rust-toolchain.toml') }}
    restore-keys: |
      cargo-${{ runner.os }}-


 +--------------------+
 | dep manager cache  |  ~/.npm, ~/.m2, ~/.cargo
 +--------------------+
 | compile cache      |  target/, dist/, .turbo
 +--------------------+
 | docker layer cache |  registry / GHA cache
 +--------------------+
 | test result cache  |  bazel test, jest --changedSince
 +--------------------+

Cache layers in a typical CI pipeline

Common Pitfalls

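Caching node_modules directly rather than ~/.npm. The install tree can contain native addons compiled for one OS, CPU architecture, and Node.js version, so restoring it on a different runner can fail at load time. The download cache is portable; the install tree is not. (npm ci also deletes any existing node_modules before installing, so a cached tree would be thrown away anyway.)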
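If you cache an installed tree anyway, its key has to capture what native builds depend on. A minimal Python sketch; the key layout, the helper name `node_modules_key`, and using the Node.js version as a stand-in for its native ABI are assumptions.

```python
import hashlib
import platform


def node_modules_key(lockfile: bytes, node_version: str) -> str:
    """Key for caching an installed node_modules tree (hypothetical helper).

    Native addons are built for a specific OS, CPU architecture, and
    Node.js version, so all three go into the key with the lockfile hash.
    """
    lock_hash = hashlib.sha256(lockfile).hexdigest()
    return (
        f"node-modules-{platform.system()}-{platform.machine()}"
        f"-node{node_version}-{lock_hash}"
    )
```

An x86_64 runner and an arm64 runner then produce different keys for the same lockfile, so neither restores the other's binaries.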

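Letting branch-scoped caches isolate every feature branch from main. A new branch then starts cold. If your caches are scoped by branch (by the provider, or because the branch name is part of the key), configure an explicit fallback, such as a restore key that resolves to main's cache, so feature branches can warm-start from it.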
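Where branch isolation comes from putting the branch name in the key, the fallback order can be sketched in Python as below. The key layout and `restore_candidates` are hypothetical; the list would feed a prefix lookup like restore-keys.

```python
def restore_candidates(
    prefix: str, branch: str, lock_hash: str, default_branch: str = "main"
) -> list[str]:
    """Ordered keys to try: the exact key, this branch's prefix, then main's."""
    keys = [f"{prefix}-{branch}-{lock_hash}", f"{prefix}-{branch}-"]
    if branch != default_branch:
        # Fall back to the default branch so new branches start warm.
        keys.append(f"{prefix}-{default_branch}-")
    return keys
```

A fresh branch misses on its own keys and restores main's newest entry; once it has saved its own cache, later runs stop falling back.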

Time-bombing yourself with unbounded caches. Self-managed caches (a registry cache reference, an S3 bucket) grow forever unless you set a retention period or evict by size. Docker layer caches in particular can balloon past 50 GB.
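A retention-plus-size policy fits in a few lines. This Python sketch is illustrative: the thresholds are up to you, and `CacheEntry` and `evict` are hypothetical. Hosted caches such as GitHub Actions' apply a broadly similar policy for you; self-managed stores need one of their own.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    size_bytes: int
    last_access: float  # seconds since the epoch


def evict(
    entries: list[CacheEntry], now: float, max_age_s: float, max_total_bytes: int
) -> list[CacheEntry]:
    """Return the entries to keep, newest first.

    1. Retention: drop anything not accessed within max_age_s.
    2. Size: while over budget, drop the least recently accessed entry.
    """
    kept = [e for e in entries if now - e.last_access <= max_age_s]
    kept.sort(key=lambda e: e.last_access, reverse=True)  # newest first
    total = sum(e.size_bytes for e in kept)
    while kept and total > max_total_bytes:
        total -= kept.pop().size_bytes  # pop() removes the oldest remaining
    return kept
```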

Trusting caches across security-sensitive boundaries. Forks should never write to your trusted cache, and pull request runs may read shared keys but should never write to them, so an untrusted change cannot poison the cache that main builds restore.
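One way to encode that rule is a deny-by-default policy function. This Python sketch is hypothetical: the event names only loosely mirror GitHub's, and a real setup enforces the rule through the provider's permissions rather than application code.

```python
from enum import Enum


class Access(Enum):
    NONE = "none"
    READ = "read"
    READ_WRITE = "read-write"


def cache_access(event: str, from_fork: bool) -> Access:
    """Hypothetical policy for which runs may touch the shared cache.

    Pushes from your own repository may read and write; pull requests,
    including those from forks, may only read; every other case is
    denied by default.
    """
    if event == "push" and not from_fork:
        return Access.READ_WRITE
    if event == "pull_request":
        return Access.READ
    return Access.NONE
```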

Forgetting that source code is also input. hashFiles('package-lock.json') is fine for dependency installs but not for a compile cache. For that you need the source tree hash too.
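A compile-cache key therefore has to cover the source files as well as the lockfile. A minimal Python sketch; `digest_tree`, `read_tree`, and `compile_key` are hypothetical, and deciding which file suffixes count as inputs is left to the caller.

```python
import hashlib
from pathlib import Path


def digest_tree(files: dict[str, bytes]) -> str:
    """SHA-256 over (relative path, content) pairs in sorted path order.

    Sorting makes the digest independent of directory iteration order;
    hashing the path means a rename also changes the key.
    """
    outer = hashlib.sha256()
    for rel in sorted(files):
        outer.update(hashlib.sha256(rel.encode()).digest())
        outer.update(hashlib.sha256(files[rel]).digest())
    return outer.hexdigest()


def read_tree(root: str, suffixes: tuple[str, ...]) -> dict[str, bytes]:
    """Collect files under root whose suffix is listed, keyed by POSIX path."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): p.read_bytes()
        for p in base.rglob("*")
        if p.is_file() and p.suffix in suffixes
    }


def compile_key(lockfile: bytes, sources: dict[str, bytes]) -> str:
    """Key that changes when dependencies or any source file change."""
    lock_hash = hashlib.sha256(lockfile).hexdigest()[:16]
    return f"build-{lock_hash}-{digest_tree(sources)}"
```

For a Rust crate this might be called as `compile_key(Path("Cargo.lock").read_bytes(), read_tree(".", (".rs", ".toml")))`.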

Practical Tips

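Measure before optimizing. Add timing output to each step (time npm ci, time npm run build) and rank by wall time. As a rule of thumb, only cache steps that take more than about 30 seconds; for faster steps, restoring and saving the cache can cost nearly as much time as it saves.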
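If collecting shell `time` output across steps is awkward, a small Python wrapper can do the same measurement and ranking. A sketch; the 30-second default mirrors the rule of thumb above, and `time_step` and `cache_candidates` are hypothetical helpers.

```python
import subprocess
import time


def time_step(name: str, argv: list[str]) -> tuple[str, float]:
    """Run one command and return (name, wall-clock seconds); raise if it fails."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return name, time.perf_counter() - start


def cache_candidates(
    timings: list[tuple[str, float]], threshold_s: float = 30.0
) -> list[str]:
    """Names of steps slower than threshold_s, slowest first."""
    slow = [t for t in timings if t[1] > threshold_s]
    return [name for name, _ in sorted(slow, key=lambda t: t[1], reverse=True)]
```

Usage might look like `cache_candidates([time_step("install", ["npm", "ci"]), time_step("build", ["npm", "run", "build"])])`.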

Use content-addressed storage where you can. Tools like Bazel, Nx, and Turborepo identify outputs by the hash of all their inputs (sources, dependencies, relevant environment variables). A remote cache backed by S3 or GCS lets every developer and CI runner reuse outputs that any of them has already built.
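The core idea behind these tools can be sketched in a few lines of Python. This is not the Bazel, Nx, or Turborepo protocol: `action_key` and `ActionCache` are hypothetical, and a dict stands in for the remote bucket.

```python
import hashlib
import json
from collections.abc import Callable


def action_key(command: list[str], input_digests: dict[str, str], env: dict[str, str]) -> str:
    """Digest of everything that can affect an action's output.

    json.dumps with sort_keys=True gives a canonical encoding, so
    dict ordering does not change the key.
    """
    payload = json.dumps(
        {"cmd": command, "inputs": input_digests, "env": env}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class ActionCache:
    """In-memory stand-in for a remote cache such as an S3 or GCS bucket."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.hits = 0

    def run(self, key: str, compute: Callable[[], bytes]) -> bytes:
        """Return the stored output for key, computing and storing it on a miss."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        output = compute()
        self._store[key] = output
        return output
```

Because the key depends only on inputs, any machine that computes the same key can reuse the stored output, whichever machine produced it.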

For Docker, structure your Dockerfile so the things that change least are at the top:

FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

The npm ci layer is reused unless package.json or package-lock.json changes, so editing source files only invalidates the last two layers (COPY . . and RUN npm run build).
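Ordering matters because each layer's cache key depends on its parent's. The Python sketch below models that chaining for the Dockerfile above. It is a simplification (real builders compute keys differently and cover more inputs), and `layer_keys` is hypothetical.

```python
import hashlib


def layer_keys(steps: list[tuple[str, bytes]]) -> list[str]:
    """Give each step a cache key chained to the key of the step before it.

    steps: (instruction text, bytes of any files it copies, b"" if none).
    Changing one step changes its key and the key of every later step,
    which is why a miss invalidates everything below it.
    """
    keys: list[str] = []
    parent = b""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent)
        h.update(hashlib.sha256(instruction.encode()).digest())
        h.update(hashlib.sha256(copied).digest())
        parent = h.digest()
        keys.append(h.hexdigest())
    return keys
```

Feeding in the six instructions with two different versions of a source file gives identical keys for the first four layers and different keys for the last two.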

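Cache test results when feasible. bazel test caches results and reruns a test only when its inputs change. Jest's --changedSince=origin/main runs only tests related to files changed since that ref; --onlyChanged looks at uncommitted changes, so on a clean CI checkout it usually selects nothing. If you rely on pytest's --lf (rerun last failures), persist .pytest_cache between runs and avoid --cache-clear, which wipes it.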
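The selection these tools perform can be approximated by walking a dependency graph: a test reruns only if something it transitively depends on changed. A Python sketch; the graph is supplied by hand here, whereas Bazel derives it from BUILD files and Jest from the module imports it resolves, and both helper names are hypothetical.

```python
def transitive_inputs(module: str, imports: dict[str, set[str]]) -> set[str]:
    """All files reachable from module through imports, module included."""
    seen: set[str] = set()
    stack = [module]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(imports.get(current, ()))
    return seen


def select_tests(tests: list[str], imports: dict[str, set[str]], changed: set[str]) -> list[str]:
    """Tests with at least one changed transitive input, sorted by name."""
    return sorted(t for t in tests if transitive_inputs(t, imports) & changed)
```

Tests that are not selected can reuse their previous results.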

Wrap-up

Caching is often the single highest-leverage CI optimization. Hash inputs precisely, use restore keys for warm starts, and share a remote cache for work that developers and CI runners have in common. Keep untrusted runs from writing to shared caches, and prune by retention or size. With layered caches for dependencies, builds, container layers, and tests, even large pipelines can often stay under 5 minutes.