CI/CD Pipeline Caching Techniques
Speed up CI builds with dependency caches, layer caches, remote build caches, and content-addressed storage. Learn what to cache and what to skip.
What you'll learn
- ✓ Which CI steps benefit from caching
- ✓ How cache keys and restore keys work
- ✓ How to cache Docker layers in CI
- ✓ Why remote build caches beat local ones
- ✓ How to avoid cache poisoning
Prerequisites
- Familiar with the shell
- Have used a CI provider
What and Why
A CI pipeline does the same things over and over: install dependencies, compile code, build container images, run tests. Most of those inputs do not change between runs. Caching is how you teach the pipeline to recognize that and reuse previous outputs.
Done right, caching can turn a 10-minute pipeline into a 90-second one. Done wrong, it ships stale code to production. The difference is in how you compute cache keys.
Mental Model
A cache is a key-value store. The key is a hash of the inputs that, if unchanged, guarantee the output is unchanged. The value is whatever artifact you do not want to recompute: node_modules, a Maven ~/.m2, a Bazel action cache, a Docker layer.
If the key is too strict (e.g. the commit SHA), you almost never hit. If the key is too loose (e.g. just the branch name), you hit stale results.
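To make the key idea concrete, here is a minimal sketch in Python. The function name cache_key and the sample lockfile contents are made up for illustration; real CI providers compute keys with their own helpers (such as hashFiles in GitHub Actions).

```python
import hashlib


def cache_key(prefix: str, os_name: str, lockfile_bytes: bytes) -> str:
    """Build a key like 'npm-Linux-<digest>' from the inputs that decide the output."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{os_name}-{digest}"


# Hypothetical lockfile contents, standing in for package-lock.json.
lock_v1 = b'{"lodash": "4.17.20"}'
lock_v2 = b'{"lodash": "4.17.21"}'

# Same inputs always give the same key, so an unchanged lockfile hits.
same = cache_key("npm", "Linux", lock_v1) == cache_key("npm", "Linux", lock_v1)
# Any change to an input gives a new key, so a changed lockfile misses.
changed = cache_key("npm", "Linux", lock_v1) != cache_key("npm", "Linux", lock_v2)
print(same, changed)
```

The key changes only when the lockfile or the OS changes, which is the middle ground between a commit SHA (too strict) and a branch name (too loose).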
        [step starts]
              |
              v
  compute key = hash(inputs)
              |
              v
        +-----------+
        | exact hit |---> use artifact
        +-----------+
              | miss
              v
       +--------------+
       | restore-keys |---> partial hit, warm start
       +--------------+
              | miss
              v
  run from scratch, save under key
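The lookup flow above can be sketched in a few lines of Python. This is a simplified model, not any provider's actual implementation: the in-memory dict, the restore function, and the integer timestamps are all assumptions made for illustration. It assumes prefix matching picks the most recently saved entry.

```python
from typing import Optional


def restore(
    cache: dict[str, tuple[int, str]],
    key: str,
    restore_keys: list[str],
) -> tuple[str, Optional[str]]:
    """Return (outcome, artifact) for an exact hit, a partial hit, or a miss."""
    # 1. An exact key match means the artifact can be used as-is.
    if key in cache:
        return "exact", cache[key][1]
    # 2. Otherwise try each restore key in order, treating it as a prefix.
    for prefix in restore_keys:
        candidates = [entry for k, entry in cache.items() if k.startswith(prefix)]
        if candidates:
            # Prefer the most recently saved entry (highest timestamp).
            newest = max(candidates, key=lambda entry: entry[0])
            return "partial", newest[1]
    # 3. Nothing matched: the step runs from scratch and saves under `key`.
    return "miss", None


# key -> (saved_at, artifact); values are placeholders.
cache = {
    "npm-Linux-aaa": (1, "deps v1"),
    "npm-Linux-bbb": (2, "deps v2"),
}

print(restore(cache, "npm-Linux-bbb", ["npm-Linux-"]))  # ('exact', 'deps v2')
print(restore(cache, "npm-Linux-ccc", ["npm-Linux-"]))  # ('partial', 'deps v2')
print(restore(cache, "npm-macOS-ccc", ["npm-macOS-"]))  # ('miss', None)
```

A partial hit is only a warm start: the step still runs, and its result should be saved under the new exact key.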
Hands-on Example
In GitHub Actions, caching npm's download cache (~/.npm, not node_modules) looks like this:
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-
The primary key is bound to the lockfile hash. If the lockfile changes, you miss on the primary key, but the restore-keys prefix restores the most recent cache for the same OS, so the install starts from the previous download cache and fetches only the new packages.
For Docker, use BuildKit's registry cache backend:
docker buildx build \
  --cache-to type=registry,ref=ghcr.io/me/app:cache,mode=max \
  --cache-from type=registry,ref=ghcr.io/me/app:cache \
  -t ghcr.io/me/app:$(git rev-parse --short HEAD) \
  --push .
mode=max exports the layers of every intermediate build stage, not just the layers that end up in the final image (the default, mode=min). The next build pulls the cache manifest and skips layers whose inputs match.
For language toolchains:
- uses: actions/cache@v4
  with:
    path: |
      ~/.cargo/registry
      ~/.cargo/git
      target
    # target/ holds compiled, platform-specific artifacts, so the key includes the OS
    key: cargo-${{ runner.os }}-${{ hashFiles('Cargo.lock') }}-${{ hashFiles('rust-toolchain.toml') }}
These caches stack, from dependency downloads up to test results:
+--------------------+
| dep manager cache  |  ~/.npm, ~/.m2, ~/.cargo
+--------------------+
| compile cache      |  target/, dist/, .turbo
+--------------------+
| docker layer cache |  registry / GHA cache
+--------------------+
| test result cache  |  jest --onlyChanged
+--------------------+
Common Pitfalls
Caching node_modules directly rather than ~/.npm can mix architectures across runners, because the install tree may contain native binaries built for one OS and CPU. The download cache is portable, the install tree is not.
Branch-scoped caches isolate every feature branch from main. If your CI provider scopes caches by branch by default, set the restore keys explicitly so feature branches can fall back to main caches.
Time-bombing yourself with unbounded caches. Caches grow forever unless you set retention or evict by size. Docker layer caches in particular can balloon past 50 GB.
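Evicting by size can be as simple as a least-recently-used policy with a byte budget. The sketch below is an in-memory illustration only; the BoundedCache class and its 10-byte budget are invented for the example, and a real cache backend would apply the same idea to stored archives.

```python
from collections import OrderedDict


class BoundedCache:
    """Least-recently-used cache that evicts entries once a byte budget is exceeded."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._entries = OrderedDict()  # key -> blob, oldest use first

    def get(self, key: str):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, blob: bytes) -> None:
        if key in self._entries:
            self.total_bytes -= len(self._entries.pop(key))
        self._entries[key] = blob
        self.total_bytes += len(blob)
        # Evict least recently used entries until the cache fits the budget.
        # A blob larger than the whole budget is evicted immediately.
        while self.total_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.total_bytes -= len(evicted)

    def keys(self) -> list:
        return list(self._entries)


cache = BoundedCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"1234")
cache.get("a")            # "a" is now more recently used than "b"
cache.put("c", b"123")    # 12 bytes total, so "b" is evicted
print(cache.keys(), cache.total_bytes)  # ['a', 'c'] 8
```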
Trusting caches across security-sensitive boundaries. Forks should never write to your trusted cache, and pull request runs should read but not poison shared keys.
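One way to express that rule is a small policy function that decides which cache scopes a run may read and write. Everything here is hypothetical: the scope names ("trusted", "pr-<number>"), the event strings, and the cache_access function are illustrative assumptions, not any CI provider's API.

```python
from typing import NamedTuple, Optional


class CacheAccess(NamedTuple):
    read_scopes: list      # scopes the run may restore from, in priority order
    write_scope: Optional[str]  # scope the run may save to, or None for read-only


def cache_access(event: str, from_fork: bool, pr_number: Optional[int] = None) -> CacheAccess:
    """Decide cache permissions for a CI run (hypothetical policy)."""
    # Pushes to the repository itself may read and write the trusted scope.
    if event == "push" and not from_fork:
        return CacheAccess(["trusted"], "trusted")
    # Same-repo pull requests read the trusted scope but write only to their own.
    if event == "pull_request" and not from_fork and pr_number is not None:
        scope = f"pr-{pr_number}"
        return CacheAccess([scope, "trusted"], scope)
    # Forks and anything unrecognized are read-only.
    return CacheAccess(["trusted"], None)


print(cache_access("push", from_fork=False))
print(cache_access("pull_request", from_fork=False, pr_number=42))
print(cache_access("pull_request", from_fork=True, pr_number=42))
```

The design choice is that nothing outside a trusted push can change what later trusted runs restore.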
Forgetting that source code is also input. hashFiles('package-lock.json') is fine for dependency installs but not for a compile cache. For that you need the source tree hash too.
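A compile cache key that also covers the source tree might look like the following sketch. The helper names (tree_hash, compile_cache_key) and the temporary files are assumptions for illustration; build tools such as Bazel or Turborepo do this hashing for you.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_hash(root: Path) -> str:
    """Hash every file under root, in sorted order, including its relative path."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def compile_cache_key(lockfile: Path, src: Path) -> str:
    """Combine the lockfile hash (dependencies) with the source tree hash."""
    lock_digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()[:12]
    return f"build-{lock_digest}-{tree_hash(src)[:12]}"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "package-lock.json").write_text("{}")
    (root / "src" / "main.js").write_text("console.log(1)")
    before = compile_cache_key(root / "package-lock.json", root / "src")

    # Editing a source file changes the key even though the lockfile is untouched.
    (root / "src" / "main.js").write_text("console.log(2)")
    after = compile_cache_key(root / "package-lock.json", root / "src")

print(before != after)
```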
Practical Tips
Measure before optimizing. Add timing output to each step (time npm ci, time npm run build) and rank by wall time. Only cache steps that take more than 30 seconds.
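If you would rather collect timings in one place than read them from logs, a small wrapper can run each step and rank the results. This is a sketch: the time_steps function is made up, and the two commands below are placeholders that stand in for real steps such as npm ci and npm run build.

```python
import subprocess
import sys
import time


def time_steps(steps: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Run each command and return (name, seconds) pairs, slowest first."""
    timings = []
    for name, command in steps.items():
        start = time.perf_counter()
        subprocess.run(command, check=True)
        timings.append((name, time.perf_counter() - start))
    return sorted(timings, key=lambda item: item[1], reverse=True)


# Placeholder commands; swap in your real pipeline steps.
steps = {
    "install": [sys.executable, "-c", "pass"],
    "build": [sys.executable, "-c", "import time; time.sleep(0.5)"],
}

ranked = time_steps(steps)
for name, seconds in ranked:
    print(f"{name:10s} {seconds:6.2f}s")
```

The slowest steps at the top of the list are the first candidates for caching.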
Use content-addressed storage where you can. Tools like Bazel, Nx, and Turborepo identify outputs by the hash of all inputs (sources, deps, env vars). A remote cache backed by S3 or GCS gives every developer and CI runner shared cache hits.
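The core of a content-addressed build cache fits in a short sketch. This is a simplified model under stated assumptions: RemoteCache is an in-memory stand-in for a bucket in S3 or GCS, the "compile" step just concatenates sources, and none of these names come from Bazel, Nx, or Turborepo.

```python
import hashlib
import json


def action_key(sources: dict, deps: dict, env: dict) -> str:
    """Hash all inputs (source contents, dependency versions, env vars) into one key."""
    payload = {
        "sources": {path: hashlib.sha256(data).hexdigest() for path, data in sources.items()},
        "deps": deps,
        "env": env,
    }
    canonical = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


class RemoteCache:
    """In-memory stand-in for a shared bucket."""

    def __init__(self) -> None:
        self._blobs = {}

    def get(self, key: str):
        return self._blobs.get(key)

    def put(self, key: str, output: bytes) -> None:
        self._blobs[key] = output


def build(cache: RemoteCache, sources: dict, deps: dict, env: dict):
    """Return (output, was_cache_hit)."""
    key = action_key(sources, deps, env)
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    output = b"".join(sources[path] for path in sorted(sources))  # placeholder "compile"
    cache.put(key, output)
    return output, False


shared = RemoteCache()
sources = {"a.ts": b"export const a = 1;", "b.ts": b"export const b = 2;"}
deps = {"typescript": "5.4.0"}

_, ci_hit = build(shared, sources, deps, {"NODE_ENV": "production"})   # CI runner: miss
_, dev_hit = build(shared, sources, deps, {"NODE_ENV": "production"})  # developer: hit
_, env_hit = build(shared, sources, deps, {"NODE_ENV": "development"}) # new env: miss
print(ci_hit, dev_hit, env_hit)  # False True False
```

Because the key is derived from inputs alone, it does not matter which machine produced the output, which is what lets CI runners and developers share hits.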
For Docker, structure your Dockerfile so the things that change least are at the top:
FROM node:20
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
The npm ci layer is reused unless package.json or the lockfile changes, so editing source files only invalidates the bottom two layers.
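To see why the ordering matters, here is a toy model of layer caching in Python. It is a deliberate simplification and not how BuildKit computes cache keys internally: each layer id is assumed to depend only on its parent layer, the instruction text, and a digest of the files the instruction reads. The digest strings are placeholders.

```python
import hashlib


def layer_ids(steps: list) -> list:
    """Chain layer ids: each one hashes its parent id, instruction, and input digest."""
    ids = []
    parent = ""
    for instruction, inputs_digest in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{inputs_digest}".encode()).hexdigest()[:12]
        ids.append(parent)
    return ids


def dockerfile(manifest_digest: str, source_digest: str) -> list:
    """The Dockerfile above as (instruction, digest of files it reads) pairs."""
    return [
        ("FROM node:20", ""),
        ("WORKDIR /app", ""),
        ("COPY package*.json ./", manifest_digest),
        ("RUN npm ci", ""),
        ("COPY . .", source_digest),
        ("RUN npm run build", ""),
    ]


baseline = layer_ids(dockerfile("manifest-v1", "src-v1"))
source_edit = layer_ids(dockerfile("manifest-v1", "src-v2"))
dep_bump = layer_ids(dockerfile("manifest-v2", "src-v2"))

# True means the layer is reused from cache.
print([a == b for a, b in zip(baseline, source_edit)])
# [True, True, True, True, False, False]
print([a == b for a, b in zip(baseline, dep_bump)])
# [True, True, False, False, False, False]
```

A change invalidates its own layer and everything after it, so cheap, frequently changing steps belong at the bottom.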
Cache test results when feasible. pytest --cache-clear is the opposite of what you want. Instead, jest --onlyChanged runs only the tests related to files that changed in the repository, and bazel test reuses cached results for tests whose inputs did not change.
Wrap-up
Caching is often the highest-leverage CI optimization. Hash inputs precisely, use restore keys for warm starts, and put a remote cache behind shared work. Avoid leaking caches across trust boundaries, and prune by retention. With layered caches for dependencies, builds, container layers, and tests, even large pipelines can often stay under 5 minutes.