CI/CD Rollback Strategies
An overview of rollback strategies in modern CI/CD: redeploy previous, blue-green flip, canary reverse, database-safe rollbacks, and the trade-offs between speed and safety.
What you'll learn
- Why every deploy strategy needs a rollback plan
- The five common rollback patterns
- How database changes constrain rollbacks
- How feature flags shrink the blast radius
- How to choose between speed and safety
Prerequisites
- Some experience deploying services to production
What and Why
A rollback is the act of reverting a production system to a known-good state after a bad deploy. It sounds simple, but the actual mechanism varies widely. Pushing a Git revert, flipping a load balancer, or rolling back a Kubernetes Deployment are very different operations with very different recovery times.
You need a rollback strategy because every deploy will eventually fail. The question is not whether, but how fast you can return to safety. Teams that practice rollback tend to recover in minutes. Teams that improvise can take hours.
Mental Model
A rollback is just another deploy where the target version is older than the current one. Everything that makes a deploy work, like artifact storage, immutable builds, and config separation, also makes rollback work. Everything that makes a deploy fragile, like in-place edits or coupling code with database state, also makes rollback fragile.
There are five common patterns, ordered roughly from slowest to fastest:
- Redeploy previous artifact through the pipeline.
- kubectl rollout undo, or an equivalent in-place reversion.
- Blue-green flip back to the idle environment.
- Canary reverse, returning the traffic split to zero.
- Feature flag off, leaving code deployed but inert.
Each strategy assumes the previous version is still runnable, which is the deepest invariant a rollback depends on.
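The idea that a rollback is a deploy with an older target can be sketched with a release history keyed by immutable artifact identifiers. This is a minimal illustration, not a real tool: ReleaseHistory, its methods, and the placeholder digest strings are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseHistory:
    """Hypothetical record of deployed artifacts, oldest first."""

    deployed: list = field(default_factory=list)

    def deploy(self, artifact: str) -> str:
        # A deploy makes `artifact` the current target.
        self.deployed.append(artifact)
        return artifact

    def current(self) -> str:
        return self.deployed[-1]

    def rollback(self) -> str:
        # A rollback is a deploy whose target is the previous artifact.
        # It only works if that artifact still exists and is still runnable.
        if len(self.deployed) < 2:
            raise RuntimeError("no previous artifact to roll back to")
        return self.deploy(self.deployed[-2])


history = ReleaseHistory()
history.deploy("api@sha256:aaa")  # placeholder digest: known-good release
history.deploy("api@sha256:bbb")  # placeholder digest: bad release
rolled_back_to = history.rollback()
print(rolled_back_to)
```

Recording the rollback as a new entry, rather than deleting the bad one, keeps the history honest about what actually ran in production.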
Hands-on Example
Take a Kubernetes service deployed blue-green: two ReplicaSets labeled slot: blue and slot: green, and a Service whose selector decides which one receives traffic:
apiVersion: v1
kind: Service
metadata: { name: api }
spec:
  selector: { app: api, slot: blue } # currently serving
  ports: [{ port: 80, targetPort: 8080 }]
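You then cut traffic over to green by changing the selector to slot: green, and the release turned out to be bad. To roll back, point the selector at blue again: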
kubectl patch svc api -p '{"spec":{"selector":{"app":"api","slot":"blue"}}}'
Traffic returns to blue in seconds. The bad green ReplicaSet stays around so you can investigate.
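If the flip is scripted rather than typed by hand, the patch body can be generated so the slot name is the only thing that varies. The sketch below only builds the command shown above and does not run it. The helper names selector_patch and rollback_command are hypothetical.

```python
import json


def selector_patch(slot: str, app: str = "api") -> str:
    """Return the patch body that points the Service selector at `slot`."""
    body = {"spec": {"selector": {"app": app, "slot": slot}}}
    # Compact separators reproduce the exact JSON used in the command above.
    return json.dumps(body, separators=(",", ":"))


def rollback_command(slot: str, service: str = "api") -> list:
    # Argument list suitable for subprocess.run; nothing is executed here.
    return ["kubectl", "patch", "svc", service, "-p", selector_patch(slot)]


command = rollback_command("blue")
print(command)
```

Building an argument list instead of a shell string avoids quoting problems around the JSON body.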
Slow                                                Fast
+-----------+---------+------------+---------+---------+
| Pipeline  | kubectl | Blue/      | Canary  | Feature |
| redeploy  | rollout | green      | reverse | flag    |
| (minutes) | (~30s)  | flip (<5s) | (<5s)   | (<1s)   |
+-----------+---------+------------+---------+---------+
  depends on                           depends on
  fresh build                          pre-deployed code
                                       + flag service
Common Pitfalls
The biggest pitfall is the database. If your release ran a migration that dropped a column or changed a type, rolling the binary back leaves the schema incompatible with the old code. The fix is the expand-contract pattern: ship the additive migration in one release, the code that uses it in a second, and the cleanup migration in a third. Each step is independently rollback-safe.
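The expand step can be tried end to end with an in-memory SQLite database. This is a sketch under stated assumptions: the users table, the name and display_name columns, and the two insert functions standing in for the old and new releases are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def old_code_insert(name: str) -> None:
    # Old release: only knows about the `name` column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_code_insert(name: str) -> None:
    # New release: writes both columns so the old release keeps working.
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )


old_code_insert("ada")

# Release 1 (expand): an additive, nullable column. Old code is unaffected.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
old_code_insert("grace")

# Release 2: code starts using the new column.
new_code_insert("linus")

# Rolling release 2 back means running the old code again. It still works.
old_code_insert("margaret")

rows = conn.execute(
    "SELECT name, display_name FROM users ORDER BY id"
).fetchall()
print(rows)
```

The contract step, dropping the old column, is deliberately left out. It belongs in a later release, once no deployed version still reads that column.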
Another pitfall is rollback drift. The “previous” version was built six weeks ago and depends on a config map you removed yesterday. When you try to roll back, it crashes on startup. Keep config backward compatible for at least one release.
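A cheap guard against this kind of drift is to check, before each deploy, that the current config still satisfies the previous release. The sketch below assumes each release declares the config keys it needs. The function name and the key names are hypothetical.

```python
def missing_config_keys(required_by_previous: set, current_config: dict) -> set:
    """Return keys the previous release needs that the current config lacks."""
    return required_by_previous - set(current_config)


# Hypothetical example: the previous release still reads LEGACY_CACHE_HOST,
# which was removed from the config yesterday.
previous_release_needs = {"DATABASE_URL", "LEGACY_CACHE_HOST"}
current_config = {"DATABASE_URL": "postgres://db/api"}

missing = missing_config_keys(previous_release_needs, current_config)
print(sorted(missing))
```

A non-empty result means a rollback to the previous release would likely fail at startup, so the config removal should wait.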
A third is unattended rollback. An automated system rolls back the moment error rate crosses a threshold, then a transient downstream blip causes another rollback, and you flap. Use cooldowns and require a successful health check before re-allowing automatic rollback.
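The cooldown and health-check rule can be expressed as a small guard in front of the automatic rollback trigger. This is one possible policy, not a description of any particular tool. The RollbackGuard class, its parameters, and the thresholds are assumptions for illustration.

```python
from typing import Optional


class RollbackGuard:
    """Decides whether an automatic rollback may fire (hypothetical policy)."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self.last_rollback_at: Optional[float] = None
        self.healthy_since_rollback = True

    def record_health_check(self, ok: bool) -> None:
        # A passing health check after a rollback re-arms the guard.
        if ok:
            self.healthy_since_rollback = True

    def should_rollback(self, error_rate: float, threshold: float, now: float) -> bool:
        if error_rate <= threshold:
            return False
        if self.last_rollback_at is not None:
            in_cooldown = now - self.last_rollback_at < self.cooldown_seconds
            if in_cooldown or not self.healthy_since_rollback:
                return False
        self.last_rollback_at = now
        self.healthy_since_rollback = False
        return True


guard = RollbackGuard(cooldown_seconds=300)
decisions = [
    guard.should_rollback(0.20, 0.05, now=0),    # first breach: roll back
    guard.should_rollback(0.20, 0.05, now=30),   # inside cooldown: hold
    guard.should_rollback(0.20, 0.05, now=400),  # no healthy check yet: hold
]
guard.record_health_check(True)
decisions.append(guard.should_rollback(0.20, 0.05, now=410))  # re-armed
decisions.append(guard.should_rollback(0.01, 0.05, now=420))  # below threshold
print(decisions)
```

Passing the clock in as an argument keeps the policy easy to test without waiting for real time to elapse.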
Practical Tips
Practice rollback in non-production. Treat the rollback path as part of every deploy, not an emergency procedure. If you have never run it, you do not have one.
Keep artifacts immutable and addressable. A rollback should be a config change that points at a known image digest, not a rebuild.
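One way to enforce this is to reject image references that are not pinned by digest. The check below is a rough sketch that assumes sha256 digests in the usual name@sha256:<64 hex characters> form. The function name and the registry host are hypothetical.

```python
import re

# Assumes sha256 digests; other digest algorithms would need a wider pattern.
DIGEST_REF = re.compile(r"[^@\s]+@sha256:[0-9a-f]{64}")


def is_pinned_by_digest(image_ref: str) -> bool:
    """Return True if the reference names an image by content digest."""
    return DIGEST_REF.fullmatch(image_ref) is not None


pinned = "registry.example.com/api@sha256:" + "a" * 64  # placeholder digest
floating = "registry.example.com/api:latest"

print(is_pinned_by_digest(pinned), is_pinned_by_digest(floating))
```

A tag such as latest can be moved to a different image after the fact, so a rollback that targets a tag may not restore the bytes that were running before.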
Default to feature flags for risky changes. A flag flip is the fastest possible rollback and does not change the running binary.
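In code, a flag-guarded change is a branch whose condition is read at request time. The sketch below uses an in-memory dictionary as a stand-in for a real flag service. The flag name and the recommendations function are hypothetical.

```python
# Hypothetical in-memory flag store; a real system would query a flag service.
flags = {"new_recommendations": True}


def recommendations(user_id: int) -> str:
    # Both code paths ship in the same binary; the flag picks one per call.
    if flags.get("new_recommendations", False):
        return f"v2 recommendations for user {user_id}"
    return f"v1 recommendations for user {user_id}"


before = recommendations(7)
flags["new_recommendations"] = False  # the "rollback": no redeploy needed
after = recommendations(7)
print(before, "|", after)
```

Defaulting the lookup to False means a missing or unreachable flag falls back to the old behavior.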
Separate schema and code releases. Migrations land in their own deploy, never bundled with code that depends on them.
Record every rollback. The frequency and cause are leading indicators of release quality.
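Even a very small record per rollback is enough to see patterns. The sketch below counts rollbacks by cause. The RollbackRecord fields and the sample entries are made up for illustration, not real data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackRecord:
    release: str             # identifier of the release that was rolled back
    cause: str               # short category chosen by the team
    minutes_to_recover: int  # time from detection to known-good state


# Made-up sample entries.
log = [
    RollbackRecord("r101", "schema", 42),
    RollbackRecord("r107", "config", 6),
    RollbackRecord("r112", "schema", 35),
]

by_cause = Counter(record.cause for record in log)
top_cause, top_count = by_cause.most_common(1)[0]
print(top_cause, top_count)
```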
Wrap-up
Rollback is a deploy in reverse, and the strategy you pick is a trade-off between recovery time and operational complexity. Feature flags and traffic flips are nearly instant but require infrastructure investment. Pipeline redeploys are slow but need nothing beyond the pipeline you already have. The single most important rule is to keep code, config, and schema independently revertible. If you can do that, the rest is plumbing.