Blue-Green vs Canary Deployments Explained
Compare blue-green and canary deployment strategies, including how they handle rollback, traffic shifting, and observability, with concrete Kubernetes and AWS examples.
What you'll learn
- What blue-green and canary deployments actually do
- How traffic shifting and rollback differ
- Which one fits stateful vs stateless workloads
- Implementation patterns on Kubernetes and AWS
- Observability needed to make either safe
Prerequisites
- General familiarity with CI/CD and load balancers
What and Why
Big-bang deploys ship new code to 100 percent of users at once. If the release is bad, 100 percent of users feel it. Blue-green and canary are two strategies that keep the bad version away from most users while you decide whether the new version is healthy.
They sound similar — both run old and new together — but they shift traffic very differently, and that difference drives everything else.
Mental Model
- Blue-green: run two complete production environments side by side. “Blue” serves all traffic. Deploy the new version to “green.” When green looks healthy, flip the load balancer to send 100 percent of traffic to green. Rollback = flip back.
- Canary: deploy the new version alongside the old, but route only a tiny slice of traffic (1 percent, then 5, then 25) to it. Watch error rates at each step. Promote gradually while the metrics stay healthy, or roll back by sending that slice back to the old version.
Blue-green is binary: all-old or all-new. Canary is continuous: shift gradually.
Blue-Green:
t0: LB --> [blue 100%] [green 0%, warming]
t1: LB --> [blue 0%] [green 100%] (cutover)
Canary:
t0: LB --> [stable 100%] [canary 0%]
t1: LB --> [stable 99%] [canary 1%]
t2: LB --> [stable 90%] [canary 10%]
t3: LB --> [stable 0%] [canary 100%]
Hands-on Example
Blue-green on Kubernetes using two Deployments and a Service selector swap:
apiVersion: apps/v1
kind: Deployment
metadata: { name: web-blue }
spec:
  replicas: 4
  selector: { matchLabels: { app: web, color: blue } }
  template:
    metadata: { labels: { app: web, color: blue } }
    spec:
      containers: [{ name: web, image: myorg/web:1.4 }]
---
apiVersion: apps/v1
kind: Deployment
metadata: { name: web-green }
spec:
  replicas: 4
  selector: { matchLabels: { app: web, color: green } }
  template:
    metadata: { labels: { app: web, color: green } }
    spec:
      containers: [{ name: web, image: myorg/web:1.5 }]
---
apiVersion: v1
kind: Service
metadata: { name: web }
spec:
  selector: { app: web, color: blue } # flip to green to cut over
  ports: [{ port: 80, targetPort: 80 }]
To cut over: kubectl patch svc web -p '{"spec":{"selector":{"app":"web","color":"green"}}}'. To roll back: patch it back to blue.
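The cutover is only safe once green is actually ready. As a minimal sketch, the helper below checks readiness before building the same patch command. It assumes the dict passed to `is_ready` is the parsed output of `kubectl get deployment web-green -o json`; the function names are hypothetical and the example only prints the command instead of running it.

```python
import json


def is_ready(deployment: dict) -> bool:
    """True when every desired replica of the Deployment reports ready."""
    desired = deployment.get("spec", {}).get("replicas", 0)
    # status.readyReplicas is omitted by the API while no pod is ready.
    ready = deployment.get("status", {}).get("readyReplicas", 0)
    return desired > 0 and ready >= desired


def cutover_command(service: str, color: str) -> list:
    """Build the kubectl patch command that points the Service at one color."""
    patch = {"spec": {"selector": {"app": "web", "color": color}}}
    return ["kubectl", "patch", "svc", service, "-p", json.dumps(patch)]


# Stand-in for the parsed JSON of the green Deployment (assumed shape).
green = {"spec": {"replicas": 4}, "status": {"readyReplicas": 4}}

if is_ready(green):
    print(" ".join(cutover_command("web", "green")))
else:
    print("green is not ready; keeping traffic on blue")
```

Rolling back uses the same helper with `"blue"`, which is what makes the blue-green rollback path so short.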
Canary on Kubernetes with Argo Rollouts:
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata: { name: web }
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 1
  selector: { matchLabels: { app: web } }
  template:
    metadata: { labels: { app: web } }
    spec:
      containers: [{ name: web, image: myorg/web:1.5 }]
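The analysis block references an AnalysisTemplate named success-rate (not shown here), which would typically query Prometheus. Because it is declared at the strategy level, it runs as a background analysis from the step given by startingStep onward, not only during the pauses. If the analysis fails, Argo Rollouts aborts the rollout and shifts traffic back to the stable version automatically. Note that this example has no trafficRouting section, so Argo Rollouts approximates each weight by adjusting replica counts, and with 10 replicas the real split at the 5 percent step is coarser than configured.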
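To make the gating idea concrete, here is a minimal Python sketch of a metric-driven analysis loop. It is not Argo Rollouts' implementation. The threshold, the tolerated failure count, and the list of success-rate measurements are all hypothetical stand-ins for what a real metrics query would return.

```python
def run_analysis(samples, threshold=0.95, failure_limit=2):
    """Decide the fate of a rollout from periodic success-rate measurements.

    Returns "abort" as soon as more than `failure_limit` measurements fall
    below `threshold`; otherwise returns "promote" once all samples are seen.
    """
    failures = 0
    for success_rate in samples:
        if success_rate < threshold:
            failures += 1
            if failures > failure_limit:
                return "abort"
    return "promote"


# Two bad measurements are tolerated, a third one aborts the rollout.
print(run_analysis([0.99, 0.90, 0.99, 0.91]))
print(run_analysis([0.90, 0.91, 0.89]))
```

Tolerating a small number of failed measurements keeps one noisy scrape from aborting a healthy release, at the cost of reacting slightly later to a real regression.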
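On AWS, ECS supports blue-green deployments through the CodeDeploy deployment controller, and CodeDeploy ships predefined traffic-shifting configurations for Lambda and ECS, such as CodeDeployDefault.LambdaCanary10Percent5Minutes and CodeDeployDefault.ECSCanary10Percent5Minutes, alongside linear and all-at-once variants.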
Common Pitfalls
Database migrations during cutover. Blue and green usually share one database, so the schema has to work for both versions at the same time. That requires forward-compatible migrations: add the new column, deploy code that writes both columns, then drop the old column in a later release. A single-step migration breaks one side during cutover.
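The add-then-dual-write sequence can be sketched with an in-memory SQLite database. The table, the column names, and the `blue_write` / `green_write` helpers are hypothetical; the point is only that an additive change leaves the old code path working while the new one is rolled out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")


def blue_write(user_id, name):
    # Old version: only knows about the old column.
    db.execute("INSERT INTO users (id, fullname) VALUES (?, ?)", (user_id, name))


def green_write(user_id, name):
    # New version: writes both columns so either version can read the row.
    db.execute(
        "INSERT INTO users (id, fullname, display_name) VALUES (?, ?, ?)",
        (user_id, name, name),
    )


blue_write(1, "Ada")

# Step 1 (expand): additive change, so blue keeps working afterwards.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

blue_write(2, "Grace")   # blue still serving traffic
green_write(3, "Linus")  # green serving traffic at the same time

# Step 2: backfill rows that were written by blue.
db.execute("UPDATE users SET display_name = fullname WHERE display_name IS NULL")

# Step 3 (contract) belongs to a later release, once blue is gone:
# drop the fullname column and the dual write.

rows = db.execute(
    "SELECT id, fullname, display_name FROM users ORDER BY id"
).fetchall()
print(rows)
```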
Sticky sessions. A canary weight of 5 percent is misleading if your LB pins users to the version they first hit: existing sessions stay where they are, so the real split drifts away from the configured one. Either accept a slight session reset or route by user-id hash for deterministic stickiness.
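Routing by user-id hash can be sketched in a few lines of Python. This is an illustration under assumptions, not a load balancer configuration: the `bucket` and `route` names are hypothetical, and a real setup would apply the same idea in the proxy or service mesh.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send the user to the canary when their bucket is below the current weight."""
    return "canary" if bucket(user_id) < canary_percent else "stable"


users = [f"user-{n}" for n in range(1000)]
for percent in (5, 25, 100):
    on_canary = sum(route(u, percent) == "canary" for u in users)
    print(f"{percent}% weight -> {on_canary} of {len(users)} users on canary")
```

Because the bucket depends only on the user id, a user who lands on the canary at 5 percent stays there at 25 percent, and raising the weight only ever moves users from stable to canary.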
No automated rollback signal. Canary without metric-driven analysis is just “watch a dashboard and panic.” Define SLOs, hook them to your rollout controller, and let the system roll itself back.
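Forgetting warm-up. A freshly started environment is cold: JVM services, for example, run slowly until the JIT compiler has optimized the hot paths, and caches and connection pools start empty. Sending 100 percent of traffic to a cold green environment causes a latency spike. Send a small prewarm wave first.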
Cost surprise on blue-green. You temporarily double capacity. For large fleets, the dollar cost during cutover may force you toward canary instead.
Practical Tips
For stateless web apps, canary is usually the right default. You get a smaller blast radius, automated rollback, and the option to extend the analysis window for slow-burning issues.
For batch jobs, workers, or anything without externally visible traffic, blue-green is simpler. Cut consumers over to the new queue or topic, drain the old, done.
Combine with feature flags so you can decouple “code deployed” from “feature enabled.” Even a canary at 100 percent can keep a risky feature off for everyone until you flip a flag for a small cohort.
Use synthetic traffic during the early canary steps. A small set of golden-path probes can detect regressions before real users see them.
Observability is the gate. You need:
- Error rate per version (HTTP 5xx, exception rate).
- p95/p99 latency per version.
- Business KPIs (orders per minute, sign-ups per minute).
Tag every metric with the version label, and your dashboards become rollout decisions.
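As a rough sketch of how version-tagged metrics turn into a rollout decision, the Python below aggregates request records per version and compares canary against stable. The record shape, the `summarize` and `decide` names, and the two tolerances are assumptions for illustration; in practice these numbers would come from your metrics backend and your SLOs.

```python
from collections import defaultdict


def summarize(requests):
    """Aggregate (version, status_code, latency_ms) records per version."""
    by_version = defaultdict(list)
    for version, status, latency_ms in requests:
        by_version[version].append((status, latency_ms))

    summary = {}
    for version, rows in by_version.items():
        latencies = sorted(latency for _, latency in rows)
        errors = sum(1 for status, _ in rows if status >= 500)
        # Nearest-rank p95: the value at rank ceil(0.95 * n), 1-indexed.
        rank = (95 * len(latencies) + 99) // 100
        summary[version] = {
            "error_rate": errors / len(rows),
            "p95_ms": latencies[rank - 1],
        }
    return summary


def decide(summary, max_error_delta=0.01, max_latency_ratio=1.2):
    """Roll back when the canary is clearly worse than stable on either signal."""
    stable, canary = summary["stable"], summary["canary"]
    if canary["error_rate"] - stable["error_rate"] > max_error_delta:
        return "rollback"
    if canary["p95_ms"] > stable["p95_ms"] * max_latency_ratio:
        return "rollback"
    return "promote"


stable = [("stable", 200, 100)] * 99 + [("stable", 500, 100)]
bad_canary = [("canary", 200, 110)] * 9 + [("canary", 500, 110)]
print(decide(summarize(stable + bad_canary)))
```

Comparing the canary against the stable version running at the same time, instead of against a fixed number, keeps a site-wide incident from being blamed on the new release.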
Wrap-up
Blue-green is a binary cutover with fast rollback and high capacity cost. Canary is a gradual shift with automated metric-driven gating and smaller blast radius. Pick blue-green when migrations are simple, capacity is cheap, and cutovers can be all-or-nothing. Pick canary for user-facing services where slow-burning regressions are the bigger risk. Either way, invest in observability and forward-compatible schema changes before you invest in fancy rollout controllers — without those, both strategies just give you a slower way to ship the same outage.