Feature Flags Best Practices for DevOps Teams
Feature flags decouple deploy from release. Learn flag types, rollout strategies, and how to keep your codebase from drowning in stale toggles.
What you'll learn
- Why feature flags decouple deploy from release
- The four common flag categories and their lifecycles
- How progressive rollouts and kill switches work
- How to avoid flag debt with cleanup policies
- Production patterns for evaluation latency and consistency
Prerequisites
- Familiar with shell and YAML
A feature flag is a runtime switch that gates code paths. Push code behind a flag that is off, and your deploy carries no behavior change. Flip the flag, and the new path turns on without a redeploy. This separation is the heart of modern continuous delivery: shipping binaries is boring, releasing features is a product decision.
What and Why
Without flags, every deploy is also a release. That couples engineering velocity to product risk. With flags, you can merge unfinished work, ship dark, test in production with internal users, and roll back instantly when something goes wrong. The cost is a small amount of branching code plus a control plane to manage state. The benefit is that “rollback” becomes a config change rather than a redeploy.
Four flag categories are worth distinguishing because they behave differently:
- Release flags hide in-progress features and live for days or weeks.
- Experiment flags split traffic for A/B tests and live for a defined experiment window.
- Ops flags act as kill switches for risky subsystems and may live forever.
- Permission flags gate features per customer or plan and live for the life of the product.
Treat them differently in code and in lifecycle reviews.
Mental Model
A flag has three parts: a definition (key, default), a targeting rule (who sees what), and an evaluation client (your app calls it). Your code never embeds a percentage or user list. It only asks: “for this context, what is the value?” The provider returns true or false (or a string variant), and your code branches.
           user request
                 |
                 v
      +----------------------+
      |  evaluation client   |
      |   (caches rules)     |
      +----------+-----------+
                 |
                 v
          targeting rule
              /     \
      in cohort?     default
          |             |
          v             v
       new path      old path
Hands-on Example
Below is a minimal flag definition expressed as YAML. Many providers consume something like this, and you can keep it under version control even when the live store is a SaaS system.
flags:
  checkout-redesign:
    description: "New checkout UI"
    default: false
    rules:
      - segment: internal-users
        value: true
      - rollout:
          percentage: 10
          attribute: userId
        value: true
In code, the call site stays small. The branching is local; the rule lives in config.
// Inside a request handler: build the context, then ask the client.
const ctx = { userId: req.user.id, country: req.user.country };
if (flags.isEnabled("checkout-redesign", ctx)) {
  return renderNewCheckout(req);
}
return renderLegacyCheckout(req);
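To make the split between call site and rule concrete, here is a minimal sketch of how an in-process client might resolve the YAML definition above. The `evaluateFlag` function, the `segments` lookup, and the first-match-wins rule order are assumptions for illustration, not any particular provider's behavior. FNV-1a stands in for whatever hash a real SDK uses.

```javascript
// Hypothetical in-memory copy of the YAML definition above.
const flagDefinitions = {
  "checkout-redesign": {
    default: false,
    rules: [
      { segment: "internal-users", value: true },
      { rollout: { percentage: 10, attribute: "userId" }, value: true },
    ],
  },
};

// Assumed segment membership, keyed by user id for this sketch.
const segments = { "internal-users": new Set(["u-staff-1"]) };

// 32-bit FNV-1a over UTF-16 code units: deterministic and dependency-free.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Walk the rules in order; the first matching rule decides the value.
function evaluateFlag(key, ctx) {
  const flag = flagDefinitions[key];
  if (!flag) return false; // unknown flag: fail closed
  for (const rule of flag.rules) {
    if (rule.segment) {
      const members = segments[rule.segment];
      if (members && members.has(ctx.userId)) return rule.value;
    } else if (rule.rollout) {
      const id = ctx[rule.rollout.attribute];
      if (id === undefined) continue; // cannot bucket without the attribute
      const bucket = fnv1a32(`${key}:${id}`) % 100; // 0..99
      if (bucket < rule.rollout.percentage) return rule.value;
    }
  }
  return flag.default;
}
```

With this shape, `flags.isEnabled` in the call site would be a thin wrapper around `evaluateFlag`, and changing the percentage touches only the definition object.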
Roll out by raising the percentage in steps: 1, 5, 25, 50, 100. Watch error rates and latency between each step. If anything turns red, drop back to zero. That is your kill switch.
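The step ladder above can be sketched as a small helper. The function name and the idea of passing in a single `healthy` boolean are assumptions; in practice that signal would come from your own error-rate and latency checks.

```javascript
const ROLLOUT_STEPS = [1, 5, 25, 50, 100];

// Returns the percentage to apply next. The caller decides what "healthy"
// means, for example by comparing error rates and latency against a baseline.
function nextRolloutPercentage(current, healthy) {
  if (!healthy) return 0; // kill switch: drop straight back to zero
  const next = ROLLOUT_STEPS.find((step) => step > current);
  return next === undefined ? 100 : next;
}
```

For example, `nextRolloutPercentage(5, true)` moves to 25, while `nextRolloutPercentage(50, false)` returns 0.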
Common Pitfalls
The first pitfall is flag debt. Teams add flags faster than they remove them, and code grows nests of dead branches. Every flag needs an owner, a creation date, and an expected removal date. A weekly automated report should list flags older than 30 days that are at 100 percent or 0 percent rollout.
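A report like that can be a few lines of code once flag metadata is exportable. The sketch below assumes a hypothetical record shape (`key`, `owner`, `createdAt`, `removeBy`, `percentage`); adapt the field names to whatever your flag store actually exports.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Flags older than maxAgeDays that sit fully on or fully off are
// cleanup candidates: the rollout decision has effectively been made.
function findStaleFlags(flags, now, maxAgeDays = 30) {
  return flags
    .filter((flag) => {
      const ageDays = (now.getTime() - new Date(flag.createdAt).getTime()) / DAY_MS;
      const settled = flag.percentage === 0 || flag.percentage === 100;
      return ageDays > maxAgeDays && settled;
    })
    .map((flag) => ({ key: flag.key, owner: flag.owner, removeBy: flag.removeBy }));
}
```

Run it on a schedule and route each entry to its owner, so the report produces a concrete task rather than a list nobody reads.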
The second pitfall is inconsistent evaluation. If a user is bucketed differently on each request, the UI flickers. Use a stable bucketing attribute (user id, not session id) and hash it deterministically so the same user always gets the same variant.
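One way to get stable bucketing is to hash the flag key together with the user id and map the result onto 0..99. This is a sketch under assumptions: the helper names are made up, and FNV-1a is chosen only because it is short enough to show inline. A production SDK will use its own hash.

```javascript
// 32-bit FNV-1a over UTF-16 code units.
function fnv1a32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Salting with the flag key keeps one user's bucket independent across flags.
function bucketFor(flagKey, userId) {
  return fnv1a32(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
function inRollout(flagKey, userId, percentage) {
  return bucketFor(flagKey, userId) < percentage;
}
```

Because the comparison is `bucket < percentage`, a user included at 10 percent stays included at 25 percent, so raising the rollout never flips anyone back to the old path.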
The third pitfall is silent failure. If your flag service is down and your client falls back to the default, a flag at 100 percent might suddenly serve 0 percent. Cache the last known good rules locally and prefer stale-on-error over default-on-error for flags currently rolled out.
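Stale-on-error can be sketched as a small store that only replaces its rules on a successful fetch. The transport is left out on purpose: the caller reports each fetch outcome through `applyUpdate` or `recordFailure`, both hypothetical names, and the rule shape (`{ percentage }`) is also an assumption.

```javascript
// Holds the most recent successfully fetched rules.
function createRuleStore(bootstrapRules = {}) {
  let lastKnownGood = bootstrapRules;
  let consecutiveFailures = 0;

  return {
    applyUpdate(rules) {
      lastKnownGood = rules;
      consecutiveFailures = 0;
    },
    recordFailure() {
      consecutiveFailures += 1; // rules are deliberately left untouched
    },
    percentageFor(key) {
      const rule = lastKnownGood[key];
      return rule ? rule.percentage : 0; // default only for flags never seen
    },
    isStale() {
      return consecutiveFailures > 0;
    },
  };
}
```

Exposing `isStale()` lets you alert on a degraded flag service without changing what users see.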
The fourth pitfall is using flags for config. Long-lived tuning knobs like timeouts, pool sizes, or rate limits are config, not flags. Mixing them into the flag system creates a junk drawer with no lifecycle.
Production Tips
Evaluate flags as close to the request entry point as possible and pass the resolved values down the call stack. This makes the decision auditable per request and avoids late-binding surprises.
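A minimal sketch of that pattern, assuming the client exposes an `isEnabled(key, ctx)` function like the one in the hands-on example. The helper and handler names here are hypothetical.

```javascript
// Resolve every flag the request needs once, then pass the frozen result down.
function resolveFlags(isEnabled, keys, ctx) {
  const resolved = {};
  for (const key of keys) {
    resolved[key] = isEnabled(key, ctx);
  }
  return Object.freeze(resolved);
}

// Entry point: the only place that talks to the flag client.
function handleCheckout(req, isEnabled) {
  const ctx = { userId: req.user.id, country: req.user.country };
  const flags = resolveFlags(isEnabled, ["checkout-redesign"], ctx);
  return renderCheckout(req, flags);
}

// Downstream code reads the resolved value and never calls the client itself.
function renderCheckout(req, flags) {
  return flags["checkout-redesign"] ? "new-checkout" : "legacy-checkout";
}
```

Freezing the resolved object means a rule change that arrives mid-request cannot produce a half-old, half-new response.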
Emit a structured log line for every evaluation that affects behavior. Include flag key, variant, user id, and a request id. When something breaks, you want to answer “who saw what” in one query.
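As an illustration, an evaluation record could be serialized as one JSON object per line. The field names and the `flag_evaluation` event label are assumptions; match them to your existing logging schema.

```javascript
// Builds one JSON log line per evaluation so "who saw what" is a single query.
function evaluationLogLine({ flagKey, variant, userId, requestId }, now = new Date()) {
  return JSON.stringify({
    event: "flag_evaluation",
    timestamp: now.toISOString(),
    flagKey,
    variant,
    userId,
    requestId,
  });
}
```

A call such as `console.log(evaluationLogLine({ flagKey: "checkout-redesign", variant: true, userId: "u1", requestId: "r-42" }))` emits a line you can filter by flag key and request id.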
Wire flag changes into your change-management trail. A flag flip from 0 to 100 percent in production is a production change. It should show up in the same incident timeline as a deploy.
Use SDKs that support local evaluation. Round-tripping to a remote API on every request adds latency you do not want. Stream rule updates over a long-lived connection and evaluate in-process.
Finally, plan the removal during the PR that introduces the flag. Add a tracking issue with a due date. Without that habit, every release flag becomes an ops flag by accident.
Wrap-up
Feature flags let you ship continuously without releasing recklessly. Use them for in-progress features, experiments, kill switches, and entitlements, and treat each category with its own lifecycle. Keep the call sites small, the rules in version control, and the cleanup automated. Done well, flags turn deploys into non-events and releases into product decisions.