AWS Cost Optimization Strategies That Actually Work
A pragmatic playbook for cutting AWS spend without hurting reliability: right-sizing, savings plans, storage tiering, and architectural moves.
What you'll learn
- Where AWS spend usually hides
- Right-sizing without breaking SLOs
- Savings Plans vs Reserved Instances
- Storage class economics
- Architectural levers for big wins
Prerequisites
- Familiar with terminals and YAML
What and Why
AWS bills are growth machines. Left unchecked, they compound on every successful launch, every new microservice, every test environment forgotten on Friday. Cost optimization is not a one-off cleanup but a habit that pays for itself many times over.
The reason to care is simple: cloud spend is typically a top-three cost line for software companies. A 20 percent reduction often frees more budget than a quarter's worth of engineering productivity gains, and it usually comes from changes that also improve reliability.
Mental Model
Think of AWS spend in three layers, each with different tools.
Layer 1 - Commitments
  Savings Plans, Reserved Instances, Spot
  (commercial moves, no code change)
Layer 2 - Resource hygiene
  Right-size, schedule on/off, tier storage, delete waste
  (operational, low risk)
Layer 3 - Architecture
  Serverless, caching, compression, async, multi-tenant
  (project work, biggest leverage)
The order matters. Architectural fixes have the largest payoff but the longest cycle time. Start with commitments and hygiene to free up budget while the architectural bets pay off.
Hands-on Example
Suppose Cost Explorer shows EC2 and RDS dominate spend, with S3 a steady third. A pragmatic plan:
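- Right-size with data. Pull two weeks of CloudWatch CPU data (and memory, if the CloudWatch agent is installed, since EC2 does not publish memory metrics by default) and pick instance sizes at the 95th percentile, not the peak. Percentiles are requested with --extended-statistics; --statistics only accepts Average, Sum, Minimum, Maximum, and SampleCount.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc \
  --start-time 2026-06-13T00:00:00Z --end-time 2026-06-27T00:00:00Z \
  --period 3600 --extended-statistics p95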
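- Buy a Compute Savings Plan for the floor of your fleet. Aim for 70 to 80 percent coverage, not 100 percent, or you keep paying for commitment you cannot use when the fleet shrinks.
- Schedule non-prod off. A simple Lambda triggered by EventBridge:
NightlyStopRule:
  Type: AWS::Events::Rule
  Properties:
    # EventBridge cron expressions are evaluated in UTC: this fires daily at 22:00 UTC.
    ScheduleExpression: "cron(0 22 * * ? *)"
    # StopFn (defined elsewhere) also needs an AWS::Lambda::Permission for events.amazonaws.com.
    Targets: [{ Arn: !GetAtt StopFn.Arn, Id: stop }]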
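- Tier S3. Transition objects into Intelligent-Tiering, which then moves infrequently accessed objects to cheaper access tiers automatically. Objects smaller than 128 KB are not auto-tiered, so the savings come from larger objects:
{
  "Rules": [{
    "ID": "tier-old",
    "Status": "Enabled",
    "Filter": {},
    "Transitions": [{ "Days": 0, "StorageClass": "INTELLIGENT_TIERING" }]
  }]
}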
- Cache hot reads. Put CloudFront in front of S3 and ElastiCache in front of expensive RDS queries. Egress and database CPU drop together.
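The right-sizing step above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the sample data is made up, the `AVAILABLE_VCPUS` list and the 30 percent headroom are assumptions, and it assumes the workload is CPU-bound and scales roughly linearly with vCPU count.

```python
import math
from typing import Sequence

# Hypothetical vCPU sizes to choose from; substitute the sizes of your instance family.
AVAILABLE_VCPUS = (2, 4, 8, 16, 32, 48, 64, 96)


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def recommend_vcpus(current_vcpus: int, cpu_percent: float, headroom: float = 0.30) -> int:
    """Smallest available size that covers the given utilization plus headroom."""
    needed = current_vcpus * (cpu_percent / 100) * (1 + headroom)
    for size in AVAILABLE_VCPUS:
        if size >= needed:
            return size
    return AVAILABLE_VCPUS[-1]  # nothing is large enough; fall back to the biggest size


# Made-up data: 336 hourly CPU readings (two weeks) from a 16-vCPU instance.
samples = [18.0] * 300 + [35.0] * 30 + [92.0] * 6

p95 = percentile(samples, 95)
peak = max(samples)
print(f"p95={p95}% -> {recommend_vcpus(16, p95)} vCPUs")
print(f"peak={peak}% -> {recommend_vcpus(16, peak)} vCPUs")
```

With this sample data, sizing on p95 suggests 8 vCPUs while sizing on the peak suggests 32. The six spiky hours are the part of the load that auto scaling, rather than a permanently larger instance, is meant to absorb.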
Common Pitfalls
- Buying RIs for changing workloads. Standard Reserved Instances lock you to an instance family and Region, and zonal ones to a specific size as well. Savings Plans are more flexible and almost always the right default now.
- Right-sizing on peak. Pick p95 plus headroom, not the worst minute of the busiest day. Use auto scaling for the rest.
- Forgotten resources. Unattached EBS volumes, idle NAT gateways, orphaned load balancers, abandoned snapshots - all bill 24/7. Run a weekly audit.
- Cross-AZ chatter. Inter-AZ traffic is billed at about $0.01 per GB in each direction, so every gigabyte that crosses an AZ boundary costs about $0.02. A chatty service mesh across AZs can cost more than the compute it runs on.
- Logging everything to CloudWatch. Ingest is expensive. Ship verbose logs to S3 or a third party and keep only the structured signal in CloudWatch.
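To make the cross-AZ pitfall concrete, here is a rough back-of-the-envelope estimate in Python. It assumes the $0.01 per GB per direction rate quoted above, instances spread evenly across AZs, and callers that pick targets uniformly at random; the function names and traffic figures are hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for estimation only


def cross_az_fraction(az_count: int) -> float:
    """Share of calls that leave the caller's AZ when targets are picked uniformly at random."""
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return (az_count - 1) / az_count


def monthly_cross_az_cost(
    requests_per_second: float,
    bytes_per_call: float,  # request plus response payload
    az_count: int,
    rate_per_gb_each_way: float = 0.01,  # assumed rate; check current pricing
) -> float:
    """Estimated monthly inter-AZ data transfer charge in dollars."""
    total_gb = requests_per_second * bytes_per_call * SECONDS_PER_MONTH / 1e9
    crossing_gb = total_gb * cross_az_fraction(az_count)
    # Each crossing byte is charged once leaving the source AZ and once entering the destination AZ.
    return crossing_gb * rate_per_gb_each_way * 2


# Hypothetical mesh: 2,000 calls per second, 20 KB per call, three AZs.
cost = monthly_cross_az_cost(2000, 20_000, az_count=3)
print(f"${cost:,.2f} per month")
```

Under these assumptions two thirds of the traffic crosses an AZ boundary, which comes to roughly $1,382 per month. Zone-aware routing shrinks the crossing fraction, which is usually a cheaper fix than reducing call volume.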
Production Tips
- Tag everything with owner, environment, and cost-center and enforce it via Service Control Policies. Untagged spend cannot be attributed to a team, so it cannot be optimized.
- Send daily anomaly alerts from AWS Cost Anomaly Detection into Slack. Catching a runaway in hours rather than weeks can repay the setup effort with a single incident.
- Negotiate an Enterprise Discount Program once you cross a few hundred thousand dollars of annual spend. The committed discount stacks with Savings Plans.
- Use Graviton wherever your runtime supports it. A move from m5 to m7g often nets around 20 percent savings, frequently with no application code changes for interpreted and JVM runtimes, although container images and native dependencies must be rebuilt for arm64.
- Treat the AWS bill as a product: a named FinOps owner, a weekly review, a quarterly target. Anything less and waste creeps back.
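A weekly review is easier when tag gaps are reported automatically. The sketch below checks an inventory for the three tags named above. The inventory shape (a list of dicts with `id` and `tags`) and the resource IDs are hypothetical; in practice the data would come from whatever export or API you already use.

```python
from typing import Iterable, Mapping

REQUIRED_TAGS = ("owner", "environment", "cost-center")


def missing_tags(
    resources: Iterable[Mapping],
    required: Iterable[str] = REQUIRED_TAGS,
) -> dict[str, list[str]]:
    """Map each non-compliant resource id to the required tags it lacks.

    A tag counts as missing when the key is absent or its value is blank.
    """
    required = tuple(required)
    report: dict[str, list[str]] = {}
    for resource in resources:
        tags = resource.get("tags") or {}
        missing = [key for key in required if not str(tags.get(key) or "").strip()]
        if missing:
            report[resource["id"]] = missing
    return report


# Hypothetical inventory for illustration.
inventory = [
    {"id": "i-0aaa", "tags": {"owner": "payments", "environment": "prod", "cost-center": "cc-101"}},
    {"id": "vol-0bbb", "tags": {"owner": "search"}},
    {"id": "nat-0ccc", "tags": {}},
    {"id": "i-0ddd", "tags": {"owner": " ", "environment": "dev", "cost-center": "cc-202"}},
]

for resource_id, missing in missing_tags(inventory).items():
    print(f"{resource_id}: missing {', '.join(missing)}")
```

A report like this complements policy enforcement: policies block new untagged resources, while the audit surfaces the ones that already exist.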
Wrap-up
Cost optimization is not glamorous, but the leverage is real. Cover the floor with Savings Plans, kill obvious waste with hygiene, and invest in architectural changes that compound. Pair these moves with tagging discipline and anomaly alerts, and the bill becomes something you can reason about and reduce on purpose, year after year.