FinOps Right-Sizing Without Vibes: The 5-Number Recipe That Cuts $200k of Idle EC2

DEV Community
Muskan

The average mid-size production EC2 fleet runs at 12 to 23 percent utilization. The remaining 77 to 88 percent is idle compute that ran continuously, billed continuously, and produced nothing. On a 200-instance fleet of m5.xlarge equivalents, that idle slice is worth $150,000 to $250,000 a year. The numbers come from AWS Trusted Advisor and the Flexera 2025 State of the Cloud report, and they have not moved much in five years.

Right-sizing should fix this. Most teams treat it as judgement-driven: a senior engineer looks at peak CPU on a dashboard and picks an instance. Peak is the wrong signal. A workload with one weekly traffic spike to 80% CPU and a baseline of 12% gets sized for the 80% peak, which leaves it over-provisioned by 5x for the 167 hours a week it is not spiking. Multiply that across 200 instances and that is where the $200k lives.

The fix is to stop using vibes. Five CloudWatch numbers per instance are enough to derive the right instance type from a lookup table. The same workload shape always picks the same instance. The procedure is deterministic, repeatable, and does not need a senior engineer's approval. This post is the recipe, the lookup table, and the 7-day sentinel that keeps right-sizing from reversing the first time someone says "this feels slow." The pattern composes with VPA, HPA, and KEDA scaling: right-sizing sets the per-instance baseline; the autoscalers handle the variation around it.

The reverse-cost property is what makes right-sizing high-impact. The instances that cost the most are not the ones with high utilization; they are the ones with low utilization. High-utilization instances are running flat-out and are by definition appropriately sized. Low-utilization instances are over-provisioned and continuously paying for headroom they do not use.
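As a concrete sketch of the recipe's first step: assuming the five numbers are average CPU, p99 CPU, longest burst duration, average memory, and p99 memory (the metrics the lookup and the candidate query reference; the post does not enumerate them in one place), computing them from raw CloudWatch samples is a few lines of Python. The helper names and the 60% burst threshold are mine:

```python
from dataclasses import dataclass

@dataclass
class WorkloadShape:
    avg_cpu: float        # mean CPUUtilization over the window
    p99_cpu: float        # 99th-percentile CPUUtilization
    burst_minutes: int    # longest contiguous run above the burst threshold
    avg_mem: float        # mean mem_used_percent
    p99_mem: float        # 99th-percentile mem_used_percent

def percentile(samples, q):
    """Approximate nearest-rank percentile; avoids numpy for a sketch."""
    s = sorted(samples)
    rank = max(0, min(len(s) - 1, round(q / 100 * len(s)) - 1))
    return s[rank]

def longest_burst(cpu_samples, threshold=60.0, period_minutes=1):
    """Longest contiguous run of samples above threshold, in minutes."""
    best = run = 0
    for v in cpu_samples:
        run = run + 1 if v > threshold else 0
        best = max(best, run)
    return best * period_minutes

def workload_shape(cpu_samples, mem_samples):
    """Reduce two CloudWatch sample series to the five sizing numbers."""
    return WorkloadShape(
        avg_cpu=sum(cpu_samples) / len(cpu_samples),
        p99_cpu=percentile(cpu_samples, 99),
        burst_minutes=longest_burst(cpu_samples),
        avg_mem=sum(mem_samples) / len(mem_samples),
        p99_mem=percentile(mem_samples, 99),
    )
```

Run on a week of samples, the spiky workload from the example above collapses to a low average, a high p99, and a short burst, which is exactly the shape the lookup distinguishes from a genuinely hot instance.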
Broken down by average CPU utilization band, only 5-15% of a typical fleet sits above 60% and is already right-sized; the recoverable share of spend concentrates in the bands below it.

Given the five numbers, instance selection is deterministic:

| p99 CPU | Burst duration | p99 memory | Recommended class |
|---|---|---|---|
| >90% sustained | any | any | C5 or specialized; investigate workload |
| any | any | high (>70%) | R5 / R6i (memory-optimized); memory ratio matters more than CPU |

The "size to p99 / 0.85" rule keeps a 15% headroom above worst-case observed CPU. This is enough to absorb noise, leave room for periodic batch jobs, and survive a 1.5x traffic spike before HPA kicks in. Teams that size to p99 exactly run into pages on minor anomalies; teams that size to p99 / 0.5 are back to over-provisioning.

The same lookup applies to Kubernetes node pools. The five numbers are computed at the node level instead of per-pod, the burst duration is the cluster-wide burst (since pods are bin-packed), and the recommended instance is the node-pool default.

Right-sizing without rollback is a one-way street. The first time a senior engineer says "this feels slow," the change reverses, and the team learns to never right-size again. The sentinel is what keeps the data winning. The procedure:

1. Deploy the smaller instance type to a single canary instance in the affected ASG.
2. Watch p99 application latency on that instance for 7 days.
3. If latency degrades by more than 5% versus the control group, auto-revert to the original instance type.
4. If latency stays within 5%, roll out to the rest of the ASG.
5. After full rollout, watch p99 latency for another 7 days; auto-revert if degraded.

The 5% threshold is the right number because it is wider than measurement noise (typically 1-2%) but narrower than user-perceptible degradation (typically 10%+). The 7-day window catches weekly traffic patterns, including the Sunday batch jobs that often cause the surprise burst that broke previous right-sizing attempts.

The sentinel runs as an autonomous closed loop. Detect: latency-degradation alarm. Decide: roll back the right-size.
Act: terminate the smaller instance and let the ASG re-launch on the original type. Verify: latency returns to baseline. The auto-revert is what makes right-sizing safe to ship at fleet scale.

Run the recipe across the fleet, sort by potential saving descending, and 80% of the dollars come from 20% of the instances. Those instances are easy to identify: they have p99 CPU below 30%, p99 memory below 50%, and a current instance class of xlarge or larger. The CloudWatch Insights query is short: group by InstanceId, compute avg(CPUUtilization), max(CPUUtilization), and avg(mem_used_percent), filter where the average CPU is below 30 percent, sort ascending, limit 50. The top 50 instances from this query are the right-sizing candidates worth touching first.

For a 200-instance fleet at average m5.xlarge cost ($140 per month), right-sizing those 50 to m5.large or smaller saves $4,000 to $7,000 per month, roughly $48,000 to $84,000 a year. The remaining 150 instances are mostly already right-sized or close to it; touching them produces less saving and more risk. The reverse-cost property tells you to stop after the high-leverage 20% rather than try to optimize every instance.

Right-sizing without vibes is the same five numbers, the same lookup table, and the same sentinel for every workload. The senior engineer's judgement is replaced by a procedure that runs to completion in an afternoon, ships measurable saving in 14 days, and reverses safely if any single canary degrades. The $200k a year exists because right-sizing has been treated as a vibes-driven exercise; it does not have to be.
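The deterministic selection from the lookup table can be sketched as one small function. The vCPU counts below are the standard published m5 sizes; the function shape, names, and the `sustained` flag are my assumptions, while the >70% memory rule, the >90% sustained-CPU rule, and the p99 / 0.85 headroom come from the recipe:

```python
# Standard vCPU counts for the m5 family (published AWS sizes).
M5_SIZES = {2: "m5.large", 4: "m5.xlarge", 8: "m5.2xlarge", 16: "m5.4xlarge"}

def recommend(p99_cpu, p99_mem, current_vcpus, sustained=False):
    """Deterministic lookup: memory-bound maps to the R family, sustained
    hot CPU maps to C (or gets investigated), and everything else is sized
    so observed p99 CPU lands at ~85% of the new instance's capacity."""
    if p99_mem > 70:
        return "r5/r6i: size by memory ratio, not CPU"
    if sustained and p99_cpu > 90:
        return "c5 or specialized: investigate workload"
    # "Size to p99 / 0.85": required capacity in vCPUs, rounded up
    # to the next m5 size.
    required_vcpus = current_vcpus * (p99_cpu / 100) / 0.85
    for vcpus in sorted(M5_SIZES):
        if vcpus >= required_vcpus:
            return M5_SIZES[vcpus]
    return M5_SIZES[max(M5_SIZES)]
```

Worked example: an m5.xlarge (4 vCPUs) whose p99 CPU is 30% needs 4 × 0.30 / 0.85 ≈ 1.4 vCPUs, so the lookup lands on m5.large, halving the instance cost.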
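The decide step of the sentinel loop reduces to a single comparison. A minimal sketch, assuming daily p99 latency readings for the canary and the control group (function and variable names are mine; the 5% threshold and 7-day window are the ones the procedure specifies):

```python
def sentinel_decision(canary_daily_p99, control_daily_p99,
                      threshold=0.05, window_days=7):
    """Return 'watch' until the 7-day window is complete, then 'revert'
    if the canary's mean p99 latency degraded more than 5% versus the
    control group, else 'rollout'."""
    if len(canary_daily_p99) < window_days:
        return "watch"
    canary = sum(canary_daily_p99[-window_days:]) / window_days
    control = sum(control_daily_p99[-window_days:]) / window_days
    degradation = (canary - control) / control
    return "revert" if degradation > threshold else "rollout"
```

A canary averaging 110 ms against a 100 ms control is 10% degraded and reverts; 103 ms is within the 5% band and rolls out. Wiring this to the act step is the ASG detach-and-relaunch described above.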