How to Rightsize Cloud Resources Without Breaking Applications

Rightsizing isn't “make it smaller and pray.” Here's how to cut 20–30% off compute and database spend with safety margins, the right observation window, and a rollback plan.

Finoud Team · May 19, 2026 · 12 min read

Rightsizing is the highest-ROI, lowest-risk optimization most teams have sitting untouched: no architecture rewrite, no commitment, no vendor call — just matching what you provisioned to what you actually use. The catch is that “smaller equals cheaper” without a method is how you page yourself at 2am. Done with a method, it’s where you reclaim the first 20–30% of waste before you touch anything riskier.

The reason rightsizing stalls is rarely technical. It’s fear. Someone downsized a database once, latency spiked, a customer noticed, and now every instance is sized for the worst day of the year “to be safe.” That instinct is understandable and expensive — you end up paying a permanent insurance premium against an event that, sized correctly, wouldn’t have caused an outage in the first place. The fix isn’t courage; it’s process: the right data, honest thresholds, safety margins on the axes that actually bite, and a rollback that takes one click. This is that process, end to end.

What rightsizing is — and what it isn’t

Rightsizing is matching provisioned capacity to real, observed usage. It is not blindly downsizing everything by a notch, and it is not a race to the smallest instance that boots. The goal is to remove the capacity you provably never use while keeping enough headroom that normal peaks, and a reasonable number of abnormal ones, never touch a ceiling. Sometimes rightsizing even means sizing up a starved instance that’s quietly throttling; “right” is the operative word, not “small.”

Framed that way, the fear evaporates. A performance regression comes from sizing to an average and getting surprised by a peak, or from cutting the wrong resource (memory) on the wrong workload (a JVM that gets OOM-killed rather than throttled). Both are method problems, not rightsizing problems. Get the observation window and the safety margins right and you’re removing fat, not muscle, and you can prove it with a graph before anyone asks.

The mindset

Rightsizing is a measurement exercise that ends in a config change, not a config change you justify after the fact. If you can’t point at a graph that says “this resource has never exceeded X over 30 days,” you don’t have a rightsizing candidate yet — you have a guess.

Step 1: Gather the right data

Every cloud gives you the telemetry; you just have to look at the right signals over a long enough window. On AWS that’s CloudWatch, on Azure it’s Azure Monitor, on Google Cloud it’s Cloud Monitoring. The metrics themselves are roughly universal — the gotchas are in the defaults, and the defaults are not on your side.

Metric	Why it matters	Watch out for
CPU utilization	The easiest capacity to reclaim — idle CPU is pure waste.	Burstable (T-family) instances earn CPU credits; low average CPU may still be hitting credit limits.
Memory utilization	Usually the binding constraint; the dangerous axis to cut.	Not collected by default on EC2 — you need the CloudWatch agent. No agent, no memory data, no safe call.
Disk usage & IOPS	Throughput-bound workloads are limited by IOPS, not vCPU.	A volume can be cool on capacity but pinned on IOPS or throughput.
Network throughput	Some instance sizes exist mainly for their network ceiling.	Downsizing can quietly drop your network and EBS bandwidth caps.

The memory point deserves emphasis because it trips up almost everyone: on EC2, CloudWatch reports CPU, disk, and network out of the box but not memory. The hypervisor can see CPU time and network packets, but it cannot peer inside the guest OS to know how much RAM your processes are actually touching. Memory utilization requires the CloudWatch agent installed on the instance, publishing a custom metric. If you haven’t deployed it, you are flying blind on the single most dangerous axis to cut. Install it and let it collect for the full window before you make any memory-sensitive decision. The same caveat applies on Google Cloud, where the Ops Agent supplies guest memory, and on Azure, where you enable guest-level metrics through the diagnostics extension.
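As a rough illustration of what "publishing a custom metric" involves, the sketch below builds a minimal CloudWatch agent configuration that collects guest memory. It assumes the agent's JSON config format with a `mem` section and the `mem_used_percent` measurement; the 60-second interval and the output handling are illustrative choices, so check the agent documentation for your version before relying on it.

```python
import json

# Minimal sketch of a CloudWatch agent config that collects guest memory.
# Assumption: the agent reads a JSON file with this shape; the interval
# below is an illustrative choice, not a recommendation from the article.
agent_config = {
    "metrics": {
        # Tag each data point with the instance it came from.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent"],
                "metrics_collection_interval": 60,
            }
        },
    }
}

config_json = json.dumps(agent_config, indent=2)
print(config_json)
```

The printed JSON would be written to the agent's config file on the instance; until the agent has been running for the whole observation window, treat any memory-based decision as unsupported.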

One more telemetry trap: resolution and retention. CloudWatch basic monitoring publishes at five-minute granularity, which smooths over exactly the short, sharp spikes you most need to see: a 90-second pin to 100% can vanish inside a five-minute average. If a workload is bursty, enable detailed one-minute monitoring for the period you’re studying, and pull the data before CloudWatch rolls it up. One-minute data points are retained for 15 days, five-minute points for 63 days, and one-hour points for 455 days, so anything older than 15 days is already an aggregate of what you collected.
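To see how much a coarse period can hide, here is a small sketch using synthetic per-second CPU readings. The baseline level, the spike position, and the helper name `bucket_averages` are all made up for illustration.

```python
# Synthetic data: one CPU reading per second for five minutes.
BASELINE_PCT = 10.0   # assumed idle level
SPIKE_PCT = 100.0     # assumed level during the spike
SPIKE_SECONDS = 90    # length of the spike

samples = [BASELINE_PCT] * 300
for second in range(100, 100 + SPIKE_SECONDS):
    samples[second] = SPIKE_PCT


def bucket_averages(values, bucket_seconds):
    """Average consecutive, non-overlapping buckets of bucket_seconds samples."""
    return [
        sum(values[i:i + bucket_seconds]) / bucket_seconds
        for i in range(0, len(values), bucket_seconds)
    ]


five_minute = bucket_averages(samples, 300)
one_minute = bucket_averages(samples, 60)

print("5-minute averages:", five_minute)  # [37.0]
print("1-minute averages:", one_minute)   # [10.0, 40.0, 100.0, 25.0, 10.0]
```

At five-minute resolution the box looks like it peaked at 37%. At one-minute resolution the 100% pin is visible, though only because the spike happens to cover one whole bucket; a shorter or differently aligned spike would be blurred at that resolution too.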

Pick an observation window that captures your real shape

A single day lies. So does a single week if your workload has a monthly rhythm. Fourteen days is the absolute floor, and only for steady, always-on services with no weekly pattern. For most systems, observe 30 to 90 days so the window spans:

Weekly cycles: Monday morning load looks nothing like Sunday at 3am, and a Monday-to-Friday window can miss your weekend report generation entirely.
Month-end batch: billing runs, reconciliation jobs, payroll, and reporting that only fire on the 1st or the last day of the month and briefly dominate the box.
Seasonality: if you have end-of-quarter, holiday, or tax-season peaks, your window must contain at least one of them, or you'll size to the quiet season and fall over when demand returns.

And when you read those graphs, look at p95 and p99, not the average. Averages are where spikes go to hide. A box that averages 15% CPU can still hit 95% for five minutes every afternoon when a cron fires — and that five-minute peak is exactly what determines whether the smaller instance survives. The average tells you what you’re paying for; the tail tells you what you actually need.
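A quick sketch of why the tail matters, using a synthetic day of one-minute samples. The numbers and the nearest-rank `percentile` helper are illustrative assumptions, not output from any monitoring tool.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# Synthetic day: 1,440 one-minute samples at 12% CPU,
# with a 30-minute cron burst at 95% starting at 14:00.
cpu = [12.0] * 1440
for minute in range(14 * 60, 14 * 60 + 30):
    cpu[minute] = 95.0

mean = sum(cpu) / len(cpu)
p95 = percentile(cpu, 95)
p99 = percentile(cpu, 99)
peak = max(cpu)

print(f"mean={mean:.1f}  p95={p95}  p99={p99}  max={peak}")
```

Here the mean comes out near 14% and even p95 still reads 12%, while p99 and the maximum both show the 95% burst. A burst shorter than 1% of the window would slip under p99 as well, which is one reason to read the observed peak alongside the percentiles.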

Step 2: Set safe thresholds

With real data in hand, you need rules that turn graphs into decisions. These are starting points, not gospel — tune them to your risk tolerance and how much a given workload would hurt if it stumbled.

CPU candidate. If sustained CPU stays under roughly 20–40% across a 30-day window (judged on p95, not the mean), the resource is a downsize candidate. Consistently under 20% is an obvious one; the 40% end of the range is where you start being careful about spikes.
Memory is the dangerous axis. Treat it conservatively. An OOM kill is a hard failure; CPU throttling is a soft one. Never size memory to the average. Size it above the observed peak with room to spare, and account for garbage-collected runtimes whose heap grows to fill whatever you give it.
Leave 20–30% headroom above the observed peak. Your target instance should clear your p99 with margin, not graze it. Headroom absorbs the spike you didn’t capture and the organic growth you haven’t seen yet, so you’re not back here next month.
Account for burst. Burstable families bank credits during idle periods and spend them during spikes. A low average can mask a workload that periodically needs full cores. Check the CPU credit balance before you assume it’s idle, or you’ll move it to a smaller burstable type and watch it run out of credits at the worst possible moment.
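The CPU and burst rules above can be sketched as a small classifier. The function name, the return labels, and the "credits hit zero" test are hypothetical choices for illustration; the 20% and 40% cut-offs are the starting points from the list, not fixed rules.

```python
def cpu_downsize_signal(p95_cpu_pct, min_credit_balance=None):
    """Classify a resource from its p95 CPU over the observation window.

    min_credit_balance is the lowest CPU credit balance seen in the window
    for a burstable instance, or None for non-burstable types.
    """
    if min_credit_balance is not None and min_credit_balance <= 0:
        # Credits ran out at least once: the low average hides real demand.
        return "not idle: credits exhausted"
    if p95_cpu_pct < 20:
        return "obvious candidate"
    if p95_cpu_pct <= 40:
        return "candidate: check spikes first"
    return "not a CPU candidate"


print(cpu_downsize_signal(18))                         # obvious candidate
print(cpu_downsize_signal(35))                         # candidate: check spikes first
print(cpu_downsize_signal(55))                         # not a CPU candidate
print(cpu_downsize_signal(10, min_credit_balance=0))   # not idle: credits exhausted
```

This only covers the CPU axis. Memory still has to clear the observed peak plus headroom before any candidate is worth acting on.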

It helps to make this concrete. Say a web tier runs on an m6i.2xlarge (8 vCPU, 32 GiB). Over 30 days, p95 CPU is 18%, p99 is 24%, and observed peak memory, as reported by the agent, is 9 GiB. Apply 30% headroom above peak and you need roughly 12 GiB of RAM (9 × 1.3 ≈ 11.7) and a couple of busy cores (24% of 8 vCPU is about 1.9). On CPU alone the target would be an m6i.large (2 vCPU, 8 GiB), except 8 GiB doesn’t clear your 12 GiB headroom target. So you either step to the xlarge (4 vCPU, 16 GiB) or switch to a memory-leaning family. That tension is the whole job:

Option	vCPU / RAM	Clears 12 GiB target?	Verdict
m6i.2xlarge (current)	8 / 32 GiB	Yes, by a mile	Oversized: paying for ~3.5x the peak RAM in use
m6i.xlarge	4 / 16 GiB	Yes	Safe rightsize: clears headroom on both axes
m6i.large	2 / 8 GiB	No (only 8 GiB)	Too aggressive — memory headroom gone
r6i.large	2 / 16 GiB	Yes	Good fit if CPU truly stays low and RAM is the constraint
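The same comparison can be worked through in a few lines. This is a sketch under simplifying assumptions: CPU demand is converted to "busy vCPUs" by scaling the p99 percentage linearly, a vCPU is treated as equivalent across the listed types, and the `InstanceType` class and `assess` helper are names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    ram_gib: int


HEADROOM = 0.30  # 30% above the observed peak

CURRENT = InstanceType("m6i.2xlarge", 8, 32)
OPTIONS = [
    CURRENT,
    InstanceType("m6i.xlarge", 4, 16),
    InstanceType("m6i.large", 2, 8),
    InstanceType("r6i.large", 2, 16),
]

# Observed over 30 days on the current instance.
peak_memory_gib = 9.0
p99_cpu_pct = 24.0

memory_target_gib = peak_memory_gib * (1 + HEADROOM)    # about 11.7 GiB
busy_vcpu_at_p99 = CURRENT.vcpu * p99_cpu_pct / 100     # about 1.9 vCPU
cpu_target_vcpu = busy_vcpu_at_p99 * (1 + HEADROOM)     # about 2.5 vCPU


def assess(option):
    """Report which targets a candidate instance type clears."""
    return {
        "clears_memory": option.ram_gib >= memory_target_gib,
        "clears_cpu_p99": option.vcpu >= busy_vcpu_at_p99,
        "clears_cpu_headroom": option.vcpu >= cpu_target_vcpu,
    }


for option in OPTIONS:
    print(f"{option.name:12} {assess(option)}")
```

Under these assumptions the m6i.xlarge clears every target, the m6i.large fails on memory, and the r6i.large clears memory and the raw p99 CPU but not the CPU headroom, which is why it only fits if CPU truly stays low.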

Asymmetric risk

The cost of cutting too much memory (an outage) is far higher than the cost of cutting too little (a few dollars of unused RAM). Size CPU aggressively and memory cautiously — the downside is not symmetric, so your margins shouldn’t be either. When the math is ambiguous, round toward the safer instance.

Step 3: Rightsize by resource type

The principle is constant; the tooling and the traps differ per resource. Each cloud also ships its own analyzer, so a multi-cloud estate means learning three of them — see multi-cloud cost management across AWS, GCP, and Azure for how the tooling and terminology diverge.

EC2

Start with AWS Compute Optimizer. It analyzes CloudWatch history and recommends instance types with projected utilization for each option, classifying instances as under-provisioned, over-provisioned, or optimized, though remember it can only see memory if the agent is feeding it data. Beyond a same-family size step, the bigger wins are often in moving families: migrating to recent-generation Graviton (the m7g/c7g/r7g lines) typically buys meaningfully better price-performance than the Intel and AMD equivalents, as long as your workload runs on arm64. Validate the architecture switch in non-prod first; most interpreted and JIT-compiled runtimes (Java, Node, Python, .NET) are fine and Go cross-compiles to arm64, but precompiled native dependencies and a few container base images occasionally aren’t.

RDS

Databases need a wider lens than CPU. Look at CPU and database connections and freeable memory together: a database can sit at 10% CPU while pinned on connections or starved for buffer-pool memory, and downsizing the instance class shrinks the connection ceiling and the RAM available for caching in one move. For spiky or unpredictable load, Aurora Serverless v2 is worth evaluating: instead of sizing a fixed instance for the peak, it scales capacity (in ACUs) with demand, which can be dramatically cheaper for workloads busy a few hours a day and near-idle the rest. For steady, predictable databases a rightsized provisioned instance is usually still the cheaper floor; serverless wins on variability, not on flat load. And always change the reader and writer separately; they rarely have the same utilization profile.

Kubernetes

Container rightsizing is really request and limit tuning. Requests drive scheduling and therefore the size of your node pool; over-requesting is how you end up paying for half-empty nodes, because the scheduler reserves what you asked for whether the pod uses it or not. Limits protect the node from a single pod eating everything (the noisy-neighbor problem). Set requests to observed usage and limits high enough to absorb legitimate spikes. Run the Vertical Pod Autoscaler in recommendation mode (Off update mode) to get data-backed targets without it mutating live pods, and judge against p95/p99 per pod, not the average, which will lull you into under-requesting and getting pods OOM-killed or CPU-throttled under load. For memory specifically, prefer setting requests and limits equal, so a pod is never scheduled against less memory than it is allowed to consume. If CPU requests also equal CPU limits on every container, the pod gets the Guaranteed QoS class and is among the last to be evicted under node pressure.
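Because QoS class depends on both CPU and memory settings across every container in a pod, it can help to check it mechanically. The sketch below mirrors the Kubernetes QoS rules in plain Python; the `qos_class` function and the dict layout are illustrative, and it compares quantity strings literally (so "1Gi" and "1024Mi" would count as different), which the real API server does not.

```python
def qos_class(containers):
    """Return the QoS class for a pod described as a list of dicts like
    {"requests": {"cpu": "500m", "memory": "1Gi"}, "limits": {...}}."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = dict(container.get("requests", {}))
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            # A missing request defaults to the limit when a limit is set.
            if resource not in requests and resource in limits:
                requests[resource] = limits[resource]
            if resource not in limits or requests.get(resource) != limits[resource]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


memory_only = [{"requests": {"cpu": "250m", "memory": "1Gi"},
                "limits": {"cpu": "1", "memory": "1Gi"}}]
both_equal = [{"requests": {"cpu": "500m", "memory": "1Gi"},
               "limits": {"cpu": "500m", "memory": "1Gi"}}]

print(qos_class(memory_only))  # Burstable
print(qos_class(both_equal))   # Guaranteed
print(qos_class([{}]))         # BestEffort
```

Matching only the memory request and limit leaves the pod Burstable; it takes matching CPU values as well, on every container, to reach Guaranteed.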

Storage

Storage rightsizing is the cleanup nobody schedules. Three reliable wins:

gp2 → gp3. gp3 is cheaper per GB and decouples IOPS and throughput from volume size, so you stop over-provisioning capacity just to buy performance. For most general-purpose volumes it’s a near-free ~20% storage saving with a baseline of 3,000 IOPS included.
Unattached EBS volumes. Volumes that outlived their instances keep billing forever at full rate. Snapshot anything you might conceivably need, then delete the live volume; the snapshot costs a fraction of the provisioned volume.
Idle and over-provisioned volumes. Volumes sized for growth that never came, or attached to instances that do nothing. These are pure carrying cost, and they’re easy to spot once you look at read/write throughput alongside provisioned size.
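The gp2-to-gp3 case is easy to put numbers on. In this sketch the per-GB prices are assumed illustrative figures (roughly in line with common list prices, but check current pricing for your region), and `compare` is a made-up helper. The gp2 baseline of 3 IOPS per GiB with a floor of 100 and a cap of 16,000 is modelled without burst credits.

```python
# Assumed illustrative prices in USD per GB-month; verify for your region.
GP2_USD_PER_GB_MONTH = 0.10
GP3_USD_PER_GB_MONTH = 0.08
GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib):
    """gp2 baseline scales with size: 3 IOPS per GiB, min 100, max 16,000."""
    return min(max(3 * size_gib, 100), 16_000)


def compare(size_gib):
    """Compare baseline IOPS and monthly storage cost for one volume size."""
    gp2_cost = size_gib * GP2_USD_PER_GB_MONTH
    gp3_cost = size_gib * GP3_USD_PER_GB_MONTH
    return {
        "gp2_baseline_iops": gp2_baseline_iops(size_gib),
        "gp3_baseline_iops": GP3_BASELINE_IOPS,
        "monthly_saving_usd": round(gp2_cost - gp3_cost, 2),
        "saving_pct": round(100 * (gp2_cost - gp3_cost) / gp2_cost, 1),
    }


print(compare(200))
```

With these assumed prices, a 200 GiB volume saves about 20% on storage while its baseline goes from 600 to 3,000 IOPS. On gp2 you would need a 1,000 GiB volume to reach that same baseline.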

An implementation playbook that won’t page you at 2am

The analysis tells you what to change. The playbook is how to change it without becoming the incident. The whole point is that each step is small, observable, and reversible, so a bad call costs you a resize, not an outage.

Change non-prod first. Dev and staging are your free rehearsal. If a workload breaks at the smaller size, you find out where it’s cheap to find out, and you learn the real failure mode before it matters.
One resource family at a time. Do the EC2 web tier, observe, then move to RDS, then storage. Batching everything at once means you can’t tell which change caused the regression, and you’ll end up rolling back things that were fine.
Change one variable. Resize the instance or switch architecture or adjust the volume, not all three in one deploy. Attribution is impossible when three things moved together.
Bake and observe for 1–2 weeks. Give the change long enough to meet a full cycle, including the weekly and month-end peaks the old size handled without complaint.
Keep a rollback ready. The beauty of rightsizing is that the rollback is one instance-type change: stop, resize back up, start. No data migration, no rebuild. Write down the previous type before you start so the rollback is a 30-second operation, not an archaeology project.
Measure savings AND health. Track the dollars saved alongside p95 latency and error rate. A change that saves $400/month and adds 80ms of p95 latency is not a win; it’s a trade you should make on purpose, not by accident.
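The last step of the playbook can be turned into an explicit check at the end of the bake period. This is a minimal sketch: the function name, the verdict strings, and the default tolerances (20 ms of p95 latency, 0.1 percentage points of error rate) are hypothetical and should be set per workload.

```python
def evaluate_change(monthly_savings_usd,
                    p95_before_ms, p95_after_ms,
                    error_rate_before, error_rate_after,
                    max_latency_regression_ms=20.0,
                    max_error_rate_increase=0.001):
    """Judge a rightsizing change on savings and health together."""
    latency_delta = p95_after_ms - p95_before_ms
    error_delta = error_rate_after - error_rate_before
    healthy = (latency_delta <= max_latency_regression_ms
               and error_delta <= max_error_rate_increase)
    if not healthy:
        return "regression: roll back or accept the trade on purpose"
    if monthly_savings_usd <= 0:
        return "no savings: nothing gained from this change"
    return "keep"


# The example from the list: $400/month saved, 80 ms of p95 latency added.
print(evaluate_change(400, p95_before_ms=120, p95_after_ms=200,
                      error_rate_before=0.002, error_rate_after=0.002))
# A change that saves money and stays inside both tolerances.
print(evaluate_change(400, p95_before_ms=120, p95_after_ms=125,
                      error_rate_before=0.002, error_rate_after=0.002))
```

The first call reports a regression despite the savings and the second returns "keep": the dollars only count once the health checks pass.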

Common pitfalls

Rightsizing before killing idle. Don’t carefully tune the instance type of a service nobody uses. Delete idle and zombie resources first, then rightsize what remains; optimizing the size of a box that should be off is wasted effort.
Ignoring seasonal peaks. An observation window that misses your busy season produces an instance that’s perfect in August and on fire in December. Span at least one peak, or explicitly size for the peak you know is coming.
Letting averages hide spikes. A 15% average with a daily 95% cron burst is not a quiet box. Size on p95/p99 or the spike will find you, usually during a demo.
Not re-checking after deploys. Rightsizing is a point-in-time fit, not a permanent state. A new feature, a library upgrade, or a traffic shift changes the shape, so revisit candidates after major releases and on a recurring cadence.
Committing immediately after rightsizing. Don’t buy a 3-year commitment the week you finish resizing; let the new baseline prove itself for 30–90 days, then commit. The right sequence (rightsize, settle, then commit to the lower baseline) is exactly what makes Savings Plans and Reserved Instances pay off instead of locking in waste at a discount.

How Finoud helps

Finoud runs rightsizing detection continuously across AWS, GCP, and Azure, reading the same utilization signals you would — CPU, memory (where the agent is reporting it), IOPS, and network — over a window long enough to catch the peaks, and surfaces candidates ranked by both savings and risk so you fix the safe, high-value ones first. It never takes scary auto-actions: every recommendation is yours to apply, with the target size and the projected headroom spelled out, and it tracks realized savings against what it recommended so you can see whether a change actually delivered. Join the waitlist for early access.

Frequently asked questions

How long should I observe utilization before rightsizing?

Fourteen days is the floor for steady, always-on workloads with no weekly pattern. For most real systems you want 30 to 90 days so the window captures weekly cycles, end-of-month batch jobs, and reporting peaks. Anything with seasonal demand (retail, billing, tax-season spikes) needs a window that spans at least one full peak, or you will size to the quiet period and fall over when load returns.

Is CPU or memory the more important rightsizing signal?

Memory is usually the binding constraint and by far the more dangerous axis to cut. CPU pressure shows up as throttling and slower responses, which degrade gracefully; running out of memory triggers OOM kills that take processes down hard. Reclaim CPU aggressively when it's idle, but leave generous memory headroom above your observed peak.

Should I rightsize before or after buying Savings Plans / Reserved Instances?

Rightsize first, then commit. A commitment locks in your current shape at a discount, so if you buy before rightsizing you've just made your overprovisioned fleet cheaper, and harder to justify shrinking later. Let the baseline settle for 30 to 90 days after rightsizing, then commit to the smaller footprint you actually run.

How do I rightsize Kubernetes workloads?

Tune requests to observed usage, because requests drive both scheduling and the cost of your node pool, then set limits to protect against noisy neighbors. Run the Vertical Pod Autoscaler in recommendation mode to get data-backed targets before you change anything. Size against p95/p99 rather than averages so short spikes don't trigger throttling or evictions.