How to Configure Auto Scaling for an EC2 Group Based on CPU Utilization

You deploy an application, traffic spikes unexpectedly, and CPUUtilization climbs past 70% while your fixed-size fleet starts dropping requests. The fix — wiring a CloudWatch alarm to an Auto Scaling policy — sounds straightforward, but the ordering of resources, the choice of scaling policy type, and the monitoring interval you pick all interact in ways that bite engineers in production. This post walks through configuring CPU-based EC2 Auto Scaling correctly, from the launch template through the alarm, with the CLI commands you actually need.

TL;DR — CPU-Based Auto Scaling at a Glance

Component	Role	Key Decision
Auto Scaling Group (ASG)	Manages the fleet of EC2 instances	Min / max / desired capacity
Scaling Policy	Defines how and when to scale	Target Tracking vs. Step Scaling
CloudWatch Alarm	Triggers the policy on threshold breach	Evaluation periods × period length
Monitoring Level	Controls metric granularity	Basic (5 min) vs. Detailed (1 min)
Cooldown / Warmup	Prevents thrashing after a scale event	Instance warmup period on the policy

How EC2 Auto Scaling Works with CloudWatch

Before wiring the alarm, you need a clear model of how the signal travels from your instances to a scaling action. The path has more latency than most engineers expect the first time they set it up.

CloudWatch aggregates CPUUtilization metrics from instances based on their monitoring level: every 5 minutes with basic (standard) monitoring and every 1 minute with detailed monitoring. The alarm evaluates those data points over a window you define. For example, two consecutive 5-minute periods means the alarm needs 10 minutes of sustained breach before it fires. That lag is intentional; it prevents a momentary CPU spike from launching instances you don't need. But it also means that with basic monitoring, your fleet won't start responding to a sudden spike for at least 10 minutes, and new instances still need time to launch after that. Enable detailed monitoring on the launch template if your application is latency-sensitive.
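The lag is simple multiplication. A minimal sketch, assuming the alarm needs every one of its evaluation periods to breach (the default when datapoints-to-alarm equals evaluation periods) and ignoring CloudWatch's publishing delay; min_seconds_to_alarm is a hypothetical helper, not an AWS API:

```python
# Hypothetical helper, not an AWS API: the shortest sustained breach that can
# move an alarm to ALARM. CloudWatch publishing delay and instance launch time
# come on top of this number.
BASIC_PERIOD_S = 300     # basic (standard) monitoring: 5-minute data points
DETAILED_PERIOD_S = 60   # detailed monitoring: 1-minute data points


def min_seconds_to_alarm(period_s: int, evaluation_periods: int) -> int:
    """Every evaluation period must breach, so the minimum wait is their product."""
    if period_s <= 0 or evaluation_periods <= 0:
        raise ValueError("period and evaluation periods must be positive")
    return period_s * evaluation_periods


for label, period in (("basic", BASIC_PERIOD_S), ("detailed", DETAILED_PERIOD_S)):
    minutes = min_seconds_to_alarm(period, evaluation_periods=2) // 60
    print(f"{label}: {minutes} min")
# basic: 10 min
# detailed: 2 min
```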

Once the alarm transitions to ALARM state, it invokes the scaling policy. The ASG then launches new instances, which take time to pass health checks and enter the InService state. During that warmup window, Auto Scaling treats the new instances as not yet contributing to the group's aggregate CPU — this matters for Target Tracking policies, which continuously recalculate the required capacity.
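A simplified model of that exclusion shows why it matters. This sketch assumes a fixed warmup window and a plain average over instances whose warmup has expired; the Instance type and aggregate_cpu function are hypothetical and are not how Auto Scaling computes its metric internally:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    instance_id: str
    cpu_percent: float
    seconds_in_service: int  # time since the instance entered InService


def aggregate_cpu(instances: List[Instance], warmup_s: int) -> Optional[float]:
    """Average CPU over instances whose warmup has expired.

    Warming instances are left out so their near-idle CPU does not drag the
    group average down and suppress a needed scale-out.
    """
    warmed = [i.cpu_percent for i in instances if i.seconds_in_service >= warmup_s]
    if not warmed:
        return None  # nothing past warmup yet: no usable aggregate
    return sum(warmed) / len(warmed)


fleet = [
    Instance("i-a", 82.0, 3600),
    Instance("i-b", 78.0, 3600),
    Instance("i-c", 5.0, 30),  # just launched, still warming up
]
print(aggregate_cpu(fleet, warmup_s=120))  # 80.0, not the 55.0 a naive average gives
```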

Diagram: signal path from EC2 instances (1-minute or 5-minute CPUUtilization) to CloudWatch, alarm evaluation over N consecutive periods (ALARM or OK), the scaling policy, instance launch (Pending → InService), and warmup, after which new instances feed back into the aggregate CPU metric.

Instances emit CPUUtilization — every 5 min (basic) or 1 min (detailed) to CloudWatch.
CloudWatch Alarm evaluates — checks whether the metric exceeds the threshold across N consecutive periods.
Alarm state transitions to ALARM — triggers the associated scaling policy on the ASG.
ASG launches instances — new instances go through Pending → health check → InService.
Warmup period — new instances are excluded from aggregate metric calculations until warmup expires.
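Cooldown / stabilization: simple scaling policies pause further scaling during a cooldown period; Target Tracking and Step Scaling rely on the instance warmup instead, counting warming instances toward current capacity so they don't add more than the alarm calls for.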
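The alarm-evaluation step in that pipeline can be modeled in a few lines. This is a deliberately simplified sketch: it requires every evaluation period to breach and treats any missing data point as INSUFFICIENT_DATA, which is stricter than CloudWatch's default handling. Real alarms also support M-out-of-N evaluation (--datapoints-to-alarm) and configurable missing-data treatment (--treat-missing-data). evaluate_alarm is a hypothetical function:

```python
from __future__ import annotations

from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
) -> str:
    """Return the alarm state for the most recent `evaluation_periods` points.

    `None` stands for a period with no data point.
    """
    window = list(datapoints)[-evaluation_periods:]
    if len(window) < evaluation_periods or any(p is None for p in window):
        return "INSUFFICIENT_DATA"
    # GreaterThanOrEqualToThreshold, matching the alarm used in this post
    if all(p >= threshold for p in window):
        return "ALARM"
    return "OK"


print(evaluate_alarm([55.0, 72.0, 75.0], threshold=70.0, evaluation_periods=2))  # ALARM
print(evaluate_alarm([72.0, None], threshold=70.0, evaluation_periods=2))  # INSUFFICIENT_DATA
```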

Choosing the Right Scaling Policy Type

Two dynamic scaling policy types fit CPU-based scaling. Pick the wrong one and you'll either over-provision or react too slowly.

Target Tracking (Recommended for CPU)

You declare a target CPU percentage — say, 60% — and Auto Scaling continuously adjusts the group size to maintain it. AWS creates and manages the CloudWatch alarms automatically. You don't write the alarm yourself. This is the right choice for most CPU-based workloads because it handles gradual load increases smoothly and scales in conservatively to avoid flapping.
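For intuition about how target tracking sizes the group, a proportional estimate is a reasonable mental model: if the fleet averages above the target, scale it by the ratio of the two. This is an approximation under the assumption that load spreads evenly across instances, not the service's actual calculation; estimate_capacity is hypothetical:

```python
import math


def estimate_capacity(current_capacity: int, current_cpu: float, target_cpu: float) -> int:
    """Rough capacity needed to bring average CPU back to the target.

    Assumes load spreads evenly, so total work is capacity * average CPU.
    """
    if target_cpu <= 0:
        raise ValueError("target must be positive")
    total_load = current_capacity * current_cpu  # in "instance-percent"
    return max(1, math.ceil(total_load / target_cpu))


# 4 instances at 90% CPU with a 60% target: 360 / 60 = 6 instances
print(estimate_capacity(4, 90.0, 60.0))  # 6
```

Rounding up is the conservative choice here: a fractional instance can't be launched, and rounding down would leave the fleet above target.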

Step Scaling (When You Need Explicit Control)

You define alarm thresholds and explicit capacity adjustments per breach magnitude. This is useful when you need graduated responses: for example, add 2 instances at 70% CPU but 4 at 90%. You write the CloudWatch alarm yourself and attach it to the policy. More control, more configuration surface area to get wrong.

Target Tracking is like a thermostat — you set the temperature and the system figures out how hard to run the furnace. Step Scaling is like manually adjusting the furnace output based on how cold it gets. Both work; the thermostat is easier to live with.

Step 1 — Create a Launch Template with Detailed Monitoring

The launch template defines what gets launched. Enabling detailed monitoring here ensures CloudWatch receives 1-minute CPUUtilization data points instead of 5-minute ones, which cuts the minimum time for a two-period alarm to fire from 10 minutes to 2 minutes.

aws ec2 create-launch-template \
  --launch-template-name my-app-lt \
  --version-description "v1" \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "t3.medium",
    "Monitoring": { "Enabled": true },
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "IamInstanceProfile": { "Arn": "arn:aws:iam::123456789012:instance-profile/MyAppProfile" }
  }'

Replace ImageId, InstanceType, SecurityGroupIds, and the IAM instance profile ARN with your actual values. "Monitoring": { "Enabled": true } activates detailed monitoring — without this, you're on 5-minute intervals and your alarms will feel sluggish under sudden load.
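Because a missing Monitoring block silently falls back to basic monitoring, it can be worth linting the template data before you create a version. A minimal sketch that checks the same JSON passed to --launch-template-data; uses_detailed_monitoring is a hypothetical helper, and the sample values are the placeholders from the command above:

```python
import json


def uses_detailed_monitoring(launch_template_data: dict) -> bool:
    """True only if the template explicitly enables detailed monitoring.

    An absent Monitoring block means basic (5-minute) monitoring.
    """
    monitoring = launch_template_data.get("Monitoring") or {}
    return monitoring.get("Enabled") is True


template_data = json.loads("""
{
  "ImageId": "ami-0abcdef1234567890",
  "InstanceType": "t3.medium",
  "Monitoring": { "Enabled": true }
}
""")

print(uses_detailed_monitoring(template_data))  # True
print(uses_detailed_monitoring({"ImageId": "ami-0abcdef1234567890"}))  # False
```

The same check works on the LaunchTemplateData object that aws ec2 describe-launch-template-versions returns for an existing version.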

Step 2 — Create the Auto Scaling Group

The ASG ties together the launch template, the VPC subnets, and the capacity boundaries. Setting min, max, and desired correctly is critical — the ASG will never scale below min or above max regardless of what the alarm says.

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-app-asg \
  --launch-template LaunchTemplateName=my-app-lt,Version='$Latest' \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 2 \
  --vpc-zone-identifier "subnet-0abc123,subnet-0def456" \
  --health-check-type EC2 \
  --health-check-grace-period 300

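The --health-check-grace-period of 300 seconds gives new instances time to finish bootstrapping before Auto Scaling checks their health. If your application takes 3 minutes to start, set this to at least 180, preferably with some margin. This matters most with load balancer health checks (--health-check-type ELB): instances that fail health checks before they're ready get terminated and replaced, creating a replacement loop that's confusing to debug. With EC2 health checks, Auto Scaling only looks at the instance's EC2 status checks, which don't know whether your application has finished starting.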
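The min and max sizes from the command above behave like a clamp applied after a policy decides what capacity it wants. A minimal sketch; clamp_capacity is a hypothetical helper, not part of any AWS SDK:

```python
def clamp_capacity(requested: int, min_size: int, max_size: int) -> int:
    """Apply the ASG's hard bounds to whatever capacity a policy asks for."""
    if min_size > max_size:
        raise ValueError("min_size cannot exceed max_size")
    return max(min_size, min(requested, max_size))


MIN_SIZE, MAX_SIZE = 2, 10  # values from the create-auto-scaling-group command

for requested in (0, 6, 14):
    print(requested, "->", clamp_capacity(requested, MIN_SIZE, MAX_SIZE))
# 0 -> 2
# 6 -> 6
# 14 -> 10
```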

Step 3 — Attach a Target Tracking Policy (Recommended Path)

This single command creates the policy and the CloudWatch alarms. You don't need to create alarms separately for Target Tracking.

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-app-asg \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0,
    "DisableScaleIn": false
  }'

The target is 60%, not 70%. If you set the target at 70%, the group is already under stress before scaling kicks in — by the time new instances are InService, you may have already degraded. Setting the target 10 percentage points below your pain threshold gives the scaling action time to complete before you hit the wall. This is the non-obvious interaction between target value and instance warmup latency that catches engineers who set the target too high.

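The command returns a PolicyARN along with the names of the CloudWatch alarms Auto Scaling created for the policy. Note them: you'll need the alarm names when checking alarm state, and describe-policies will show both again later.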
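A back-of-the-envelope model makes the headroom argument for a 60% target concrete. It assumes load keeps climbing at a steady rate while the alarm evaluates and new instances launch and warm up; the 3-points-per-minute growth and 5-minute lag are illustrative numbers, not measurements, and cpu_when_capacity_arrives is a hypothetical helper:

```python
def cpu_when_capacity_arrives(trigger_cpu: float, growth_per_min: float, lag_min: float) -> float:
    """Fleet CPU by the time new capacity is serving, capped at 100%."""
    return min(100.0, trigger_cpu + growth_per_min * lag_min)


for target in (60.0, 70.0):
    peak = cpu_when_capacity_arrives(target, growth_per_min=3.0, lag_min=5.0)
    print(f"target {target:.0f}% -> about {peak:.0f}% before relief arrives")
# target 60% -> about 75% before relief arrives
# target 70% -> about 85% before relief arrives
```

Under these assumptions, a 70% target leaves the fleet well past the pain threshold for the whole launch window, while a 60% target overshoots it only modestly.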

Step 4 — Step Scaling Alternative (When You Need Explicit Thresholds)

If Target Tracking doesn't give you the control you need — for example, you want to add more instances at 90% CPU than at 70% — use Step Scaling. This requires creating the CloudWatch alarm manually and linking it to the policy.

4a — Create the Step Scaling Policy

aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-app-asg \
  --policy-name cpu-step-scale-out \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --metric-aggregation-type Average \
  --estimated-instance-warmup 120 \
  --step-adjustments '[
    {
      "MetricIntervalLowerBound": 0.0,
      "MetricIntervalUpperBound": 20.0,
      "ScalingAdjustment": 2
    },
    {
      "MetricIntervalLowerBound": 20.0,
      "ScalingAdjustment": 4
    }
  ]'

The step bounds are offsets from the alarm threshold, not absolute CPU values. With the alarm at 70%, this adds 2 instances when CPU is 0 to 20 percentage points above the threshold (70% up to 90%), and 4 instances when it's 20 or more points above (90% and higher). Note the PolicyARN returned by this command.
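Because the bounds are relative, it helps to see the selection spelled out. The sketch below mirrors the two steps from 4a, assuming AWS's convention for metrics above the threshold (lower bound inclusive, upper bound exclusive); pick_adjustment is a hypothetical helper, not an AWS API:

```python
from __future__ import annotations

import json
from typing import List, Optional

STEP_ADJUSTMENTS = json.loads("""
[
  {"MetricIntervalLowerBound": 0.0, "MetricIntervalUpperBound": 20.0, "ScalingAdjustment": 2},
  {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 4}
]
""")


def pick_adjustment(metric: float, threshold: float, steps: List[dict]) -> Optional[int]:
    """Return the ScalingAdjustment for a metric above the alarm threshold."""
    breach = metric - threshold  # bounds are relative to the threshold
    for step in steps:
        lower = step.get("MetricIntervalLowerBound", float("-inf"))
        upper = step.get("MetricIntervalUpperBound", float("inf"))
        if lower <= breach < upper:
            return step["ScalingAdjustment"]
    return None  # no matching step (e.g. metric below threshold)


for cpu in (72.0, 89.9, 90.0, 97.0):
    print(cpu, "->", pick_adjustment(cpu, 70.0, STEP_ADJUSTMENTS))
# 72.0 -> 2
# 89.9 -> 2
# 90.0 -> 4
# 97.0 -> 4
```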

4b — Create the CloudWatch Alarm and Link It

With detailed monitoring enabled (1-minute periods), two evaluation periods means the alarm fires after 2 consecutive minutes above the threshold. With basic monitoring (5-minute periods), the same two-period configuration means 10 minutes of sustained breach.

aws cloudwatch put-metric-alarm \
  --alarm-name cpu-high-70 \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 70 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --dimensions Name=AutoScalingGroupName,Value=my-app-asg \
  --alarm-actions arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:POLICY-ID:autoScalingGroupName/my-app-asg:policyName/cpu-step-scale-out

Replace the --alarm-actions ARN with the actual PolicyARN returned in step 4a. The --period 60 works correctly only if detailed monitoring is enabled on the instances; if you're on basic monitoring, change this to 300 to match the 5-minute aggregation interval. Mismatching the period and the monitoring level produces sparse or missing data points, which typically leaves the alarm in INSUFFICIENT_DATA instead of letting it reach ALARM.
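The period-versus-monitoring rule is easy to encode as a pre-flight check. A minimal sketch, assuming the only two monitoring levels are basic (300 s) and detailed (60 s) and that a usable period is a whole multiple of the monitoring interval; check_alarm_period is hypothetical:

```python
from __future__ import annotations

from typing import Optional

BASIC_INTERVAL_S = 300
DETAILED_INTERVAL_S = 60


def check_alarm_period(period_s: int, detailed_monitoring: bool) -> Optional[str]:
    """Return a warning if the alarm period is finer than the data supports."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    interval = DETAILED_INTERVAL_S if detailed_monitoring else BASIC_INTERVAL_S
    if period_s % interval != 0:
        level = "detailed" if detailed_monitoring else "basic"
        return (f"period {period_s}s is not a multiple of the {interval}s {level} "
                f"monitoring interval; expect sparse data and INSUFFICIENT_DATA")
    return None


print(check_alarm_period(60, detailed_monitoring=True))   # None
print(check_alarm_period(60, detailed_monitoring=False))  # warning text
```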

Diagram: basic vs. detailed monitoring paths (10 vs. 2 minutes to alarm with two evaluation periods), the CloudWatch alarm invoking the Step Scaling policy, step selection by breach magnitude (add 2 or 4 instances), and the ASG capacity change followed by instance warmup.

Basic monitoring path — 5-minute periods; two evaluation periods = 10 minutes to alarm.
Detailed monitoring path — 1-minute periods; two evaluation periods = 2 minutes to alarm.
Alarm fires — invokes the Step Scaling policy ARN via alarm-actions.
Step Scaling evaluates breach magnitude — selects the matching step adjustment.
ASG adjusts capacity: instances still warming up count toward current capacity, so a repeat alarm breach during warmup adds instances only beyond what is already launching.

Step 5 — Verify the Configuration End-to-End

Configuration errors here are silent until load hits. Run these checks before you consider the setup complete.

Confirm the ASG and policy are attached

aws autoscaling describe-policies \
  --auto-scaling-group-name my-app-asg

Look for your policy in the output and confirm PolicyType matches what you created. For Target Tracking, you should also see auto-generated alarms in the output under Alarms.
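If you'd rather script this check than eyeball the output, the sketch below parses JSON from describe-policies. It relies only on the ScalingPolicies, PolicyName, PolicyType, and Alarms fields; the sample payload is abbreviated and made up for illustration, and summarize_policies is hypothetical:

```python
from __future__ import annotations

import json
from typing import Dict, List, Tuple

# Abbreviated, made-up output for illustration; real responses carry more fields.
SAMPLE = """
{
  "ScalingPolicies": [
    {
      "PolicyName": "cpu-target-tracking",
      "PolicyType": "TargetTrackingScaling",
      "Alarms": [
        {"AlarmName": "TargetTracking-my-app-asg-AlarmHigh-example"},
        {"AlarmName": "TargetTracking-my-app-asg-AlarmLow-example"}
      ]
    }
  ]
}
"""


def summarize_policies(describe_policies_json: str) -> Dict[str, Tuple[str, List[str]]]:
    """Map policy name -> (policy type, attached alarm names)."""
    data = json.loads(describe_policies_json)
    summary = {}
    for policy in data.get("ScalingPolicies", []):
        alarm_names = [a["AlarmName"] for a in policy.get("Alarms", [])]
        summary[policy["PolicyName"]] = (policy["PolicyType"], alarm_names)
    return summary


policy_type, alarms = summarize_policies(SAMPLE)["cpu-target-tracking"]
print(policy_type, len(alarms), "alarm(s)")  # TargetTrackingScaling 2 alarm(s)
if not alarms:
    print("warning: no alarms attached, so this policy can never fire")
```

To feed it real data, save the output with aws autoscaling describe-policies --auto-scaling-group-name my-app-asg --output json > policies.json and pass the file's contents in.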

Check alarm state

aws cloudwatch describe-alarms \
  --alarm-names cpu-high-70

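The alarm should be in OK state if current CPU is below threshold, or INSUFFICIENT_DATA if it hasn't received enough data points yet, which is normal for the first few minutes. An alarm stuck in INSUFFICIENT_DATA long after creation usually means the period doesn't match the monitoring interval, a common misconfiguration in this setup. For the Target Tracking path, check the alarm names reported by describe-policies instead of cpu-high-70; they begin with TargetTracking-.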
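The same approach works for alarm state. This sketch reads JSON saved from aws cloudwatch describe-alarms --output json, where metric alarms appear under MetricAlarms with AlarmName, StateValue, and Period fields; the sample is made up and stuck_alarms is a hypothetical helper:

```python
from __future__ import annotations

import json
from typing import List

# Made-up output for illustration; real responses carry more fields.
SAMPLE = """
{
  "MetricAlarms": [
    {"AlarmName": "cpu-high-70", "StateValue": "INSUFFICIENT_DATA", "Period": 60}
  ]
}
"""


def stuck_alarms(describe_alarms_json: str) -> List[str]:
    """Alarms currently reporting INSUFFICIENT_DATA, labeled with their period."""
    data = json.loads(describe_alarms_json)
    return [
        f'{a["AlarmName"]} (period {a.get("Period")}s)'
        for a in data.get("MetricAlarms", [])
        if a.get("StateValue") == "INSUFFICIENT_DATA"
    ]


for line in stuck_alarms(SAMPLE):
    print("check monitoring level for:", line)
```

Run it a few minutes after creating the alarm; the signal worth acting on is an alarm that stays in the list.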

Confirm detailed monitoring is active on running instances

aws ec2 describe-instances \
  --filters Name=tag:aws:autoscaling:groupName,Values=my-app-asg \
  --query 'Reservations[*].Instances[*].[InstanceId,Monitoring.State]' \
  --output table

The Monitoring.State column should show enabled for detailed monitoring. If it shows disabled, those instances are running with basic monitoring; check that "Monitoring": { "Enabled": true } is present in the launch template version the ASG is actually using.
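With --output json instead of --output table, the same query returns nested lists (one per reservation, then one [InstanceId, State] pair per instance), which makes the check scriptable. A sketch with made-up instance IDs; instances_without_detailed_monitoring is hypothetical:

```python
from __future__ import annotations

import json
from typing import List

# Made-up output of the describe-instances query above with --output json.
SAMPLE = """
[
  [["i-0aaa1111bbbb22223", "enabled"]],
  [["i-0ccc3333dddd44445", "disabled"]]
]
"""


def instances_without_detailed_monitoring(query_output_json: str) -> List[str]:
    """Flatten reservations and return IDs whose Monitoring.State isn't 'enabled'.

    A 'pending' state is also reported, since monitoring isn't active yet.
    """
    reservations = json.loads(query_output_json)
    return [
        instance_id
        for reservation in reservations
        for instance_id, state in reservation
        if state != "enabled"
    ]


print(instances_without_detailed_monitoring(SAMPLE))  # ['i-0ccc3333dddd44445']
```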

Simulate a scale-out event

aws cloudwatch set-alarm-state \
  --alarm-name cpu-high-70 \
  --state-value ALARM \
  --state-reason "Manual test"

This forces the alarm into ALARM state and triggers the scaling policy immediately, which is useful for validating the wiring without generating real CPU load. The alarm returns to its actual state at its next evaluation. Watch the ASG activity history to confirm instances launch.

aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name my-app-asg
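To confirm the test actually launched capacity, you can filter the activity history for launches. This sketch assumes JSON saved from the command above with --output json, where each entry in Activities has a Description and a StatusCode; the sample is made up, and matching on the "Launching a new EC2 instance" description prefix is an assumption about message wording, not a stable API contract. launch_activities is hypothetical:

```python
from __future__ import annotations

import json
from typing import Dict

# Made-up output for illustration; real activities carry more fields.
SAMPLE = """
{
  "Activities": [
    {"Description": "Launching a new EC2 instance: i-0aaa1111bbbb22223",
     "StatusCode": "Successful"},
    {"Description": "Launching a new EC2 instance: i-0ccc3333dddd44445",
     "StatusCode": "InProgress"}
  ]
}
"""


def launch_activities(activities_json: str) -> Dict[str, int]:
    """Count launch activities by StatusCode."""
    counts: Dict[str, int] = {}
    for activity in json.loads(activities_json).get("Activities", []):
        if activity.get("Description", "").startswith("Launching a new EC2 instance"):
            status = activity.get("StatusCode", "Unknown")
            counts[status] = counts.get(status, 0) + 1
    return counts


print(launch_activities(SAMPLE))  # {'Successful': 1, 'InProgress': 1}
```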

The INSUFFICIENT_DATA Trap

A team set up step scaling with a 1-minute period alarm, deployed, and watched the alarm sit in INSUFFICIENT_DATA for 20 minutes under real load. The instinct was to check IAM permissions or alarm configuration. The actual cause: the launch template had "Monitoring": { "Enabled": false } — basic monitoring was active, so CloudWatch was receiving data points every 5 minutes. The alarm expected data every 60 seconds and saw gaps, which CloudWatch interprets as missing data, keeping the alarm in INSUFFICIENT_DATA rather than transitioning to ALARM.

The fix was a one-line change to the launch template to enable detailed monitoring, followed by an instance refresh to replace the running instances, so the 60-second alarm period finally had 1-minute data behind it. The alternative would have been to keep basic monitoring and change the alarm period from 60 to 300. The lesson: the alarm period and the monitoring interval must match. A mismatch doesn't produce an error; it produces silence.

IAM Permissions Required

The principal creating these resources needs the following permissions at minimum. Scope resource ARNs to your specific ASG and alarms in production.

IAM policy for Auto Scaling setup:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:CreateAutoScalingGroup",
        "autoscaling:PutScalingPolicy",
        "autoscaling:DescribePolicies",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:DescribeAutoScalingGroups"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricAlarm",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:SetAlarmState"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate",
        "ec2:DescribeInstances",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::123456789012:role/MyAppInstanceRole"
    }
  ]
}

Many Read and Describe actions on Auto Scaling and CloudWatch don't support resource-level restrictions in IAM, so they require "Resource": "*". Always verify against the AWS Service Authorization Reference before narrowing permissions.

Wrap-Up and Next Steps for CPU-Based Auto Scaling

The core setup is: launch template with detailed monitoring → ASG with appropriate capacity bounds → Target Tracking policy at a target value below your pain threshold. For most CPU-based workloads, that's all you need. Step Scaling is worth the extra configuration only when you need differentiated responses at different CPU magnitudes.

A few things worth doing after the initial setup:

Use instance refresh (aws autoscaling start-instance-refresh) so launch template changes roll out to running instances without replacing them by hand.
Set up scale-in protection on instances handling long-running jobs to prevent Auto Scaling from terminating them mid-work.
Review the default termination policy. By default, Auto Scaling first balances instances across Availability Zones, then prefers to terminate instances launched from the oldest launch template version or launch configuration. Verify this matches your intent.
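Check your load balancer's deregistration delay. During scale-in, Auto Scaling deregisters the instance and waits for the delay to finish before terminating it, so a delay shorter than your longest in-flight requests can cut those requests off.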

Official references: Target Tracking Scaling Policies, Step and Simple Scaling Policies, CloudWatch Alarms.

Glossary

Term	Definition
Auto Scaling Group (ASG)	A logical grouping of EC2 instances managed collectively by Auto Scaling, with defined min/max/desired capacity bounds.
Target Tracking Policy	A scaling policy type where you specify a target metric value and Auto Scaling manages alarms and capacity adjustments automatically.
Step Scaling Policy	A scaling policy type where you define explicit capacity adjustments for specific alarm breach magnitudes.
Detailed Monitoring	EC2 monitoring mode that publishes CloudWatch metrics at 1-minute intervals, compared to 5-minute intervals for basic monitoring.
Instance Warmup	A period after a new instance launches during which Auto Scaling excludes it from aggregate metric calculations to prevent premature scale-out decisions.

SW BBANG