Monitoring EC2 Memory Usage: Why CloudWatch Needs the Agent for RAM and Disk Metrics

Monitoring EC2 memory usage is a common production gap — CloudWatch omits RAM and disk metrics by default because the hypervisor cannot access guest OS internals without an agent.

TL;DR: Why CloudWatch Doesn't Show EC2 Memory Usage by Default

CloudWatch does not include memory or disk utilization in its default EC2 metric set. The root cause is architectural: AWS hypervisors can observe only what is visible from outside the virtual machine boundary — CPU cycles, network packets, and disk I/O operations at the block device level. RAM allocation and filesystem utilization live entirely inside the guest OS. The fix is to install the CloudWatch Agent on the instance, which reads OS-level counters and publishes them as custom metrics under the CWAgent namespace.

| Metric | Default CloudWatch | With CloudWatch Agent |
| --- | --- | --- |
| CPU Utilization | ✅ Available | ✅ Available (enhanced) |
| Network In/Out | ✅ Available | ✅ Available |
| EBS Read/Write Ops | ✅ Available | ✅ Available |
| Memory (RAM) Usage | ❌ Not available | ✅ Published as custom metric |
| Disk Space Used/Free | ❌ Not available | ✅ Published as custom metric |
| Swap Utilization | ❌ Not available | ✅ Published as custom metric |
| Per-process metrics | ❌ Not available | ✅ With procstat plugin |

Why the Hypervisor Boundary Blocks EC2 Memory Usage Visibility

AWS EC2 instances run on a virtualization layer (the Nitro hypervisor on modern instance types). The hypervisor allocates a fixed block of physical RAM to the guest and enforces isolation between tenants. From the hypervisor's perspective, that entire RAM block is "in use" the moment the instance starts — it cannot distinguish between memory your application is actively using, memory the OS kernel has cached, and memory that is genuinely free.

graph TD
    HV["AWS Nitro Hypervisor"]
    subgraph Visible_from_outside ["Observable from Hypervisor"]
        CPU["CPU Utilization"]
        NET["Network In / Out"]
        DISKIO["EBS Read/Write Ops"]
    end
    subgraph Guest_OS ["Inside Guest OS - Not Observable Externally"]
        MEM["RAM Usage<br/>/proc/meminfo"]
        DISK["Filesystem Usage<br/>df / statvfs"]
        SWAP["Swap Utilization"]
    end
    CWA["CloudWatch Agent<br/>(runs inside guest)"]
    CW["Amazon CloudWatch<br/>CWAgent Namespace"]
    HV --> CPU
    HV --> NET
    HV --> DISKIO
    CPU -->|"Default EC2 metrics"| CW
    NET -->|"Default EC2 metrics"| CW
    DISKIO -->|"Default EC2 metrics"| CW
    MEM --> CWA
    DISK --> CWA
    SWAP --> CWA
    CWA -->|"PutMetricData<br/>Custom metrics"| CW

  1. Hypervisor boundary: AWS infrastructure can observe CPU cycles consumed, bytes transferred on the virtual NIC, and block-level I/O on attached EBS volumes — all observable from outside the VM.
  2. Guest OS interior: RAM allocation, filesystem mount points, swap usage, and per-process memory maps exist only inside the OS kernel's address space. No external observer can read them without cooperation from inside the instance.
  3. CloudWatch Agent bridge: The agent runs as a process inside the guest OS, reads /proc/meminfo (Linux) or Windows Performance Counters, and pushes the data to the CloudWatch API as custom metrics. This crosses the boundary intentionally.
Think of it like a landlord who can read the electricity meter on the outside of your apartment wall but cannot see which appliances are running inside. The CloudWatch Agent is the tenant who agrees to report appliance usage back to the landlord.
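
To make the agent's role concrete, here is a minimal sketch of the kind of calculation that is only possible from inside the guest. It reads a meminfo-formatted file and treats used memory as MemTotal minus MemAvailable. That formula is an assumption chosen for illustration, and the agent's own calculation may differ; the function name mem_used_percent_approx is hypothetical.

```shell
# Approximate used-memory percentage from a meminfo-formatted file.
# Assumption: used = MemTotal - MemAvailable (the agent's own formula may differ).
# Usage: mem_used_percent_approx [FILE]   (defaults to /proc/meminfo)
mem_used_percent_approx() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END {
      if (total > 0)
        printf "%.1f\n", (total - avail) * 100 / total
      else
        exit 1
    }
  ' "${1:-/proc/meminfo}"
}
```

Running mem_used_percent_approx with no argument on a Linux instance prints a single percentage. Nothing outside the instance can produce that number, which is why the agent has to run in the guest.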

Installing and Configuring the CloudWatch Agent for EC2 Memory Usage

There are two valid installation paths. Choose based on your fleet management posture.

flowchart TD
    START(["Need to monitor EC2 memory"])
    Q1{"Is SSM Agent installed<br/>and instance registered?"}
    Q2{"Managing a fleet<br/>or single instance?"}
    A["Approach A: SSM-Managed<br/>(Recommended)"]
    B["Approach B: Manual SSH<br/>Installation"]
    DONE(["CloudWatch Agent running<br/>CWAgent metrics flowing"])
    START --> Q1
    Q1 -->|"Yes"| Q2
    Q1 -->|"No"| B
    Q2 -->|"Fleet / Auto Scaling"| A
    Q2 -->|"Single instance / debug"| B
    A --> DONE
    B --> DONE

Approach A (Recommended): SSM-Managed Installation

Use AWS Systems Manager (SSM) to install and configure the agent without SSH access. This approach scales across fleets and keeps the agent configuration version-controlled in SSM Parameter Store.

Step 1: Attach the required IAM instance profile.
— Why this step: the CloudWatch Agent must call cloudwatch:PutMetricData and ssm:GetParameter on your behalf; without the correct role, the agent starts but every metric write is rejected, and the only evidence is a permission error in the agent log.

Attach the AWS managed policy CloudWatchAgentServerPolicy to the EC2 instance role. If you also want the agent to pull its configuration from SSM Parameter Store, the role also needs AmazonSSMManagedInstanceCore.

# Attach CloudWatchAgentServerPolicy to your existing EC2 instance role
aws iam attach-role-policy \
  --role-name YourEC2InstanceRole \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy

# Attach SSM managed instance core (required for SSM-based config delivery)
aws iam attach-role-policy \
  --role-name YourEC2InstanceRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
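
Before moving on, it can help to confirm that both attachments took effect. The sketch below wraps aws iam list-attached-role-policies in a small helper; role_has_policy is a hypothetical name, and the check covers managed policy attachments only (it does not evaluate inline policies or permission boundaries).

```shell
# Succeeds (exit 0) if the role has the named managed policy attached.
# Usage: role_has_policy ROLE_NAME POLICY_NAME
role_has_policy() {
  aws iam list-attached-role-policies \
    --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text |
    tr '\t' '\n' | grep -qxF "$2"
}

# Example, using the placeholder role name from the commands above:
# role_has_policy YourEC2InstanceRole CloudWatchAgentServerPolicy || echo "policy missing"
```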

Step 2: Install the CloudWatch Agent via SSM Run Command.
— Why this step: using SSM Run Command avoids the need for SSH access and creates an auditable installation record in SSM command history.

aws ssm send-command \
  --document-name "AWS-ConfigureAWSPackage" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0" \
  --parameters '{"action":["Install"],"name":["AmazonCloudWatchAgent"]}' \
  --region us-east-1
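
send-command returns immediately, so a successful call only means the command was queued. One way to confirm the install finished is to capture the command ID (for example by adding --query 'Command.CommandId' --output text to the call above) and poll the invocation. The helper below is a sketch; ssm_command_status is a hypothetical name and the default region is an assumption carried over from the examples.

```shell
# Prints the status of one SSM Run Command invocation
# (for example Pending, InProgress, Success, or Failed).
# Usage: ssm_command_status COMMAND_ID INSTANCE_ID [REGION]
ssm_command_status() {
  aws ssm get-command-invocation \
    --command-id "$1" \
    --instance-id "$2" \
    --region "${3:-us-east-1}" \
    --query 'Status' \
    --output text
}

# Example:
# ssm_command_status "$COMMAND_ID" i-0123456789abcdef0
```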

Step 3: Create the agent configuration and store it in SSM Parameter Store.
— Why this step: storing the config in Parameter Store means every new instance in your Auto Scaling group can self-configure at launch without baking the config into the AMI.

The configuration below collects memory and disk metrics at 60-second intervals. The metrics_collection_interval controls how frequently the agent samples the OS counters before publishing to CloudWatch.

{
  "metrics": {
    "namespace": "CWAgent",
    "metrics_collected": {
      "mem": {
        "measurement": [
          "mem_used_percent",
          "mem_available",
          "mem_total"
        ],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": [
          "used_percent",
          "free",
          "total"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}

Save the JSON above as cw-agent-config.json, then upload it to Parameter Store:

# Store the config in SSM Parameter Store
aws ssm put-parameter \
  --name "/cloudwatch-agent/config/linux-memory" \
  --type "String" \
  --value file://cw-agent-config.json \
  --region us-east-1
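
A quick read-back guards against uploading the wrong file or a truncated value. The sketch below compares the stored parameter with a local file; param_matches_file is a hypothetical helper, and it assumes the instance or operator role is allowed to call ssm:GetParameter on that parameter name.

```shell
# Succeeds if the stored parameter value equals the local file's contents.
# Trailing newlines are ignored because both sides use command substitution.
# Usage: param_matches_file PARAMETER_NAME FILE [REGION]
param_matches_file() {
  stored=$(aws ssm get-parameter \
    --name "$1" \
    --region "${3:-us-east-1}" \
    --query 'Parameter.Value' \
    --output text) || return 1
  [ "$stored" = "$(cat "$2")" ]
}

# Example:
# param_matches_file /cloudwatch-agent/config/linux-memory cw-agent-config.json && echo "in sync"
```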

Step 4: Start the agent and point it at the SSM parameter.
— Why this step: installing the package does not start the agent; the start command is a separate operation that also binds the agent to a specific configuration source.

aws ssm send-command \
  --document-name "AmazonCloudWatch-ManageAgent" \
  --targets "Key=instanceids,Values=i-0123456789abcdef0" \
  --parameters '{
    "action":["configure"],
    "mode":["ec2"],
    "optionalConfigurationSource":["ssm"],
    "optionalConfigurationLocation":["/cloudwatch-agent/config/linux-memory"],
    "optionalRestart":["yes"]
  }' \
  --region us-east-1

Approach B: Manual Installation via SSH (Single Instance / Debugging)

Use this path when SSM is not yet configured or you are debugging agent behavior directly on a single instance.

Step 1: Download and install the agent package.
— Why this step: the agent is not pre-installed on standard Amazon Linux 2 or Amazon Linux 2023 AMIs; the package must be explicitly installed before any configuration is applied.

# Amazon Linux 2 / Amazon Linux 2023
sudo yum install -y amazon-cloudwatch-agent

# Ubuntu (for Debian, replace "ubuntu" with "debian" in the URL)
wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i amazon-cloudwatch-agent.deb

Step 2: Run the configuration wizard.
— Why this step: the wizard generates a validated JSON configuration file and avoids common syntax errors that cause the agent to fail silently at startup.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

Step 3: Start the agent using the generated configuration.
— Why this step: the agent must be explicitly started after configuration; it does not auto-start after installation.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -s \
  -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json

Verifying EC2 Memory Usage Metrics Are Flowing

Step 1: Confirm the agent process is running on the instance.
— Why this step: a misconfigured agent exits silently; checking the process state is faster than waiting for metrics to appear in the console.

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status

Expected output includes "status": "running". If the status is stopped, check the agent log at /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log for IAM permission errors or configuration parse failures.

Step 2: Query CloudWatch to confirm metrics are being published.
— Why this step: the agent can be running without successfully writing to CloudWatch if the instance role is missing cloudwatch:PutMetricData — the agent log will show permission errors, but the CloudWatch console simply shows no data.

aws cloudwatch list-metrics \
  --namespace CWAgent \
  --metric-name mem_used_percent \
  --region us-east-1

If the command returns an empty Metrics array after 2-3 minutes, the agent is not successfully publishing. Verify the instance role policy and check the agent log.
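
When the namespace stays empty, scanning the agent log for permission failures is usually the fastest next check. The helper below is a sketch: agent_permission_errors is a hypothetical name, and the search strings are an assumption about how IAM denials typically appear in the log, so the exact wording on your instance may differ.

```shell
# Prints lines from the agent log that look like IAM permission failures.
# Assumption: such failures contain "AccessDenied" or "not authorized".
# Exits non-zero when no matching line is found.
# Usage: agent_permission_errors [LOG_FILE]
agent_permission_errors() {
  grep -iE 'AccessDenied|not authorized' \
    "${1:-/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log}"
}

# Example:
# sudo sh -c 'grep -iE "AccessDenied|not authorized" /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log'
```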

The agent writes to the CWAgent namespace by default — not the AWS/EC2 namespace. Alarms and dashboards must target CWAgent explicitly.

Creating a CloudWatch Alarm on EC2 Memory Usage

Step 1: Identify the exact dimension values for your instance.
— Why this step: with the configuration above, the mem_used_percent metric is dimensioned by host (the hostname the instance reports, typically its private DNS name), not by InstanceId. This surprises engineers who expect EC2-style dimensions: an alarm that uses the wrong dimension is created without error but never receives data.

aws cloudwatch list-metrics \
  --namespace CWAgent \
  --metric-name mem_used_percent \
  --region us-east-1 \
  --query 'Metrics[*].Dimensions'

Step 2: Create the alarm using the correct dimension.
— Why this step: an alarm referencing a non-existent dimension combination stays in INSUFFICIENT_DATA state indefinitely and never triggers, giving false confidence that monitoring is active.

aws cloudwatch put-metric-alarm \
  --alarm-name "EC2-HighMemoryUsage" \
  --namespace CWAgent \
  --metric-name mem_used_percent \
  --dimensions Name=host,Value=ip-10-0-1-25.ec2.internal \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 85 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts \
  --region us-east-1

sequenceDiagram
    participant OS as Guest OS
    participant CWA as CloudWatch Agent
    participant CW as CloudWatch API
    participant ALM as CloudWatch Alarm
    participant SNS as SNS Topic
    loop Every 60 seconds
        CWA->>OS: Read /proc/meminfo
        OS-->>CWA: mem_total, mem_available
        CWA->>CW: PutMetricData (CWAgent namespace)
    end
    CW->>ALM: Evaluate mem_used_percent
    ALM-->>ALM: 2 consecutive periods >= 85%
    ALM->>SNS: Publish ALARM notification
    SNS-->>SNS: Fan out to subscribers

  1. Agent samples OS: Every 60 seconds (configurable), the agent reads /proc/meminfo and calls cloudwatch:PutMetricData.
  2. CWAgent namespace: Metrics land in CWAgent, not AWS/EC2. Alarms must target this namespace explicitly.
  3. Alarm evaluation: CloudWatch evaluates the alarm period and threshold. After the configured number of breaching periods, the alarm transitions to ALARM state.
  4. SNS notification: The alarm action fires an SNS publish to the configured topic, which can fan out to email, PagerDuty, Lambda, or other subscribers.
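
The evaluation step can be reasoned about offline. The sketch below takes a threshold, a required number of consecutive breaching periods, and a series of per-period averages, and reports whether that series would reach the breach condition. It is a simplified model: would_alarm is a hypothetical helper, and it ignores CloudWatch features such as missing-data handling.

```shell
# Succeeds if the series contains N consecutive values >= THRESHOLD.
# Simplified model of alarm evaluation; missing datapoints are not modeled.
# Usage: would_alarm THRESHOLD N VALUE [VALUE ...]
would_alarm() {
  threshold="$1"
  needed="$2"
  shift 2
  printf '%s\n' "$@" | awk -v t="$threshold" -v n="$needed" '
    { run = ($1 + 0 >= t + 0) ? run + 1 : 0
      if (run >= n) hit = 1 }
    END { exit (hit ? 0 : 1) }
  '
}

# Example, matching the alarm above (threshold 85, 2 evaluation periods):
# would_alarm 85 2 80 86 90 && echo "would transition to ALARM"
```

With the alarm defined above, a single 5-minute spike does not fire; two back-to-back breaching periods do.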

Production Gotcha: The INSUFFICIENT_DATA Trap

Symptom: Your memory alarm stays in INSUFFICIENT_DATA state even though the agent appears to be running.

Common misdiagnosis: Engineers assume the agent is not publishing and reinstall it, which changes nothing.

Actual cause: The alarm's host dimension value was set to the instance ID (i-0123456789abcdef0) instead of the hostname the agent reports. The metric exists in CloudWatch, but the alarm is watching a dimension combination that has zero data points.

Dimension mismatch is the most common reason memory alarms stay permanently in INSUFFICIENT_DATA — always verify with list-metrics before creating the alarm.
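
That verification can be turned into a pre-flight check that runs before put-metric-alarm. The sketch below asks CloudWatch whether any mem_used_percent metric exists for a given host value; host_dimension_exists is a hypothetical helper, the default region is an assumption, and list-metrics only reports metrics that have received data recently.

```shell
# Succeeds if CWAgent has a mem_used_percent metric whose host dimension
# equals HOST_VALUE. Fails if the call errors or no metric matches.
# Usage: host_dimension_exists HOST_VALUE [REGION]
host_dimension_exists() {
  count=$(aws cloudwatch list-metrics \
    --namespace CWAgent \
    --metric-name mem_used_percent \
    --dimensions "Name=host,Value=$1" \
    --region "${2:-us-east-1}" \
    --query 'length(Metrics)' \
    --output text) || return 1
  [ "${count:-0}" -gt 0 ]
}

# Example:
# host_dimension_exists "$HOST_VALUE" || echo "no data for this host value; fix the dimension first"
```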

Cost Considerations for Custom EC2 Memory Usage Metrics

Custom metrics published by the CloudWatch Agent are billed differently from default EC2 metrics. Default EC2 metrics (CPU, network, disk I/O) are included at no additional charge. Each custom metric, including mem_used_percent, disk_used_percent, and swap_used_percent, is billed separately in CloudWatch.

In practice, teams monitoring large fleets sometimes configure the agent to publish at a longer interval (300 seconds instead of 60 seconds) to reduce the volume of PutMetricData API calls and associated costs. The tradeoff is reduced alarm responsiveness. Pricing and limits vary — always check the official AWS CloudWatch pricing page for current rates.
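
The volume side of that tradeoff is simple arithmetic. The sketch below counts distinct custom metrics for a fleet and the datapoints one metric produces in a 30-day month; both function names are hypothetical, and no prices are included because rates change. It assumes each unique combination of metric name and dimensions counts as one metric, so the example configuration above yields seven per instance (three memory, three disk for "/", one swap) when a single filesystem matches.

```shell
# Distinct custom metrics for a fleet.
# Usage: fleet_metric_count INSTANCES METRICS_PER_INSTANCE
fleet_metric_count() {
  echo $(( $1 * $2 ))
}

# Datapoints one metric produces in a 30-day month.
# Usage: datapoints_per_month INTERVAL_SECONDS
datapoints_per_month() {
  echo $(( 30 * 24 * 3600 / $1 ))
}

# Example: 100 instances with the seven-metric configuration
# fleet_metric_count 100 7      # prints 700
# datapoints_per_month 60       # prints 43200
# datapoints_per_month 300      # prints 8640
```

Lengthening the interval cuts datapoints fivefold but leaves the metric count unchanged. The agent may batch several datapoints into one PutMetricData request, so API call counts will be lower than these datapoint totals.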

Wrap-Up: Closing the EC2 Memory Usage Monitoring Gap

The absence of memory and disk metrics in default CloudWatch is not a product limitation — it is a direct consequence of the hypervisor isolation model that makes EC2 multi-tenancy secure. The CloudWatch Agent is the documented, supported mechanism for crossing that boundary. Once installed and configured, it publishes OS-level counters to the CWAgent namespace, where they can be alarmed, dashboarded, and queried like any other CloudWatch metric.

The two most common failure modes after installation are IAM permission gaps (agent runs but cannot call PutMetricData) and dimension mismatch in alarms (alarm watches a host value that does not match what the agent reports). Both are diagnosable in under two minutes with the CLI commands shown above.

For fleet-scale deployments, consider baking the agent installation and SSM parameter reference into your AMI or EC2 launch template user data so every new instance self-configures at boot without manual intervention.
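
A user data script for that pattern could look like the output of the sketch below. It assumes an Amazon Linux AMI (so yum is available) and an instance profile that can read the parameter; render_user_data is a hypothetical helper that only prints the script, and the ssm: prefix tells the agent to fetch its configuration from Parameter Store.

```shell
# Prints an EC2 user data script that installs the agent and loads its
# configuration from the named SSM parameter at first boot.
# Assumption: Amazon Linux AMI with an instance role that can read the parameter.
# Usage: render_user_data PARAMETER_NAME
render_user_data() {
  cat <<EOF
#!/bin/bash
yum install -y amazon-cloudwatch-agent
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:$1
EOF
}

# Example:
# render_user_data /cloudwatch-agent/config/linux-memory > user-data.sh
```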

Next steps: Review the CloudWatch Agent official documentation and the agent configuration file reference for the full list of collectable metrics including per-process metrics via the procstat plugin.

Glossary

| Term | Definition |
| --- | --- |
| CloudWatch Agent | A software process installed inside an EC2 instance (or on-premises server) that collects OS-level metrics and logs and publishes them to Amazon CloudWatch. |
| CWAgent Namespace | The CloudWatch metrics namespace where the CloudWatch Agent publishes custom metrics. Distinct from the default AWS/EC2 namespace. |
| Custom Metric | A CloudWatch metric published by user code or the CloudWatch Agent, as opposed to metrics automatically emitted by AWS services. Custom metrics are billed separately. |
| Nitro Hypervisor | The lightweight hypervisor used by modern EC2 instance types that enforces isolation between the host hardware and guest OS, limiting what AWS infrastructure can observe inside the VM. |
| SSM Parameter Store | An AWS Systems Manager capability for storing configuration data as key-value pairs. Used here to centrally store and distribute the CloudWatch Agent configuration to EC2 instances. |
