DynamoDB Capacity Modes: Provisioned vs. On-Demand — Which One Saves Money and Avoids Throttling?

Choosing the wrong DynamoDB capacity mode is one of the fastest ways to either over-pay for idle capacity or trigger throttling that cascades into application errors.

TL;DR: DynamoDB Capacity Modes at a Glance

Solution	Mechanism	Best For	Complexity
On-Demand	AWS scales capacity automatically per request	Unknown, spiky, or new workloads	Low
Provisioned (with Auto Scaling)	You define RCU/WCU targets; Auto Scaling adjusts within bounds	Predictable, steady-state workloads	Medium
Provisioned (manual)	Fixed RCU/WCU; no automatic adjustment	Tightly controlled, cost-sensitive workloads with known patterns	High operational overhead

Why DynamoDB Capacity Mode Selection Matters More Than You Think

The decision isn't just about cost — it directly controls whether your table throttles requests. DynamoDB enforces capacity limits at the partition level, and a misconfigured mode can produce ProvisionedThroughputExceededException errors even when your aggregate RCU/WCU budget appears sufficient. Understanding the mechanics of each mode before traffic arrives is the only way to make a defensible choice.

How DynamoDB Capacity Modes Work: The Core Architecture

Every DynamoDB table is backed by partitions. Each partition has its own throughput slice. The capacity mode you choose determines how that throughput is allocated and enforced.

In Provisioned mode, you specify Read Capacity Units (RCUs) and Write Capacity Units (WCUs) explicitly. DynamoDB divides that throughput across the table's partitions. If a single partition receives a disproportionate share of traffic (a hot partition), it can exhaust its share and throttle requests even while the table's aggregate capacity is underutilized. Adaptive capacity can redirect unused throughput toward a busy partition, but it cannot lift a partition past its hard ceiling of 3,000 RCU and 1,000 WCU.
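The hot-partition effect is easy to see with a little arithmetic. The sketch below is a deliberately simplified model: it assumes a strictly even split of table capacity and ignores adaptive and burst capacity, so treat it as an illustration rather than a description of DynamoDB internals. The function names and the traffic numbers are hypothetical.

```python
def partition_slice(table_capacity: float, partitions: int) -> float:
    """Even split of table-level capacity across partitions (simplified model)."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return table_capacity / partitions


def throttled_units(per_partition_demand: list[float], table_capacity: float) -> list[float]:
    """Units per second each partition would reject under the even-split model."""
    slice_size = partition_slice(table_capacity, len(per_partition_demand))
    return [max(0.0, demand - slice_size) for demand in per_partition_demand]


# Hypothetical workload: a 1,000 WCU table with 4 partitions and one hot key range.
demand = [600.0, 50.0, 50.0, 50.0]           # 750 WCU/s in total, below the 1,000 provisioned
rejected = throttled_units(demand, 1000.0)   # each partition's slice is 250 WCU
print(rejected)                              # only the first partition exceeds its slice
```

The table as a whole is at 75% utilization, yet the first partition is over its share. This is the situation that aggregate metrics hide.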

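In On-Demand mode, DynamoDB manages capacity automatically and you pay per request rather than for reserved capacity. The table can instantly absorb up to double its previous peak traffic. Growth beyond that takes time: AWS documents that exceeding double the previous peak within 30 minutes can cause throttling. Either way, the underlying capacity scales without operator intervention.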

graph LR
  Req[Request] --> Router[Partition Router]
  Router --> P1[Partition]
  P1 --> ModeCheck{Billing Mode?}
  ModeCheck -->|Provisioned| SliceCheck{Slice Available?}
  SliceCheck -->|Yes| Serve[Serve Request]
  SliceCheck -->|No| Throttle[ThrottledRequest Error]
  ModeCheck -->|On-Demand| ODServe[Serve Request]
  ODServe --> BillPerReq[Bill Per Request]
  Serve --> BillProvisioned[Bill Reserved Capacity]
  Throttle --> ASCheck[Auto Scaling Alarm?]
  ASCheck -->|Yes - after delay| UpdateTable[UpdateTable API]
  ASCheck -->|No| DropRequest[Request Dropped]

Request arrives at the DynamoDB endpoint and is routed to the responsible partition.
Provisioned path: the partition checks its allocated RCU/WCU slice. If the slice is exhausted, the request is throttled immediately.
On-Demand path: the partition has no fixed provisioned slice. AWS tracks consumed capacity and bills per request; throttling occurs only when traffic bursts past roughly double the table's previous peak or a single partition hits its per-partition limit.
Auto Scaling (Provisioned only): a CloudWatch alarm triggers an Application Auto Scaling policy, which calls the UpdateTable API to adjust provisioned capacity. This adjustment is not instantaneous — it takes minutes.

Think of Provisioned mode as reserving a specific number of highway lanes. You pay for those lanes whether cars use them or not, and traffic beyond the lane count backs up. On-Demand mode is a toll road that opens new lanes automatically — you pay per car, but you never reserve empty lanes.

Decision Guide: Which DynamoDB Capacity Mode Fits Your Workload?

Apply this filter first: if you cannot predict traffic within a 2x range over a 24-hour window, start with On-Demand. The cost premium is the price of operational simplicity while you gather real traffic data.

graph LR
  Start([New DynamoDB Table]) --> Q1{Traffic predictable?}
  Q1 -->|No| Q2{New app / no baseline?}
  Q2 -->|Yes| OnDemand[On-Demand Mode]
  Q2 -->|No| Q3{Spiky or event-driven?}
  Q3 -->|Yes| OnDemand
  Q3 -->|No| Q4{High sustained volume?}
  Q4 -->|Yes| ProvAS[Provisioned + Auto Scaling]
  Q4 -->|No| OnDemand
  Q1 -->|Yes| Q5{Cost-sensitive at scale?}
  Q5 -->|Yes| ProvAS
  Q5 -->|No| OnDemand

Traffic predictable? If yes, Provisioned with Auto Scaling is almost always cheaper at scale.
New application? No baseline exists. On-Demand eliminates the guesswork and avoids early throttling that damages user trust.
Spiky or event-driven? On-Demand absorbs sudden bursts of up to about double the previous peak without pre-warming. Provisioned Auto Scaling reacts in minutes, not seconds.
Cost-sensitive at high volume? Provisioned capacity is priced lower per RCU/WCU than On-Demand at sustained, predictable load. The crossover point depends on your specific usage — check the AWS DynamoDB pricing page for current rates.
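The crossover point can be estimated with a short calculation. The sketch below assumes a flat hourly price per provisioned unit and a flat price per million on-demand request units. It ignores Auto Scaling, reserved capacity, free tier, and storage. The prices in the example are round placeholders, not AWS rates, so substitute the current figures from the pricing page.

```python
def break_even_utilization(provisioned_price_per_unit_hour: float,
                           on_demand_price_per_million_units: float) -> float:
    """Average utilization above which provisioned capacity costs less than on-demand."""
    # One provisioned unit, fully used, serves 3,600 one-unit requests per hour.
    on_demand_cost_of_full_hour = 3600 * on_demand_price_per_million_units / 1_000_000
    return provisioned_price_per_unit_hour / on_demand_cost_of_full_hour


def cheaper_mode(avg_utilization: float,
                 provisioned_price_per_unit_hour: float,
                 on_demand_price_per_million_units: float) -> str:
    """Pick the cheaper mode for a given average utilization of provisioned capacity."""
    threshold = break_even_utilization(provisioned_price_per_unit_hour,
                                       on_demand_price_per_million_units)
    return "provisioned" if avg_utilization > threshold else "on-demand"


# Placeholder prices for illustration only (NOT real AWS rates).
PROVISIONED_PRICE = 0.001   # currency units per capacity unit per hour
ON_DEMAND_PRICE = 1.00      # currency units per million request units

threshold = break_even_utilization(PROVISIONED_PRICE, ON_DEMAND_PRICE)
print(f"break-even utilization: {threshold:.1%}")
print(cheaper_mode(0.10, PROVISIONED_PRICE, ON_DEMAND_PRICE))
print(cheaper_mode(0.60, PROVISIONED_PRICE, ON_DEMAND_PRICE))
```

The useful output is the threshold itself: if your provisioned capacity would sit below that average utilization, On-Demand is likely the cheaper option under these assumptions.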

Solution A: On-Demand Capacity Mode (Recommended Starting Point)

Production Gotcha: A team launches with On-Demand, traffic grows steadily for 30 days, then they switch to Provisioned. They provision based on average traffic — not peak. The first traffic spike after the switch triggers a wave of ProvisionedThroughputExceededException errors. The misdiagnosis is "Auto Scaling isn't working." The actual cause: Auto Scaling reacts to CloudWatch metrics with a delay, and the spike exhausted capacity before the scaling action completed.

On-Demand requires no capacity planning. Create the table and DynamoDB handles the rest.

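Why this step: describe-table reports the billing mode the table is actually running in, which is worth confirming after any mode switch rather than relying on what was requested at creation. If the query returns null, the table was most likely created in Provisioned mode and never switched, since BillingModeSummary is typically populated only after a table has used On-Demand.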

aws dynamodb describe-table \
  --table-name MyTable \
  --query 'Table.BillingModeSummary'

To create a new table in On-Demand mode:

aws dynamodb create-table \
  --table-name MyTable \
  --attribute-definitions AttributeName=PK,AttributeType=S \
  --key-schema AttributeName=PK,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1

To switch an existing Provisioned table to On-Demand:

aws dynamodb update-table \
  --table-name MyTable \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1

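AWS limits how often a table can change modes. The long-standing rule was one switch per 24 hours; current documentation allows up to four switches from Provisioned to On-Demand in a rolling 24-hour window, with switches from On-Demand to Provisioned permitted at any time. Confirm the current limit before you plan a change, so you know whether you can reverse it quickly if costs behave unexpectedly.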

Solution B: Provisioned Capacity with Auto Scaling

Once you have 2–4 weeks of CloudWatch metrics showing a stable traffic pattern, Provisioned mode with Auto Scaling typically reduces cost compared to On-Demand at the same sustained load level.

In practice, teams often set Auto Scaling target utilization too high — at 90% or above — believing it saves money. What actually happens is that the scaling alarm triggers only after the table is already near saturation, leaving no headroom for the scaling action to complete before throttling begins. A target utilization in the 70% range gives the scaler time to act before the table exhausts capacity.
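The headroom argument is simple arithmetic. As a minimal sketch, assume Auto Scaling holds provisioned capacity at consumed capacity divided by the target utilization. The spare capacity available while a scaling action is in flight is then a fixed fraction of current load. The function names are illustrative.

```python
def provisioned_for(consumed: float, target_utilization: float) -> float:
    """Capacity a target-tracking policy aims for at a given consumed load."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return consumed / target_utilization


def headroom_percent(target_utilization: float) -> float:
    """How far load can grow, as a percentage, before provisioned capacity is exhausted."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return (1 / target_utilization - 1) * 100


for target in (0.5, 0.7, 0.9):
    print(f"target {target:.0%}: "
          f"{provisioned_for(700, target):.0f} units provisioned for 700 consumed, "
          f"{headroom_percent(target):.1f}% headroom")
```

At a 70% target, load can grow by roughly 43% before throttling starts. At 90%, the margin shrinks to about 11%, which a modest spike can consume before the scaling action completes.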

Auto Scaling for DynamoDB uses Application Auto Scaling. The setup requires three components: a scalable target registration, a scaling policy, and the IAM permissions for Application Auto Scaling to call dynamodb:UpdateTable.

Why this step: registering the scalable target is a prerequisite that the DynamoDB console handles for you when you enable Auto Scaling there. From the CLI you must do it yourself. If you skip it, put-scaling-policy has no target to attach to and is rejected with an ObjectNotFoundException.

aws application-autoscaling register-scalable-target \
  --service-namespace dynamodb \
  --resource-id "table/MyTable" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --min-capacity 5 \
  --max-capacity 500 \
  --region us-east-1

Then attach a target tracking scaling policy:

aws application-autoscaling put-scaling-policy \
  --service-namespace dynamodb \
  --resource-id "table/MyTable" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --policy-name "MyTableWriteScalingPolicy" \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
    }
  }' \
  --region us-east-1

Repeat the registration and policy steps for ReadCapacityUnits using dynamodb:table:ReadCapacityUnits and DynamoDBReadCapacityUtilization.
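If you script this rather than repeating the CLI calls by hand, one option is to generate both request pairs from a single definition. The sketch below builds keyword arguments for the boto3 application-autoscaling client's register_scalable_target and put_scaling_policy calls. The helper names and the policy naming scheme are assumptions made for this example. boto3 is a third-party dependency and is imported only inside the function that actually calls AWS.

```python
from typing import Any

# Scalable dimension -> predefined metric type, as used in the CLI commands above.
DIMENSIONS = {
    "dynamodb:table:WriteCapacityUnits": "DynamoDBWriteCapacityUtilization",
    "dynamodb:table:ReadCapacityUnits": "DynamoDBReadCapacityUtilization",
}


def scaling_requests(table_name: str, min_capacity: int, max_capacity: int,
                     target_value: float = 70.0) -> list[tuple[dict[str, Any], dict[str, Any]]]:
    """Build (register_scalable_target, put_scaling_policy) kwargs for reads and writes."""
    requests = []
    for dimension, metric in DIMENSIONS.items():
        common = {
            "ServiceNamespace": "dynamodb",
            "ResourceId": f"table/{table_name}",
            "ScalableDimension": dimension,
        }
        target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
        policy = {
            **common,
            # Hypothetical naming scheme, e.g. "MyTable-WriteCapacityUnits-policy".
            "PolicyName": f"{table_name}-{dimension.split(':')[-1]}-policy",
            "PolicyType": "TargetTrackingScaling",
            "TargetTrackingScalingPolicyConfiguration": {
                "TargetValue": target_value,
                "PredefinedMetricSpecification": {"PredefinedMetricType": metric},
            },
        }
        requests.append((target, policy))
    return requests


def apply_scaling(requests, region: str = "us-east-1") -> None:
    """Send the requests to AWS. Needs boto3 and credentials; not run in this sketch."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("application-autoscaling", region_name=region)
    for target, policy in requests:
        client.register_scalable_target(**target)  # must precede the policy
        client.put_scaling_policy(**policy)


requests = scaling_requests("MyTable", min_capacity=5, max_capacity=500)
for target, policy in requests:
    print(target["ScalableDimension"], "->", policy["PolicyName"])
```

Keeping the builder separate from the AWS call makes it possible to review or unit-test the generated parameters before anything is applied.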

The IAM role used by Application Auto Scaling must include permission to call dynamodb:UpdateTable and cloudwatch:PutMetricAlarm. AWS provides a service-linked role (AWSServiceRoleForApplicationAutoScaling_DynamoDBTable) that covers these permissions and is created automatically when you register the first scalable target.

graph LR
  Traffic[Write Traffic] --> DDB[DynamoDB Table]
  DDB --> CW[CloudWatch Metric]
  CW --> Alarm{Utilization > 70%?}
  Alarm -->|Yes| AppAS[App Auto Scaling]
  AppAS --> UpdateTable[dynamodb:UpdateTable]
  UpdateTable --> DDB
  Alarm -->|No| ScaleIn{Utilization low?}
  ScaleIn -->|Yes - after cooldown| ScaleDown[Reduce WCU]
  ScaleDown --> DDB
  ScaleIn -->|No| Monitor[Continue Monitoring]

CloudWatch tracks consumed WCU as a percentage of provisioned WCU.
When utilization exceeds the target (70% in this example), a CloudWatch alarm fires.
Application Auto Scaling receives the alarm and calls dynamodb:UpdateTable to increase provisioned WCUs.
DynamoDB applies the new capacity. The scaling action takes effect within minutes — not seconds.
When utilization drops, a scale-in alarm fires and Auto Scaling reduces provisioned capacity after a cooldown period.

Auto Scaling reacts to sustained load, not instantaneous spikes. That gap is the most common source of throttling in Provisioned mode.
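A toy simulation makes the gap concrete. The model below is an assumption-laden sketch, not how Application Auto Scaling is implemented: it works in one-minute steps, starts a scale-out when demand exceeds the target share of current capacity, applies the new capacity after a fixed lag, and ignores burst capacity and scale-in. The lag value and the traffic series are hypothetical.

```python
def simulate(demand_per_minute: list[float], initial_capacity: float,
             target: float = 0.7, lag_minutes: int = 3) -> list[float]:
    """Return throttled units for each minute under a delayed scale-out model."""
    capacity = initial_capacity
    pending: list[tuple[int, float]] = []   # (minute the change takes effect, new capacity)
    throttled = []
    for minute, demand in enumerate(demand_per_minute):
        # Apply any scaling action whose delay has elapsed.
        while pending and pending[0][0] <= minute:
            capacity = pending.pop(0)[1]
        throttled.append(max(0.0, demand - capacity))
        # Start a scale-out if utilization is above target and none is in flight.
        if demand > capacity * target and not pending:
            pending.append((minute + lag_minutes, demand / target))
    return throttled


# Hypothetical series: steady 60 units/min, then a jump to 300.
demand = [60.0, 60.0, 300.0, 300.0, 300.0, 300.0, 300.0]
throttled = simulate(demand, initial_capacity=100.0)
print(throttled)
```

In this run the spike arrives at minute 2 and the new capacity lands at minute 5, so three minutes of traffic are throttled even though the scaling policy behaved as configured.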

Monitoring DynamoDB Capacity Mode Behavior in Production

Regardless of which mode you choose, three CloudWatch metrics are mandatory to watch: ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits, and ThrottledRequests. The third metric is the one that pages you at 2 AM.

Why this step: ThrottledRequests tells you that throttling happened, but not where. Throttling often originates at a specific partition, so pair it with ReadThrottleEvents and WriteThrottleEvents and enable DynamoDB Contributor Insights to reveal which partition keys generate the hot traffic that aggregate metrics obscure. ThrottledRequests is published per operation, so the query needs an Operation dimension alongside TableName; PutItem is used here as an example.

aws dynamodb describe-table \
  --table-name MyTable \
  --query 'Table.TableStatus' \
  --region us-east-1

aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ThrottledRequests \
  --dimensions Name=TableName,Value=MyTable Name=Operation,Value=PutItem \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 60 \
  --statistics Sum \
  --region us-east-1
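The command returns JSON with a Datapoints array. A small script can turn that into a single number and a list of affected minutes. This sketch assumes the CLI's default JSON output with Timestamp and Sum fields on each datapoint. The sample payload is made up for illustration and the function names are hypothetical.

```python
import json


def total_throttled(cli_output: str) -> float:
    """Sum the 'Sum' statistic across all datapoints in get-metric-statistics output."""
    response = json.loads(cli_output)
    return sum(point["Sum"] for point in response.get("Datapoints", []))


def throttled_minutes(cli_output: str) -> list[str]:
    """Timestamps of the periods in which at least one request was throttled, in order."""
    points = json.loads(cli_output).get("Datapoints", [])
    return sorted(point["Timestamp"] for point in points if point["Sum"] > 0)


# Made-up sample payload in the shape the CLI prints (datapoints arrive unordered).
sample = json.dumps({
    "Label": "ThrottledRequests",
    "Datapoints": [
        {"Timestamp": "2024-01-01T00:05:00Z", "Sum": 12.0, "Unit": "Count"},
        {"Timestamp": "2024-01-01T00:03:00Z", "Sum": 0.0, "Unit": "Count"},
        {"Timestamp": "2024-01-01T00:04:00Z", "Sum": 30.0, "Unit": "Count"},
    ],
})

print(total_throttled(sample))
print(throttled_minutes(sample))
```

An empty Datapoints array is treated as zero throttled requests. With CloudWatch that can also mean the dimensions did not match any published metric, so check the query before concluding the table is healthy.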

Enable Contributor Insights to identify hot partition keys:

aws dynamodb update-contributor-insights \
  --table-name MyTable \
  --contributor-insights-action ENABLE \
  --region us-east-1

Contributor Insights has an additional cost — verify current pricing before enabling it on high-traffic tables.

IAM Permissions Required for Capacity Mode Management

Managing capacity modes and Auto Scaling requires permissions across two service namespaces. The following policy covers the minimum required actions for an operator managing DynamoDB capacity configuration.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DynamoDBCapacityManagement",
      "Effect": "Allow",
      "Action": [
        "dynamodb:UpdateTable",
        "dynamodb:DescribeTable",
        "dynamodb:UpdateContributorInsights",
        "dynamodb:DescribeContributorInsights"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable"
    },
    {
      "Sid": "AutoScalingRegistration",
      "Effect": "Allow",
      "Action": [
        "application-autoscaling:RegisterScalableTarget",
        "application-autoscaling:PutScalingPolicy",
        "application-autoscaling:DescribeScalableTargets",
        "application-autoscaling:DescribeScalingPolicies"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CloudWatchReadForCapacity",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:DescribeAlarms"
      ],
      "Resource": "*"
    }
  ]
}

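Note: the CloudWatch read actions and the Application Auto Scaling Describe actions in this policy are granted on "Resource": "*". Whether the Application Auto Scaling write actions can be scoped to a specific scalable target depends on current resource-level permission support, so check the AWS Service Authorization Reference before tightening the policy.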

Glossary

RCU (Read Capacity Unit): One strongly consistent read per second for an item up to 4 KB, or two eventually consistent reads per second for the same item size.
WCU (Write Capacity Unit): One write per second for an item up to 1 KB.
On-Demand Mode: A DynamoDB billing mode (PAY_PER_REQUEST) where capacity scales automatically and you are charged per request rather than for reserved capacity.
Provisioned Mode: A DynamoDB billing mode where you specify fixed RCU and WCU values. Requests exceeding provisioned capacity are throttled unless Auto Scaling adjusts capacity first.
Hot Partition: A DynamoDB partition receiving a disproportionately high share of read or write traffic, causing partition-level throttling even when aggregate table capacity is sufficient.
Application Auto Scaling: An AWS service that adjusts provisioned DynamoDB capacity in response to CloudWatch utilization alarms, using target tracking scaling policies.
Contributor Insights: A DynamoDB feature that uses CloudWatch Contributor Insights to identify the most frequently accessed and throttled partition keys and sort keys.
ProvisionedThroughputExceededException: The error returned by DynamoDB when a request exceeds the provisioned or burst capacity of a table or partition.
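The RCU and WCU definitions above translate directly into sizing arithmetic. The sketch below covers standard (non-transactional) reads and writes only. Item sizes round up to the next 4 KB block for reads and the next 1 KB block for writes. The function names are illustrative.

```python
import math


def read_units(item_bytes: int, strongly_consistent: bool = True) -> float:
    """Capacity units consumed by one read of an item of the given size."""
    blocks = max(1, math.ceil(item_bytes / 4096))   # reads are metered in 4 KB blocks
    return float(blocks) if strongly_consistent else blocks / 2


def write_units(item_bytes: int) -> int:
    """Capacity units consumed by one write of an item of the given size."""
    return max(1, math.ceil(item_bytes / 1024))     # writes are metered in 1 KB blocks


def rcu_needed(reads_per_second: float, item_bytes: int,
               strongly_consistent: bool = True) -> int:
    """Provisioned RCUs needed to sustain a steady read rate."""
    return math.ceil(reads_per_second * read_units(item_bytes, strongly_consistent))


print(read_units(6000))                  # a 6,000-byte item spans two 4 KB blocks
print(read_units(6000, False))           # eventually consistent reads cost half
print(write_units(3500))                 # a 3,500-byte item spans four 1 KB blocks
print(rcu_needed(100, 6000, False))      # 100 eventually consistent reads per second
```

Item size matters as much as request rate: the 6,000-byte item above costs twice the read capacity of one that fits in 4 KB.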

Wrapping Up: DynamoDB Capacity Modes Decision in One Rule

If you don't yet have traffic data, On-Demand is the correct default for DynamoDB capacity: it minimizes throttling risk while you gather the metrics needed to build a cost-optimized Provisioned configuration. Once you have a stable traffic baseline, Provisioned mode with Auto Scaling at a target utilization of around 70% gives you cost efficiency without sacrificing headroom. Switch modes deliberately, monitor ThrottledRequests continuously, and use Contributor Insights to catch hot partitions before they become incidents.
