DynamoDB Capacity Modes: Provisioned vs. On-Demand — Making the Right Call
Choosing the wrong DynamoDB capacity mode before you understand your traffic patterns is one of the fastest ways to either burn money on idle capacity or trigger throttling that silently degrades your application. This post gives you a decision framework grounded in how each mode actually works under the hood.
TL;DR
| Dimension | On-Demand | Provisioned + Auto Scaling |
|---|---|---|
| Traffic Pattern Known? | No — unpredictable or new | Yes — stable or predictable spikes |
| Throttling Risk | Low (absorbs up to roughly 2x the previous peak at once; larger sudden jumps can throttle) | Low if scaled correctly; high if misconfigured |
| Cost Model | Pay per request (WRU/RRU) | Pay per provisioned unit (WCU/RCU) per hour |
| Cost at High Sustained Load | Expensive (~6–7x per unit at full utilization) | Cheap; optimal for steady workloads |
| Operational Overhead | Zero | Requires Auto Scaling policy tuning |
| Switch Frequency Limit | Once per 24 hours per table | |
| ✅ Core Takeaway | Start with On-Demand. Once you have 2–4 weeks of CloudWatch metrics, switch to Provisioned + Auto Scaling to cut costs by 50–80% at sustained load. | |
How Each Mode Works Internally
On-Demand: The Serverless Contract
When you enable On-Demand, DynamoDB manages capacity allocation entirely. You are billed per Write Request Unit (WRU) and Read Request Unit (RRU) consumed. There are no pre-allocated slots: DynamoDB scales the underlying partition infrastructure automatically in response to traffic. It can serve up to 2x the table's previous peak traffic right away, and a new table can take traffic up to the service default limits immediately.
The critical internal constraint: On-Demand instantly accommodates up to double the previous peak, not unlimited growth. If your traffic jumps 10x instantaneously, well past that ceiling, some requests can be throttled while DynamoDB adds capacity; AWS recommends spacing growth beyond 2x over at least 30 minutes. The flexibility also has a price: at full utilization, each operation costs roughly 6–7x more than on an equivalent provisioned unit.
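To make the doubling rule concrete, here is a minimal Python sketch that plans a ramp toward a target request rate. It assumes a simplified model in which each established peak allows up to twice that rate next; `ramp_plan` and its parameters are hypothetical names for illustration, not part of any AWS SDK.

```python
def ramp_plan(previous_peak: float, target: float) -> list[float]:
    """Intermediate peaks to establish before `target` fits under 2x the last peak.

    Simplified model (an assumption): once a peak has been served, the table
    can take up to double that peak.
    """
    if previous_peak <= 0 or target <= 0:
        raise ValueError("rates must be positive")
    steps = []
    peak = previous_peak
    while target > 2 * peak:
        peak *= 2  # ramp to the current ceiling, which then becomes the new peak
        steps.append(peak)
    return steps


if __name__ == "__main__":
    print(ramp_plan(1_000, 10_000))
```

Under this model, a 10x jump from 1,000 to 10,000 requests per second needs three intermediate peaks (2,000, 4,000 and 8,000) before the target fits under the ceiling.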
Provisioned: The Reserved Capacity Contract
Provisioned mode requires you to declare Read Capacity Units (RCU) and Write Capacity Units (WCU) per second. DynamoDB pre-allocates partition-level throughput based on these numbers. Requests exceeding this limit are throttled (HTTP 400 ProvisionedThroughputExceededException) unless burst capacity absorbs the spike.
Burst capacity is a pool of unused capacity units saved over the last 300 seconds. It is a best-effort buffer — not a guarantee. Do not architect around it.
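The burst pool can be pictured as a token bucket. The sketch below is a deliberately simplified table-level model, assuming unused provisioned units are banked for up to 300 seconds and spent when demand exceeds the provisioned rate. Real DynamoDB applies this per partition and on a best-effort basis, and `simulate_burst` is a hypothetical helper, not an AWS API.

```python
def simulate_burst(provisioned: int, demand: list[int], window_seconds: int = 300) -> list[int]:
    """Return the units throttled in each second under a simple token-bucket model."""
    bucket = 0                             # banked, unused capacity units
    bucket_cap = provisioned * window_seconds
    throttled = []
    for requested in demand:
        available = provisioned + bucket   # this second's capacity plus the bank
        served = min(requested, available)
        throttled.append(requested - served)
        # Whatever was not used stays banked, up to the 300-second cap.
        bucket = min(bucket_cap, available - served)
    return throttled


if __name__ == "__main__":
    # 10 idle seconds bank 100 units, then a spike of 60 units/s against 10 provisioned.
    print(simulate_burst(10, [0] * 10 + [60, 60, 60]))
```

In this run the first two spike seconds are absorbed by the bank and the third is throttled, which illustrates why burst capacity helps with short spikes but should not be part of the capacity plan.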
Auto Scaling (via Application Auto Scaling) monitors consumed capacity via CloudWatch and adjusts provisioned units within a defined min/max range. The scale-out reaction time is typically 1–3 minutes — meaning a sudden spike can still cause throttling before scaling kicks in.
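As a rough illustration of what target tracking aims for, the sketch below computes the capacity that would put consumed throughput at the target utilization, clamped to the min/max range. This is an approximation of the idea only; the real service works through CloudWatch alarms and its own internal logic, and `desired_capacity` is a hypothetical name.

```python
import math


def desired_capacity(consumed: float, target_utilization_pct: float,
                     min_capacity: int, max_capacity: int) -> int:
    """Capacity at which `consumed` equals the target utilization, within bounds."""
    if target_utilization_pct <= 0:
        raise ValueError("target utilization must be positive")
    raw = math.ceil(consumed * 100 / target_utilization_pct)
    return max(min_capacity, min(max_capacity, raw))


if __name__ == "__main__":
    # 35 consumed units at a 70% target, with bounds of 5 and 50.
    print(desired_capacity(35, 70, 5, 50))
```

Note the clamp: once consumption pushes the ideal value past `max_capacity`, scaling stops and any further traffic is throttled.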
Decision Flow: Which Mode to Choose?
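One way to express the decision in code, pieced together from the TL;DR table rather than taken from any official guidance, is the sketch below. The function name, the two-week minimum and the `break_even_utilization` parameter are assumptions for illustration; plug in the break-even you derive from your own pricing.

```python
def choose_capacity_mode(weeks_of_metrics: float, avg_utilization: float,
                         break_even_utilization: float) -> str:
    """Return a Terraform-style billing_mode value for a table.

    avg_utilization: average consumed capacity divided by the capacity you
    would provision, as a fraction between 0 and 1.
    """
    if weeks_of_metrics < 2:
        # Traffic pattern not yet known: stay on On-Demand.
        return "PAY_PER_REQUEST"
    if avg_utilization >= break_even_utilization:
        # Steady enough that reserved capacity is cheaper.
        return "PROVISIONED"
    return "PAY_PER_REQUEST"


if __name__ == "__main__":
    print(choose_capacity_mode(weeks_of_metrics=4, avg_utilization=0.6,
                               break_even_utilization=0.3))
```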
The Real Cost Difference
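Using us-east-1 list prices at the time of writing as a reference point (rates change, so check the current DynamoDB pricing page before relying on these figures):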
| Operation | On-Demand (per million) | Provisioned (per WCU/RCU-hour) |
|---|---|---|
| Write | $1.25 / million WRUs | $0.00065 / WCU-hour (~$0.47/mo per WCU) |
| Read | $0.25 / million RRUs | $0.00013 / RCU-hour (~$0.09/mo per RCU) |
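Practical example: A table sustaining 100 writes per second (items up to 1 KB) needs 100 WCU, which costs ~$47/month provisioned (100 × $0.00065 × 720 hours). The same traffic on On-Demand is 100 × 3,600 × 720 ≈ 259 million WRUs, or ~$324/month at $1.25 per million. At these prices the break-even sits at roughly 14–15% average utilization of the provisioned capacity ($0.00065 ÷ $0.0045 for a fully used WCU-hour); above that, Provisioned wins on cost.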
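The comparison can be recomputed directly from the reference prices in the table. The sketch below assumes a 30-day month, writes of at most 1 KB (one WRU or WCU each) and no Auto Scaling headroom; the constant and function names are illustrative.

```python
WRU_PRICE_PER_MILLION = 1.25   # on-demand write price from the table above
WCU_PRICE_PER_HOUR = 0.00065   # provisioned write price from the table above
HOURS_PER_MONTH = 24 * 30      # assumption: 30-day month


def provisioned_monthly_cost(wcu: float) -> float:
    """Monthly cost of holding `wcu` write capacity units around the clock."""
    return wcu * WCU_PRICE_PER_HOUR * HOURS_PER_MONTH


def on_demand_monthly_cost(writes_per_second: float) -> float:
    """Monthly cost of a steady write rate billed per request."""
    writes = writes_per_second * 3600 * HOURS_PER_MONTH
    return writes / 1_000_000 * WRU_PRICE_PER_MILLION


def break_even_utilization() -> float:
    """Fraction of a provisioned WCU you must use for both modes to cost the same."""
    on_demand_cost_of_full_wcu_hour = 3600 / 1_000_000 * WRU_PRICE_PER_MILLION
    return WCU_PRICE_PER_HOUR / on_demand_cost_of_full_wcu_hour


if __name__ == "__main__":
    print(f"provisioned: ${provisioned_monthly_cost(100):.2f}")
    print(f"on-demand:   ${on_demand_monthly_cost(100):.2f}")
    print(f"break-even:  {break_even_utilization():.1%}")
```

Swapping in current prices for your Region is a quick way to sanity-check any break-even figure before switching modes.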
Analogy — Hotel vs. Airbnb: On-Demand is like booking an Airbnb nightly — you pay only for what you use, but the per-night rate is high. Provisioned is like signing a monthly hotel lease — you pay whether you sleep there or not, but the per-night equivalent is dramatically cheaper if you're there most nights. The mistake engineers make is signing the lease before knowing how often they'll actually stay.
Implementation: Starting with On-Demand
Terraform: On-Demand Table (Starting Point)
```hcl
resource "aws_dynamodb_table" "my_table" {
  name         = "my-app-table"
  billing_mode = "PAY_PER_REQUEST" # On-Demand
  hash_key     = "PK"
  range_key    = "SK"

  attribute {
    name = "PK"
    type = "S"
  }

  attribute {
    name = "SK"
    type = "S"
  }

  tags = {
    Environment = "production"
    CostCenter  = "backend-team"
  }
}
```
Implementation: Switching to Provisioned + Auto Scaling
After observing your peak consumed capacity in CloudWatch, configure Provisioned mode with Auto Scaling. Set your minimum to your average baseline, maximum to 150% of your observed peak, and target utilization at 70% — this gives Auto Scaling headroom to react before you hit the ceiling.
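The sizing rule above (minimum at the average, maximum at 150% of the observed peak, 70% target) can be sketched as a small helper. It assumes you already have per-second consumed-capacity samples exported from CloudWatch; `autoscaling_settings` and its parameters are hypothetical names.

```python
import math


def autoscaling_settings(samples: list[float], target_pct: float = 70.0,
                         peak_headroom: float = 1.5) -> dict:
    """Derive min/max capacity and target utilization from observed usage."""
    if not samples:
        raise ValueError("need at least one sample")
    average = sum(samples) / len(samples)
    peak = max(samples)
    return {
        "min_capacity": max(1, math.ceil(average)),          # average baseline
        "max_capacity": max(1, math.ceil(peak * peak_headroom)),  # 150% of peak
        "target_value": target_pct,
    }


if __name__ == "__main__":
    print(autoscaling_settings([3, 5, 4, 6, 32]))
```

The resulting values map onto `min_capacity`, `max_capacity` and `target_value` in the Application Auto Scaling resources.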
Terraform: Provisioned Table with Auto Scaling
```hcl
resource "aws_dynamodb_table" "my_table" {
  name           = "my-app-table"
  billing_mode   = "PROVISIONED"
  read_capacity  = 10 # baseline from CloudWatch observations
  write_capacity = 5
  hash_key       = "PK"
  range_key      = "SK"

  attribute {
    name = "PK"
    type = "S"
  }

  attribute {
    name = "SK"
    type = "S"
  }

  # Let Auto Scaling own capacity after creation; otherwise every
  # `terraform apply` resets it to the values above.
  lifecycle {
    ignore_changes = [read_capacity, write_capacity]
  }
}
```
```hcl
# --- Write Auto Scaling ---
resource "aws_appautoscaling_target" "write_target" {
  max_capacity       = 50 # 150% of observed peak
  min_capacity       = 5
  resource_id        = "table/${aws_dynamodb_table.my_table.name}"
  scalable_dimension = "dynamodb:table:WriteCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "write_policy" {
  name               = "DynamoDBWriteAutoScaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.write_target.resource_id
  scalable_dimension = aws_appautoscaling_target.write_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.write_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBWriteCapacityUtilization"
    }
    target_value = 70.0 # scale out when 70% utilized
  }
}
```
```hcl
# --- Read Auto Scaling (mirror pattern) ---
resource "aws_appautoscaling_target" "read_target" {
  max_capacity       = 100
  min_capacity       = 10
  resource_id        = "table/${aws_dynamodb_table.my_table.name}"
  scalable_dimension = "dynamodb:table:ReadCapacityUnits"
  service_namespace  = "dynamodb"
}

resource "aws_appautoscaling_policy" "read_policy" {
  name               = "DynamoDBReadAutoScaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.read_target.resource_id
  scalable_dimension = aws_appautoscaling_target.read_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.read_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "DynamoDBReadCapacityUtilization"
    }
    target_value = 70.0
  }
}
```
Key CloudWatch Metrics to Monitor
- ConsumedWriteCapacityUnits / ConsumedReadCapacityUnits: Your actual usage baseline. Use the average and p99 over 2–4 weeks to size your provisioned capacity.
- ThrottledRequests: Non-zero values in Provisioned mode mean your minimum capacity or Auto Scaling reaction time is insufficient.
- SuccessfulRequestLatency: Latency spikes often correlate with throttling and retry storms.
- SystemErrors: Internal DynamoDB errors, distinct from throttling. Contact AWS Support if they persist.
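To turn the consumed-capacity metrics into a p99 figure, something like the following could be run over exported datapoints. It assumes you pulled the `Sum` statistic per period (CloudWatch reports consumed capacity as a total per period, so it has to be divided by the period length), and it uses the nearest-rank percentile method; the helper names are hypothetical.

```python
import math


def per_second_rate(period_sum: float, period_seconds: int) -> float:
    """Convert a per-period Sum datapoint into average units per second."""
    return period_sum / period_seconds


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering `pct`% of the data."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical one-minute Sum datapoints for ConsumedWriteCapacityUnits.
    sums = [300, 360, 420, 1800, 240]
    rates = [per_second_rate(s, 60) for s in sums]
    print(percentile(rates, 99))
```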
IAM: Minimum Required Permissions
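For Auto Scaling to function, Application Auto Scaling needs permission to describe and update your table's capacity and to manage the CloudWatch alarms that drive scaling. For DynamoDB it acts through the dynamodb.application-autoscaling.amazonaws.com service principal, normally via a service-linked role that is created for you. The equivalent permissions look like this: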
IAM Policy: DynamoDB Auto Scaling Service Role (the CloudWatch alarm actions sit in their own statement because a DynamoDB table ARN is not a valid resource for them)
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:UpdateTable"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/my-app-table"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricAlarm",
        "cloudwatch:DescribeAlarms",
        "cloudwatch:DeleteAlarms"
      ],
      "Resource": "*"
    }
  ]
}
```
Wrap-up & Next Steps
Start every new DynamoDB table on On-Demand to eliminate throttling risk while you gather real traffic data, then migrate to Provisioned + Auto Scaling once your baseline is clear — this is the lowest-risk, cost-optimized path.
📌 Next Steps:
- Deploy your table with `billing_mode = "PAY_PER_REQUEST"` today.
- Set a CloudWatch alarm on `ConsumedWriteCapacityUnits` to track your p99 peak over 2–4 weeks.
- Use the AWS Cost Explorer DynamoDB view to validate your break-even point before switching.
- Reference: AWS Docs — Read/Write Capacity Mode
Glossary
- WCU (Write Capacity Unit): One WCU represents one write per second for an item up to 1 KB; the fundamental billing unit in Provisioned mode.
- RCU (Read Capacity Unit): One RCU represents one strongly consistent read per second (or two eventually consistent reads) for items up to 4 KB.
- Throttling (`ProvisionedThroughputExceededException`): DynamoDB's rejection of requests that exceed the provisioned or burst capacity for a partition.
- Burst Capacity: A best-effort reserve of unused capacity units accumulated over the last 300 seconds, used to absorb short traffic spikes in Provisioned mode.
- Application Auto Scaling: The AWS service that monitors CloudWatch metrics and automatically adjusts DynamoDB provisioned capacity within defined min/max bounds.
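As a worked example of the WCU and RCU definitions in this glossary, the sketch below converts an item size into capacity units per operation. It follows the rounding in those definitions (1 KB steps for writes, 4 KB steps for reads, half cost for eventually consistent reads) and ignores transactional operations; the function names are illustrative.

```python
import math


def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs consumed by one standard write: one unit per 1 KB, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = True) -> float:
    """RCUs consumed by one read: one unit per 4 KB, halved if eventually consistent."""
    units = max(1, math.ceil(item_size_bytes / 4096))
    return units if strongly_consistent else units / 2


if __name__ == "__main__":
    size = 3500  # bytes
    print(write_capacity_units(size))
    print(read_capacity_units(size))
    print(read_capacity_units(size, strongly_consistent=False))
```

Multiplying these per-operation costs by your request rate gives the capacity figures to compare against the CloudWatch metrics.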