DynamoDB Partition Keys: Why Your Choice Determines Performance (and How to Fix Hot Partitions)
You designed a DynamoDB table, deployed it to production, and now certain queries are throttling while others fly — the culprit is almost always a poorly chosen partition key creating a 'hot partition'. Understanding how DynamoDB physically distributes data is the single most important design decision you'll make with this service.
TL;DR
| Concept | What It Means | Impact |
|---|---|---|
| Partition Key | The primary attribute DynamoDB uses to route data to a physical storage node | Determines data distribution |
| Hot Partition | One partition receiving disproportionately high read/write traffic | Throttling, latency spikes |
| High Cardinality Key | A key with many distinct values (e.g., userId, orderId) | Even distribution, best performance |
| Low Cardinality Key | A key with few distinct values (e.g., status, country) | Uneven distribution, hot partitions |
| Write Sharding | Appending a random/calculated suffix to spread writes across partitions | Eliminates hot partitions artificially |
How DynamoDB Partitioning Works Internally
DynamoDB is a fully managed key-value and document store that automatically partitions data across multiple storage nodes. When you write an item, DynamoDB applies an internal hash function to the partition key value to determine which physical partition stores that item. Every read and write operation is routed to exactly one partition based on this hash.
Each partition has a hard throughput ceiling (per AWS documentation, roughly 3,000 read capacity units and 1,000 write capacity units per partition). In provisioned capacity mode, the table's total throughput is divided across partitions. If one partition receives the majority of traffic, it exhausts its share of capacity while the other partitions sit idle: this is a hot partition.
- Hash Function: DynamoDB hashes the partition key value to produce a numeric token.
- Token Space: The full hash space is divided into ranges, each assigned to a physical partition.
- Routing: Every read/write is routed to the single partition whose range contains the item's hash token.
- Hot Partition Problem: If most items share the same partition key value (e.g., status = "active"), they all hash to the same partition, overwhelming it.
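The routing steps above can be sketched with a simplified model. This is a stand-in hash, not DynamoDB's actual internal function, and NUM_PARTITIONS is an illustrative count:

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; DynamoDB manages the real partition count internally

def route(partition_key_value: str) -> int:
    """Map a partition key value to a partition via its hash token (simplified model)."""
    token = int(hashlib.md5(partition_key_value.encode()).hexdigest(), 16)
    return token % NUM_PARTITIONS  # the range containing the token determines the partition

# Distinct key values scatter across partitions...
for key in ("user_101", "user_102", "user_103", "user_104"):
    print(f"{key} -> partition {route(key)}")

# ...but every item with the same key value lands on the same partition, every time.
assert route("active") == route("active")
```

The hash is deterministic, which is exactly why a low-cardinality value concentrates all its items on one partition.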
The Hot Partition Problem: A Concrete Example
Imagine a table storing IoT sensor events with deviceType as the partition key. If 90% of your devices are type "temperature_sensor", then 90% of all writes hammer a single partition. The remaining partitions are nearly idle.
- Bad Key (deviceType): Low cardinality — most writes go to one partition, causing throttling.
- Good Key (deviceId): High cardinality — writes spread evenly across all partitions.
- The physical storage capacity and throughput per partition are fixed; overloading one causes `ProvisionedThroughputExceededException` errors or consumed capacity spikes in on-demand mode.
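Using the same kind of simplified hash model, a quick simulation makes the skew visible (illustrative numbers, not real DynamoDB internals):

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # illustrative partition count

def route(key: str) -> int:
    """Simplified stand-in for DynamoDB's internal hash routing."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

# 1,000 sensor events: 90% from temperature sensors, 10% from humidity sensors.
events = [("temperature_sensor", f"dev-{i}") for i in range(900)]
events += [("humidity_sensor", f"dev-{i}") for i in range(900, 1000)]

by_type = Counter(route(device_type) for device_type, _ in events)
by_id = Counter(route(device_id) for _, device_id in events)

# All 900 temperature events hash to a single partition; deviceId spreads them out.
print("deviceType as key:", dict(by_type))
print("deviceId as key:  ", dict(by_id))
```

With deviceType, one partition absorbs at least 90% of the writes; with deviceId, the same 1,000 writes spread across every partition.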
Analogy: Think of DynamoDB partitions as checkout lanes at a supermarket. If every customer is told to use lane 3 (because their item type maps there), lane 3 has a massive queue while lanes 1, 2, 4, and 5 are empty. A good partition key is like a load balancer that sends customers to the shortest available lane — distributing work evenly across all resources.
Choosing a Good Partition Key: The Rules
The golden rule is high cardinality + uniform access patterns. Here is a decision framework:
Rule 1: Maximize Cardinality
Choose an attribute with as many distinct values as possible. Natural candidates:
- userId — unique per user
- orderId — unique per transaction
- deviceId — unique per device
- sessionId — unique per session
Avoid: status, country, boolean flags, date strings (e.g., "2024-01-15" — all items on that day share one partition).
Rule 2: Ensure Uniform Access
High cardinality alone is not enough. If your application always reads the same 10 user IDs out of 10 million, the partitions holding those 10 keys are hot. Your access pattern must be as evenly distributed as your key space.
Rule 3: Composite Keys for Range Queries
Use a composite primary key (partition key + sort key) when you need range queries within a logical group. For example: partition key = userId, sort key = timestamp. This lets you query all events for a user in a time range while keeping writes distributed across users.
Advanced Fix: Write Sharding
Sometimes your access pattern forces a low-cardinality key (e.g., a leaderboard where everyone reads the top scores). In this case, use write sharding: artificially increase cardinality by appending a calculated suffix to the partition key.
Write Sharding — Python Example
```python
import random

import boto3
from boto3.dynamodb.conditions import Key

# Assumes LeaderboardTable has partition key 'pk' and a sort key (e.g., 'userId'),
# so multiple score items can share one sharded partition key value.
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('LeaderboardTable')

NUM_SHARDS = 10  # Number of logical shards

def write_score(game_id: str, user_id: str, score: int):
    """
    Shard writes across NUM_SHARDS partitions.
    Partition key becomes 'game_id#shard_N' instead of just 'game_id'.
    """
    shard_id = random.randint(0, NUM_SHARDS - 1)
    sharded_pk = f"{game_id}#shard_{shard_id}"
    table.put_item(
        Item={
            'pk': sharded_pk,  # Sharded partition key
            'userId': user_id,
            'score': score,
        }
    )
    print(f"Written to shard: {sharded_pk}")

def read_all_scores(game_id: str) -> list:
    """
    To read, you must query ALL shards and merge results.
    This is the trade-off of write sharding.
    """
    all_items = []
    for shard_id in range(NUM_SHARDS):
        sharded_pk = f"{game_id}#shard_{shard_id}"
        response = table.query(
            KeyConditionExpression=Key('pk').eq(sharded_pk)
        )
        all_items.extend(response.get('Items', []))
    return all_items

# Usage
write_score('game_chess', 'user_42', 9500)
all_scores = read_all_scores('game_chess')
print(f"Total records fetched: {len(all_scores)}")
```
Trade-off: Write sharding distributes writes but requires scatter-gather reads (query all N shards and merge). Use it only when write distribution is the bottleneck.
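Merging the scatter-gather results is plain application code. One way to keep only the top N, shown here in pure Python with in-memory lists standing in for the per-shard query responses:

```python
import heapq

def top_scores(per_shard_results: list, n: int = 10) -> list:
    """Flatten items from every shard and keep the top-n by score."""
    merged = [item for shard in per_shard_results for item in shard]
    return heapq.nlargest(n, merged, key=lambda item: item['score'])

# Stand-in for the results of querying each shard:
shards = [
    [{'userId': 'u1', 'score': 9500}, {'userId': 'u2', 'score': 300}],
    [{'userId': 'u3', 'score': 7200}],
    [{'userId': 'u4', 'score': 8100}, {'userId': 'u5', 'score': 50}],
]
print(top_scores(shards, n=3))
# Top 3 by score: u1 (9500), u4 (8100), u3 (7200)
```

heapq.nlargest avoids a full sort of the merged list, which matters when the shards return many more items than the leaderboard displays.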
Monitoring Hot Partitions with CloudWatch
DynamoDB exposes metrics in Amazon CloudWatch that help you detect hot partitions early:
- ConsumedWriteCapacityUnits / ConsumedReadCapacityUnits: Monitor at the table level for unexpected spikes.
- ThrottledRequests: A non-zero value is a direct signal of capacity exhaustion on one or more partitions.
- SuccessfulRequestLatency: Latency spikes often precede visible throttling.
For deeper partition-level visibility, enable DynamoDB Contributor Insights. This feature identifies the most frequently accessed partition keys and sort keys, directly surfacing hot key patterns in your table.
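Contributor Insights can be enabled from the AWS CLI; the table name below is an assumption carried over from the earlier example:

```shell
# Enable Contributor Insights on the table (table name assumed from earlier examples)
aws dynamodb update-contributor-insights \
    --table-name LeaderboardTable \
    --contributor-insights-action ENABLE

# Check status; once enabled, the most-accessed keys surface as CloudWatch rules
aws dynamodb describe-contributor-insights \
    --table-name LeaderboardTable
```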
On-Demand vs. Provisioned Capacity: Does It Solve Hot Partitions?
A common misconception is that switching to on-demand capacity mode eliminates hot partition problems. It does not. On-demand mode removes the need to pre-provision capacity and handles traffic spikes more gracefully, but DynamoDB still has per-partition throughput limits. A severely skewed partition key can still cause throttling even in on-demand mode, particularly during sudden traffic bursts before DynamoDB's internal scaling responds.
The correct fix is always a better partition key design — capacity mode is an operational concern, not a data modeling fix.
Quick Reference: Partition Key Design Checklist
| Check | Good Signal | Bad Signal |
|---|---|---|
| Cardinality | Millions of distinct values | Fewer than 1,000 distinct values |
| Access Distribution | Reads/writes spread across many keys | 80%+ traffic to same key value |
| Key Type | UUID, userId, deviceId, orderId | status, type, date, boolean |
| Range Queries | Use composite key (PK + SK) | Scanning entire table for ranges |
| Forced Low Cardinality | Apply write sharding with suffix | Using raw low-cardinality key |
Glossary
| Term | Definition |
|---|---|
| Partition Key | The primary key attribute hashed by DynamoDB to determine physical data placement. |
| Hot Partition | A single partition receiving a disproportionate share of read/write requests, leading to throttling. |
| Cardinality | The number of distinct values an attribute can hold. High cardinality = better distribution. |
| Write Sharding | A technique of appending a random or calculated suffix to a partition key to artificially increase cardinality. |
| Contributor Insights | A DynamoDB feature that uses CloudWatch to identify the most accessed keys in a table or index. |
Next Steps
If you suspect a hot partition today, enable DynamoDB Contributor Insights immediately — it will show you exactly which keys are being hammered. Then apply the partition key checklist above to your table design. For existing tables with the wrong partition key, you will need to migrate data to a new table; AWS does not support changing the partition key of an existing table in-place. Refer to the official AWS DynamoDB Best Practices for Partition Keys documentation for the authoritative guidance.