DynamoDB Partition Keys: Why Your Choice Determines Performance (and How to Fix Hot Partitions)

You designed a DynamoDB table, deployed it to production, and now some queries are being throttled while others fly. The culprit is almost always a poorly chosen partition key creating a 'hot partition'. The partition key determines how DynamoDB physically distributes your data, which makes it the single most important design decision you'll make with this service.

TL;DR

Concept | What It Means | Impact
Partition Key | The primary attribute DynamoDB uses to route data to a physical storage node | Determines data distribution
Hot Partition | One partition receiving disproportionately high read/write traffic | Throttling, latency spikes
High Cardinality Key | A key with many distinct values (e.g., userId, orderId) | Even distribution, best performance
Low Cardinality Key | A key with few distinct values (e.g., status, country) | Uneven distribution, hot partitions
Write Sharding | Appending a random/calculated suffix to spread writes across partitions | Eliminates hot partitions artificially

How DynamoDB Partitioning Works Internally

DynamoDB is a fully managed key-value and document store that automatically partitions data across multiple storage nodes. When you write an item, DynamoDB applies an internal hash function to the partition key value to determine which physical partition stores that item. Every read and write operation is routed to exactly one partition based on this hash.

Each partition has a defined throughput ceiling (roughly 3,000 read capacity units or 1,000 write capacity units per partition). In provisioned capacity mode, the table's total throughput is divided across its partitions. If one partition receives the majority of traffic, it exhausts its share of capacity while the other partitions sit idle; this is a hot partition.

graph LR
  Client([Client Request]) --> HashFn[Hash Function]
  HashFn --> |token range 0-25%| P1[Partition 1]
  HashFn --> |token range 25-50%| P2[Partition 2]
  HashFn --> |token range 50-75%| P3[Partition 3]
  HashFn --> |token range 75-100%| P4[Partition 4]
  P1 --> S1[(Storage Node 1)]
  P2 --> S2[(Storage Node 2)]
  P3 --> S3[(Storage Node 3)]
  P4 --> S4[(Storage Node 4)]
  1. Hash Function: DynamoDB hashes the partition key value to produce a numeric token.
  2. Token Space: The full hash space is divided into ranges, each assigned to a physical partition.
  3. Routing: Every read/write is routed to the single partition whose range contains the item's hash token.
  4. Hot Partition Problem: If most items share the same partition key (e.g., status = "active"), they all hash to the same partition, overwhelming it.
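To make the routing concrete, here is a toy Python sketch. It is not DynamoDB's actual hash function (that is internal to the service); it only illustrates why every item sharing a partition key value lands on the same partition, while distinct values spread across the token space.

import hashlib

NUM_PARTITIONS = 4  # illustrative only; DynamoDB manages partition counts itself

def route_to_partition(partition_key_value: str) -> int:
    """Hash the partition key value and map its token into one partition's range."""
    digest = hashlib.md5(partition_key_value.encode('utf-8')).hexdigest()
    token = int(digest, 16)           # numeric token in the hash space
    return token % NUM_PARTITIONS     # the partition that owns this token

print(route_to_partition('user_42'))        # always the same partition for this key
print(route_to_partition('user_43'))        # likely a different partition
print(route_to_partition('status#active'))  # a key shared by millions of items
                                            # sends all of them to ONE partition

Because the mapping is deterministic, a key value that appears in 90% of your items directs 90% of your traffic to a single partition.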

The Hot Partition Problem: A Concrete Example

Imagine a table storing IoT sensor events with deviceType as the partition key. If 90% of your devices are type "temperature_sensor", then 90% of all writes hammer a single partition. The remaining partitions are nearly idle.

graph TD
  subgraph BAD [Bad Key - deviceType low cardinality]
    W1[Write: temperature_sensor] --> P_HOT[Partition A - OVERLOADED]
    W2[Write: temperature_sensor] --> P_HOT
    W3[Write: temperature_sensor] --> P_HOT
    W4[Write: humidity_sensor] --> P_IDLE1[Partition B - idle]
    W5[Write: pressure_sensor] --> P_IDLE2[Partition C - idle]
  end
  subgraph GOOD [Good Key - deviceId high cardinality]
    W6[Write: device_001] --> PA[Partition A]
    W7[Write: device_002] --> PB[Partition B]
    W8[Write: device_003] --> PC[Partition C]
    W9[Write: device_004] --> PD[Partition D]
    W10[Write: device_005] --> PE[Partition E]
  end
  1. Bad Key (deviceType): Low cardinality — most writes go to one partition, causing throttling.
  2. Good Key (deviceId): High cardinality — writes spread evenly across all partitions.
  3. The physical storage capacity and throughput per partition are fixed; overloading one partition causes ProvisionedThroughputExceededException errors, and throttling can still occur even in on-demand mode.
Analogy: Think of DynamoDB partitions as checkout lanes at a supermarket. If every customer is told to use lane 3 (because their item type maps there), lane 3 has a massive queue while lanes 1, 2, 4, and 5 are empty. A good partition key is like a load balancer that sends customers to the shortest available lane — distributing work evenly across all resources.
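In application code the difference is nothing more than which attribute the table is keyed on. A minimal sketch with boto3, assuming two hypothetical tables (IoTEventsByType and IoTEventsByDevice) whose names and attributes are illustrative, not from this post:

import boto3
from decimal import Decimal  # boto3's resource layer requires Decimal, not float

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

# Hypothetical table keyed on deviceType (low cardinality): most writes hash
# to one partition because most values are 'temperature_sensor'.
by_type = dynamodb.Table('IoTEventsByType')
by_type.put_item(Item={
    'deviceType': 'temperature_sensor',    # partition key: only a few distinct values
    'eventId': 'evt-0001',                 # sort key
    'reading': Decimal('21.7'),
})

# Hypothetical table keyed on deviceId (high cardinality): each device hashes
# independently, so writes spread evenly across partitions.
by_device = dynamodb.Table('IoTEventsByDevice')
by_device.put_item(Item={
    'deviceId': 'device_001',              # partition key: one value per device
    'timestamp': '2024-01-15T10:00:00Z',   # sort key
    'reading': Decimal('21.7'),
})

Both calls look identical from the client's side; only the key schema decides whether writes pile onto one partition or spread across many.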

Choosing a Good Partition Key: The Rules

The golden rule is high cardinality + uniform access patterns. Here is a decision framework:

graph TD
  Start([Choose Partition Key]) --> Q1{High Cardinality?}
  Q1 -- No --> Fix1[Add UUID suffix or use sharding]
  Q1 -- Yes --> Q2{Uniform Access Pattern?}
  Q2 -- No --> Fix2[Re-evaluate access patterns or use caching]
  Q2 -- Yes --> Q3{Need Range Queries?}
  Q3 -- Yes --> Sol1[Composite Key: PK + Sort Key]
  Q3 -- No --> Sol2[Simple Partition Key - Good to Go]
  Fix1 --> Sol1
  Fix2 --> Sol2

Rule 1: Maximize Cardinality

Choose an attribute with as many distinct values as possible. Natural candidates:

  • userId — unique per user
  • orderId — unique per transaction
  • deviceId — unique per device
  • sessionId — unique per session

Avoid: status, country, boolean flags, date strings (e.g., "2024-01-15" — all items on that day share one partition).
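As a sketch of what a high-cardinality design looks like at table-creation time, here is a hypothetical UserOrders table keyed on userId (the table and attribute names are assumptions for illustration):

import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

client.create_table(
    TableName='UserOrders',
    AttributeDefinitions=[
        {'AttributeName': 'userId', 'AttributeType': 'S'},   # high cardinality: unique per user
        {'AttributeName': 'orderId', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'userId', 'KeyType': 'HASH'},    # partition key
        {'AttributeName': 'orderId', 'KeyType': 'RANGE'},  # sort key
    ],
    BillingMode='PAY_PER_REQUEST',
)

Keying the same table on status or country instead would funnel most items through a handful of hash values.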

Rule 2: Ensure Uniform Access

High cardinality alone is not enough. If your application always reads the same 10 user IDs out of 10 million, those 10 partitions are hot. Your access pattern must be as distributed as your key space.

Rule 3: Composite Keys for Range Queries

Use a composite primary key (partition key + sort key) when you need range queries within a logical group. For example: partition key = userId, sort key = timestamp. This lets you query all events for a user in a time range while keeping writes distributed across users.
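Here is a hedged sketch of the query this design enables, assuming a hypothetical UserEvents table with partition key userId and sort key timestamp stored as an ISO-8601 string:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
events = dynamodb.Table('UserEvents')  # assumed schema: userId (HASH), timestamp (RANGE)

# Fetch one user's events within a time window. The query reads only the
# partition owning 'user_42'; other users' writes remain spread elsewhere.
response = events.query(
    KeyConditionExpression=(
        Key('userId').eq('user_42')
        & Key('timestamp').between('2024-01-01T00:00:00Z', '2024-01-31T23:59:59Z')
    )
)
for item in response.get('Items', []):
    print(item)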

Advanced Fix: Write Sharding

Sometimes your access pattern forces a low-cardinality key (e.g., a leaderboard keyed by game ID, where every score written for a popular game lands on the same partition). In this case, use write sharding: artificially increase cardinality by appending a random or calculated suffix to the partition key.

Write Sharding: Python Example
import boto3
import random
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
# Assumes 'pk' is the table's partition key and 'userId' is its sort key,
# so scores written by different users can share the same sharded pk.
table = dynamodb.Table('LeaderboardTable')

NUM_SHARDS = 10  # Number of logical shards

def write_score(game_id: str, user_id: str, score: int):
    """
    Shard writes across NUM_SHARDS partitions.
    Partition key becomes 'game_id#shard_N' instead of just 'game_id'.
    """
    shard_id = random.randint(0, NUM_SHARDS - 1)
    sharded_pk = f"{game_id}#shard_{shard_id}"

    table.put_item(
        Item={
            'pk': sharded_pk,       # Sharded partition key
            'userId': user_id,
            'score': score,
        }
    )
    print(f"Written to shard: {sharded_pk}")

def read_all_scores(game_id: str) -> list:
    """
    To read, you must query ALL shards and merge results.
    This is the trade-off of write sharding.
    """
    all_items = []
    for shard_id in range(NUM_SHARDS):
        sharded_pk = f"{game_id}#shard_{shard_id}"
        response = table.query(
            KeyConditionExpression=Key('pk').eq(sharded_pk)
        )
        all_items.extend(response.get('Items', []))
        # For large shards, also follow response.get('LastEvaluatedKey') to paginate.
    return all_items

# Usage
write_score('game_chess', 'user_42', 9500)
all_scores = read_all_scores('game_chess')
print(f"Total records fetched: {len(all_scores)}")

Trade-off: Write sharding distributes writes but requires scatter-gather reads (query all N shards and merge). Use it only when write distribution is the bottleneck.
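When the attribute you read by is known up front (for example userId), the 'calculated suffix' variant mentioned in the TL;DR softens that trade-off: derive the shard from a hash of that attribute so single-user reads hit exactly one shard. A small sketch; the helper name is mine, not from the post:

import hashlib

NUM_SHARDS = 10

def calculated_shard_pk(game_id: str, user_id: str) -> str:
    """Deterministic shard: the same user always maps to the same shard."""
    shard_id = int(hashlib.md5(user_id.encode('utf-8')).hexdigest(), 16) % NUM_SHARDS
    return f"{game_id}#shard_{shard_id}"

# Writes from many users still spread across NUM_SHARDS partitions...
pk_at_write_time = calculated_shard_pk('game_chess', 'user_42')
# ...and reading back one user's rows needs only that one shard, not all ten.
pk_at_read_time = calculated_shard_pk('game_chess', 'user_42')
assert pk_at_write_time == pk_at_read_time

Queries that span all users (e.g., the full leaderboard) still require the scatter-gather read across every shard.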

Monitoring Hot Partitions with CloudWatch

DynamoDB exposes metrics in Amazon CloudWatch that help you detect hot partitions early:

  • ConsumedWriteCapacityUnits / ConsumedReadCapacityUnits: Monitor at the table level for unexpected spikes.
  • ThrottledRequests: A non-zero value is a direct signal of capacity exhaustion on one or more partitions; an alarm sketch follows this list.
  • SuccessfulRequestLatency: Latency spikes often precede visible throttling.
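A minimal sketch of such an alarm with boto3; the alarm name, table name, and SNS topic ARN below are placeholders:

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

cloudwatch.put_metric_alarm(
    AlarmName='LeaderboardTable-putitem-throttles',
    Namespace='AWS/DynamoDB',
    MetricName='ThrottledRequests',
    # ThrottledRequests is emitted per TableName + Operation pair, so create
    # one alarm per operation you care about (or combine them with metric math).
    Dimensions=[
        {'Name': 'TableName', 'Value': 'LeaderboardTable'},
        {'Name': 'Operation', 'Value': 'PutItem'},
    ],
    Statistic='Sum',
    Period=60,                        # evaluate each minute
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    TreatMissingData='notBreaching',  # no throttle data points means healthy
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
)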

For deeper partition-level visibility, enable DynamoDB Contributor Insights. This feature identifies the most frequently accessed partition keys and sort keys, directly surfacing hot key patterns in your table.
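Contributor Insights can be enabled per table (and per index) from the console, CLI, or API. A short boto3 sketch, reusing the LeaderboardTable name assumed earlier:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Enable Contributor Insights for the table; CloudWatch will then report
# the most accessed and most throttled partition keys.
dynamodb.update_contributor_insights(
    TableName='LeaderboardTable',
    ContributorInsightsAction='ENABLE',
)

# Check the current status (ENABLING, ENABLED, DISABLING, DISABLED, FAILED).
status = dynamodb.describe_contributor_insights(TableName='LeaderboardTable')
print(status['ContributorInsightsStatus'])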

graph LR
  Table[DynamoDB Table] --> CW[CloudWatch Metrics]
  CW --> M1[ThrottledRequests]
  CW --> M2[ConsumedWriteCapacityUnits]
  CW --> M3[SuccessfulRequestLatency]
  Table --> CI[Contributor Insights]
  CI --> HK[Hot Key Report]
  HK --> Action[Redesign Partition Key]
  M1 --> Alert[CloudWatch Alarm]
  Alert --> Action

On-Demand vs. Provisioned Capacity: Does It Solve Hot Partitions?

A common misconception is that switching to on-demand capacity mode eliminates hot partition problems. It does not. On-demand mode removes the need to pre-provision capacity and handles traffic spikes more gracefully, but DynamoDB still has per-partition throughput limits. A severely skewed partition key can still cause throttling even in on-demand mode, particularly during sudden traffic bursts before DynamoDB's internal scaling responds.

The correct fix is always a better partition key design — capacity mode is an operational concern, not a data modeling fix.

Quick Reference: Partition Key Design Checklist

Check | Good Signal | Bad Signal
Cardinality | Millions of distinct values | Fewer than 1,000 distinct values
Access Distribution | Reads/writes spread across many keys | 80%+ traffic to same key value
Key Type | UUID, userId, deviceId, orderId | status, type, date, boolean
Range Queries | Use composite key (PK + SK) | Scanning entire table for ranges
Forced Low Cardinality | Apply write sharding with suffix | Using raw low-cardinality key

Glossary

Term | Definition
Partition Key | The primary key attribute hashed by DynamoDB to determine physical data placement.
Hot Partition | A single partition receiving a disproportionate share of read/write requests, leading to throttling.
Cardinality | The number of distinct values an attribute can hold. High cardinality = better distribution.
Write Sharding | A technique of appending a random or calculated suffix to a partition key to artificially increase cardinality.
Contributor Insights | A DynamoDB feature that uses CloudWatch to identify the most accessed keys in a table or index.

Next Steps

If you suspect a hot partition today, enable DynamoDB Contributor Insights immediately — it will show you exactly which keys are being hammered. Then apply the partition key checklist above to your table design. For existing tables with the wrong partition key, you will need to migrate data to a new table; AWS does not support changing the partition key of an existing table in-place. Refer to the official AWS DynamoDB Best Practices for Partition Keys documentation for the authoritative guidance.
