DynamoDB Partition Keys: Why Your Choice Determines Performance (and How to Fix Hot Partitions)

You designed a DynamoDB table, deployed it to production, and now some queries are being throttled while others fly. The culprit is almost always a poorly chosen partition key creating a 'hot partition'. The partition key determines how DynamoDB physically distributes your data, which makes it the single most important design decision you'll make with this service.

TL;DR

Concept | What It Means | Impact
Partition Key | The primary attribute DynamoDB uses to route data to a physical storage node | Determines data distribution
Hot Partition | One partition receiving disproportionately high read/write traffic | Throttling, latency spikes
High Cardinality Key | A key with many distinct values (e.g., userId, orderId) | Even distribution, best performance
Low Cardinality Key | A key with few distinct values (e.g., status, country) | Uneven distribution, hot partitions
Write Sharding | Appending a random/calculated suffix to spread writes across partitions | Eliminates hot partitions artificially

How DynamoDB Partitioning Works Internally

DynamoDB is a fully managed key-value and document store that automatically partitions data across multiple storage nodes. When you write an item, DynamoDB applies an internal hash function to the partition key value to determine which physical partition stores that item. Every read and write operation is routed to exactly one partition based on this hash.

Each partition has a defined throughput ceiling (roughly 3,000 read capacity units or 1,000 write capacity units per partition). In provisioned capacity mode, the table's total throughput is divided across its partitions. If one partition receives the majority of traffic, it exhausts its share of capacity while the other partitions sit idle; this is a hot partition.

graph LR
  Client([Client Request]) --> HashFn[Hash Function]
  HashFn --> |token range 0-25%| P1[Partition 1]
  HashFn --> |token range 25-50%| P2[Partition 2]
  HashFn --> |token range 50-75%| P3[Partition 3]
  HashFn --> |token range 75-100%| P4[Partition 4]
  P1 --> S1[(Storage Node 1)]
  P2 --> S2[(Storage Node 2)]
  P3 --> S3[(Storage Node 3)]
  P4 --> S4[(Storage Node 4)]
  1. Hash Function: DynamoDB hashes the partition key value to produce a numeric token.
  2. Token Space: The full hash space is divided into ranges, each assigned to a physical partition.
  3. Routing: Every read/write is routed to the single partition whose range contains the item's hash token.
  4. Hot Partition Problem: If most items share the same partition key (e.g., status = "active"), they all hash to the same partition, overwhelming it.
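To make the routing concrete, here is a toy Python sketch. It is not DynamoDB's actual hash function (that is internal to the service); it only illustrates why every item sharing a partition key value lands on the same partition, while distinct values spread across the token space.

import hashlib

NUM_PARTITIONS = 4  # illustrative only; DynamoDB manages partition counts itself

def route_to_partition(partition_key_value: str) -> int:
    """Hash the partition key value and map its token into one partition's range."""
    digest = hashlib.md5(partition_key_value.encode('utf-8')).hexdigest()
    token = int(digest, 16)           # numeric token in the hash space
    return token % NUM_PARTITIONS     # the partition that owns this token

print(route_to_partition('user_42'))        # always the same partition for this key
print(route_to_partition('user_43'))        # likely a different partition
print(route_to_partition('status#active'))  # a key shared by millions of items
                                            # sends all of them to ONE partition

Because the mapping is deterministic, a key value that appears in 90% of your items directs 90% of your traffic to a single partition.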

The Hot Partition Problem: A Concrete Example

Imagine a table storing IoT sensor events with deviceType as the partition key. If 90% of your devices are type "temperature_sensor", then 90% of all writes hammer a single partition. The remaining partitions are nearly idle.

graph TD
  subgraph BAD [Bad Key - deviceType low cardinality]
    W1[Write: temperature_sensor] --> P_HOT[Partition A - OVERLOADED]
    W2[Write: temperature_sensor] --> P_HOT
    W3[Write: temperature_sensor] --> P_HOT
    W4[Write: humidity_sensor] --> P_IDLE1[Partition B - idle]
    W5[Write: pressure_sensor] --> P_IDLE2[Partition C - idle]
  end
  subgraph GOOD [Good Key - deviceId high cardinality]
    W6[Write: device_001] --> PA[Partition A]
    W7[Write: device_002] --> PB[Partition B]
    W8[Write: device_003] --> PC[Partition C]
    W9[Write: device_004] --> PD[Partition D]
    W10[Write: device_005] --> PE[Partition E]
  end
  1. Bad Key (deviceType): Low cardinality — most writes go to one partition, causing throttling.
  2. Good Key (deviceId): High cardinality — writes spread evenly across all partitions.
  3. The physical storage capacity and throughput per partition are fixed; overloading one partition causes ProvisionedThroughputExceededException errors, and throttling can still occur even in on-demand mode.
Analogy: Think of DynamoDB partitions as checkout lanes at a supermarket. If every customer is told to use lane 3 (because their item type maps there), lane 3 has a massive queue while lanes 1, 2, 4, and 5 are empty. A good partition key is like a load balancer that sends customers to the shortest available lane — distributing work evenly across all resources.
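In application code the difference is nothing more than which attribute the table is keyed on. A minimal sketch with boto3, assuming two hypothetical tables (IoTEventsByType and IoTEventsByDevice) whose names and attributes are illustrative, not from this post:

import boto3
from decimal import Decimal  # boto3's resource layer requires Decimal, not float

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

# Hypothetical table keyed on deviceType (low cardinality): most writes hash
# to one partition because most values are 'temperature_sensor'.
by_type = dynamodb.Table('IoTEventsByType')
by_type.put_item(Item={
    'deviceType': 'temperature_sensor',    # partition key: only a few distinct values
    'eventId': 'evt-0001',                 # sort key
    'reading': Decimal('21.7'),
})

# Hypothetical table keyed on deviceId (high cardinality): each device hashes
# independently, so writes spread evenly across partitions.
by_device = dynamodb.Table('IoTEventsByDevice')
by_device.put_item(Item={
    'deviceId': 'device_001',              # partition key: one value per device
    'timestamp': '2024-01-15T10:00:00Z',   # sort key
    'reading': Decimal('21.7'),
})

Both calls look identical from the client's side; only the key schema decides whether writes pile onto one partition or spread across many.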

Choosing a Good Partition Key: The Rules

The golden rule is high cardinality + uniform access patterns. Here is a decision framework:

graph TD
  Start([Choose Partition Key]) --> Q1{High Cardinality?}
  Q1 -- No --> Fix1[Add UUID suffix or use sharding]
  Q1 -- Yes --> Q2{Uniform Access Pattern?}
  Q2 -- No --> Fix2[Re-evaluate access patterns or use caching]
  Q2 -- Yes --> Q3{Need Range Queries?}
  Q3 -- Yes --> Sol1[Composite Key: PK + Sort Key]
  Q3 -- No --> Sol2[Simple Partition Key - Good to Go]
  Fix1 --> Sol1
  Fix2 --> Sol2

Rule 1: Maximize Cardinality

Choose an attribute with as many distinct values as possible. Natural candidates:

  • userId — unique per user
  • orderId — unique per transaction
  • deviceId — unique per device
  • sessionId — unique per session

Avoid: status, country, boolean flags, date strings (e.g., "2024-01-15" — all items on that day share one partition).
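As a sketch of what a high-cardinality design looks like at table-creation time, here is a hypothetical UserOrders table keyed on userId (the table and attribute names are assumptions for illustration):

import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

client.create_table(
    TableName='UserOrders',
    AttributeDefinitions=[
        {'AttributeName': 'userId', 'AttributeType': 'S'},   # high cardinality: unique per user
        {'AttributeName': 'orderId', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'userId', 'KeyType': 'HASH'},    # partition key
        {'AttributeName': 'orderId', 'KeyType': 'RANGE'},  # sort key
    ],
    BillingMode='PAY_PER_REQUEST',
)

Keying the same table on status or country instead would funnel most items through a handful of hash values.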

Rule 2: Ensure Uniform Access

High cardinality alone is not enough. If your application always reads the same 10 user IDs out of 10 million, those 10 partitions are hot. Your access pattern must be as distributed as your key space.

Rule 3: Composite Keys for Range Queries

Use a composite primary key (partition key + sort key) when you need range queries within a logical group. For example: partition key = userId, sort key = timestamp. This lets you query all events for a user in a time range while keeping writes distributed across users.
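Here is a hedged sketch of the query this design enables, assuming a hypothetical UserEvents table with partition key userId and sort key timestamp stored as an ISO-8601 string:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
events = dynamodb.Table('UserEvents')  # assumed schema: userId (HASH), timestamp (RANGE)

# Fetch one user's events within a time window. The query reads only the
# partition owning 'user_42'; other users' writes remain spread elsewhere.
response = events.query(
    KeyConditionExpression=(
        Key('userId').eq('user_42')
        & Key('timestamp').between('2024-01-01T00:00:00Z', '2024-01-31T23:59:59Z')
    )
)
for item in response.get('Items', []):
    print(item)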

Advanced Fix: Write Sharding

Sometimes your access pattern forces a low-cardinality key (e.g., a leaderboard keyed by game ID, where every score written for a popular game lands on the same partition). In this case, use write sharding: artificially increase cardinality by appending a random or calculated suffix to the partition key.

Write Sharding: Python Example
import boto3
import random
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
# Assumes 'pk' is the table's partition key and 'userId' is its sort key,
# so scores written by different users can share the same sharded pk.
table = dynamodb.Table('LeaderboardTable')

NUM_SHARDS = 10  # Number of logical shards

def write_score(game_id: str, user_id: str, score: int):
    """
    Shard writes across NUM_SHARDS partitions.
    Partition key becomes 'game_id#shard_N' instead of just 'game_id'.
    """
    shard_id = random.randint(0, NUM_SHARDS - 1)
    sharded_pk = f"{game_id}#shard_{shard_id}"

    table.put_item(
        Item={
            'pk': sharded_pk,       # Sharded partition key
            'userId': user_id,
            'score': score,
        }
    )
    print(f"Written to shard: {sharded_pk}")

def read_all_scores(game_id: str) -> list:
    """
    To read, you must query ALL shards and merge results.
    This is the trade-off of write sharding.
    """
    all_items = []
    for shard_id in range(NUM_SHARDS):
        sharded_pk = f"{game_id}#shard_{shard_id}"
        response = table.query(
            KeyConditionExpression=Key('pk').eq(sharded_pk)
        )
        all_items.extend(response.get('Items', []))
        # For large shards, also follow response.get('LastEvaluatedKey') to paginate.
    return all_items

# Usage
write_score('game_chess', 'user_42', 9500)
all_scores = read_all_scores('game_chess')
print(f"Total records fetched: {len(all_scores)}")

Trade-off: Write sharding distributes writes but requires scatter-gather reads (query all N shards and merge). Use it only when write distribution is the bottleneck.
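When the attribute you read by is known up front (for example userId), the 'calculated suffix' variant mentioned in the TL;DR softens that trade-off: derive the shard from a hash of that attribute so single-user reads hit exactly one shard. A small sketch; the helper name is mine, not from the post:

import hashlib

NUM_SHARDS = 10

def calculated_shard_pk(game_id: str, user_id: str) -> str:
    """Deterministic shard: the same user always maps to the same shard."""
    shard_id = int(hashlib.md5(user_id.encode('utf-8')).hexdigest(), 16) % NUM_SHARDS
    return f"{game_id}#shard_{shard_id}"

# Writes from many users still spread across NUM_SHARDS partitions...
pk_at_write_time = calculated_shard_pk('game_chess', 'user_42')
# ...and reading back one user's rows needs only that one shard, not all ten.
pk_at_read_time = calculated_shard_pk('game_chess', 'user_42')
assert pk_at_write_time == pk_at_read_time

Queries that span all users (e.g., the full leaderboard) still require the scatter-gather read across every shard.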

Monitoring Hot Partitions with CloudWatch

DynamoDB exposes metrics in Amazon CloudWatch that help you detect hot partitions early:

  • ConsumedWriteCapacityUnits / ConsumedReadCapacityUnits: Monitor at the table level for unexpected spikes.
  • ThrottledRequests: A non-zero value is a direct signal of capacity exhaustion on one or more partitions; an alarm sketch follows this list.
  • SuccessfulRequestLatency: Latency spikes often precede visible throttling.
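A minimal sketch of such an alarm with boto3; the alarm name, table name, and SNS topic ARN below are placeholders:

import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')

cloudwatch.put_metric_alarm(
    AlarmName='LeaderboardTable-putitem-throttles',
    Namespace='AWS/DynamoDB',
    MetricName='ThrottledRequests',
    # ThrottledRequests is emitted per TableName + Operation pair, so create
    # one alarm per operation you care about (or combine them with metric math).
    Dimensions=[
        {'Name': 'TableName', 'Value': 'LeaderboardTable'},
        {'Name': 'Operation', 'Value': 'PutItem'},
    ],
    Statistic='Sum',
    Period=60,                        # evaluate each minute
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator='GreaterThanThreshold',
    TreatMissingData='notBreaching',  # no throttle data points means healthy
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
)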

For deeper partition-level visibility, enable DynamoDB Contributor Insights. This feature identifies the most frequently accessed partition keys and sort keys, directly surfacing hot key patterns in your table.
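Contributor Insights can be enabled per table (and per index) from the console, CLI, or API. A short boto3 sketch, reusing the LeaderboardTable name assumed earlier:

import boto3

dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# Enable Contributor Insights for the table; CloudWatch will then report
# the most accessed and most throttled partition keys.
dynamodb.update_contributor_insights(
    TableName='LeaderboardTable',
    ContributorInsightsAction='ENABLE',
)

# Check the current status (ENABLING, ENABLED, DISABLING, DISABLED, FAILED).
status = dynamodb.describe_contributor_insights(TableName='LeaderboardTable')
print(status['ContributorInsightsStatus'])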

graph LR
  Table[DynamoDB Table] --> CW[CloudWatch Metrics]
  CW --> M1[ThrottledRequests]
  CW --> M2[ConsumedWriteCapacityUnits]
  CW --> M3[SuccessfulRequestLatency]
  Table --> CI[Contributor Insights]
  CI --> HK[Hot Key Report]
  HK --> Action[Redesign Partition Key]
  M1 --> Alert[CloudWatch Alarm]
  Alert --> Action

On-Demand vs. Provisioned Capacity: Does It Solve Hot Partitions?

A common misconception is that switching to on-demand capacity mode eliminates hot partition problems. It does not. On-demand mode removes the need to pre-provision capacity and handles traffic spikes more gracefully, but DynamoDB still has per-partition throughput limits. A severely skewed partition key can still cause throttling even in on-demand mode, particularly during sudden traffic bursts before DynamoDB's internal scaling responds.

The correct fix is always a better partition key design — capacity mode is an operational concern, not a data modeling fix.

Quick Reference: Partition Key Design Checklist

Check | Good Signal | Bad Signal
Cardinality | Millions of distinct values | Fewer than 1,000 distinct values
Access Distribution | Reads/writes spread across many keys | 80%+ traffic to same key value
Key Type | UUID, userId, deviceId, orderId | status, type, date, boolean
Range Queries | Use composite key (PK + SK) | Scanning entire table for ranges
Forced Low Cardinality | Apply write sharding with suffix | Using raw low-cardinality key

Glossary

Term | Definition
Partition Key | The primary key attribute hashed by DynamoDB to determine physical data placement.
Hot Partition | A single partition receiving a disproportionate share of read/write requests, leading to throttling.
Cardinality | The number of distinct values an attribute can hold. High cardinality = better distribution.
Write Sharding | A technique of appending a random or calculated suffix to a partition key to artificially increase cardinality.
Contributor Insights | A DynamoDB feature that uses CloudWatch to identify the most accessed keys in a table or index.

Next Steps

If you suspect a hot partition today, enable DynamoDB Contributor Insights immediately — it will show you exactly which keys are being hammered. Then apply the partition key checklist above to your table design. For existing tables with the wrong partition key, you will need to migrate data to a new table; AWS does not support changing the partition key of an existing table in-place. Refer to the official AWS DynamoDB Best Practices for Partition Keys documentation for the authoritative guidance.
