When to Use ElastiCache Redis: Fixing Slow RDS Read Performance
Your RDS instance is under siege — the same product catalog, user session, or leaderboard query fires hundreds of times per second, each one hitting the database cold. Adding an ElastiCache Redis layer intercepts those repetitive reads before they ever reach RDS, slashing response times from tens of milliseconds to sub-millisecond.
TL;DR
| Scenario | Without Redis Cache | With ElastiCache Redis |
|---|---|---|
| Repeated read query | Hits RDS every time | Served from in-memory cache |
| Response latency | 10–100 ms (DB round-trip) | Sub-millisecond cache hit |
| RDS CPU/load | Scales with read traffic | Dramatically reduced |
| Scalability | Vertical scaling or read replicas | Horizontal cache scaling |
| Cost | Larger RDS instance needed | Smaller RDS + cache node |
Why RDS Slows Down Under Repeated Reads
Relational databases are optimized for durability and consistency, not raw read throughput at scale. Every query — even a cache-friendly SELECT — incurs connection overhead, query parsing, buffer pool lookups, and network round-trips. When the same data is requested thousands of times per minute, you are paying that full cost repeatedly for zero new information.
The root cause is almost always one of three patterns:
- Hot rows: A small subset of rows (e.g., top-10 products) accounts for the majority of reads.
- Expensive aggregations: `COUNT`, `SUM`, or `JOIN`-heavy queries that are deterministic over short windows.
- Session/token lookups: Auth tokens or user sessions validated on every API request.
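Before adding a cache, it is worth confirming that hot, repetitive queries really are the bottleneck. On PostgreSQL, the `pg_stat_statements` extension can rank statements by call count. A diagnostic sketch, assuming the extension is enabled and PostgreSQL 13+ column names; `conn` is any open psycopg2 connection:

```python
# Rank statements by how often they are called. Queries with very high
# `calls` counts and stable result sets are prime caching candidates.
HOT_QUERY_SQL = """
SELECT query, calls, mean_exec_time,
       calls * mean_exec_time AS total_ms
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
"""

def find_hot_queries(conn):
    """Return the 10 most frequently called statements."""
    with conn.cursor() as cur:
        cur.execute(HOT_QUERY_SQL)
        return cur.fetchall()
```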
The Cache-Aside Pattern: How Redis Intercepts Reads
The most common and operationally safe caching strategy is Cache-Aside (also called Lazy Loading). The application owns the cache logic — Redis is never written to directly by the database.
- Client Request: The client sends a read request to the application.
- Cache Check: The application queries Redis first using a deterministic cache key (e.g., `product:42`).
- Cache Hit: If the key exists, Redis returns the value immediately. The application responds to the client — RDS is never touched.
- Cache Miss: If the key is absent (first request or TTL expired), the application falls through to RDS.
- Populate Cache: The application writes the RDS result back into Redis with a TTL, then responds to the client.
- Subsequent Requests: All future requests for the same key hit Redis until the TTL expires.
Analogy: Think of Redis as a whiteboard next to your desk. The first time a colleague asks you a complex question, you research the answer from the filing cabinet (RDS) and write it on the whiteboard. Every subsequent colleague asking the same question gets the answer from the whiteboard instantly — the filing cabinet stays closed.
Cache Invalidation: Keeping Data Consistent
Stale cache data is the primary operational risk. The application must invalidate or update the cache whenever the underlying data changes. The application acts as the orchestrator for both the write to RDS and the subsequent cache invalidation.
- Write Request: A mutation (INSERT/UPDATE/DELETE) arrives at the application.
- Persist to RDS: The application writes the authoritative data to RDS first. This ensures durability before any cache operation.
- Invalidate Cache: After a successful RDS write, the application issues a `DEL` command to Redis, removing the now-stale key.
- Next Read Repopulates: The next read request for that key will be a cache miss, fetch fresh data from RDS, and re-populate Redis.
Why delete instead of update? Deleting is safer than writing the new value directly to Redis after a write. It avoids race conditions where two concurrent writes could populate the cache with an older value.
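The race can be made concrete with a deterministic simulation. This is an illustrative sketch using plain dicts for the database and cache, with the problematic interleaving forced in code rather than produced by real threads:

```python
def race_with_set():
    """Two writers; Writer A's delayed cache SET lands after Writer B's."""
    db = {}
    cache = {}
    db["product:42"] = 10       # A writes price 10 to the DB
    db["product:42"] = 20       # B writes price 20 (final truth)
    cache["product:42"] = 20    # B populates the cache
    cache["product:42"] = 10    # A's delayed SET overwrites -> stale cache!
    return db["product:42"], cache["product:42"]

def race_with_del():
    """Same interleaving, but both writers DELETE instead of SET."""
    db = {}
    cache = {"product:42": 5}   # some older cached value
    db["product:42"] = 10       # A writes price 10 to the DB
    db["product:42"] = 20       # B writes price 20 (final truth)
    cache.pop("product:42", None)  # B deletes the key
    cache.pop("product:42", None)  # A's delayed DEL is a harmless no-op
    # The next read misses and repopulates from the DB's final state:
    value = cache.get("product:42") or db["product:42"]
    return db["product:42"], value
```

With SET-after-write, the cache ends up holding 10 while the database holds 20; with DEL, the next read recovers the correct value.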
AWS Infrastructure Architecture
In a production AWS environment, ElastiCache Redis and RDS reside in private subnets. Security Groups enforce least-privilege network access — only the application tier can reach the cache, and only the application tier can reach the database.
Load Balancer"] end subgraph "Private Subnet - App Tier" APP["Application
(EC2 / ECS / Lambda)
AppSG"] end subgraph "Private Subnet - Data Tier" REDIS["ElastiCache Redis
Replication Group
CacheSG :6379"] RDS["Amazon RDS
(Primary + Standby)
RDSSG :5432"] end end Internet(["Internet"]) --> ALB ALB --> APP APP -->|"Inbound: AppSG only"| REDIS APP -->|"Inbound: AppSG only"| RDS
- VPC Isolation: All components live inside a VPC. ElastiCache and RDS are in private subnets with no public internet access.
- AppSG (Application Security Group): Attached to your EC2/ECS/Lambda compute. It is the only source allowed to connect to both the cache and the database.
- CacheSG: Allows inbound TCP on port `6379` only from `AppSG`. No other traffic is permitted.
- RDSSG: Allows inbound TCP on port `5432` (PostgreSQL) or `3306` (MySQL) only from `AppSG`.
- No direct Cache-to-RDS path: The cache and database have no network path between them. The application is the sole orchestrator.
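The SG-to-SG rules above can be created programmatically. Here is a sketch using boto3's EC2 API; the group IDs are placeholders, and the live call is left commented out because it requires real AWS credentials:

```python
def sg_to_sg_ingress_params(target_sg_id: str, source_sg_id: str, port: int) -> dict:
    """Build kwargs for authorize_security_group_ingress that allow `port`
    only from another Security Group (a UserIdGroupPair, not a CIDR block)."""
    return {
        "GroupId": target_sg_id,  # e.g., CacheSG or RDSSG
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{
                "GroupId": source_sg_id,  # e.g., AppSG
                "Description": "App tier only",
            }],
        }],
    }

# Live usage (requires AWS credentials and real group IDs):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.authorize_security_group_ingress(
#       **sg_to_sg_ingress_params("sg-cache1234", "sg-app5678", 6379)
#   )
```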
Implementation: Python (boto3 + redis-py)
The following snippet demonstrates the Cache-Aside pattern with TTL-based expiry. Replace the placeholder connection strings with your actual ElastiCache and RDS endpoints.
cache_aside.py — Cache-Aside Pattern Implementation
import redis
import psycopg2
import json
import os
# --- Connection Setup ---
# Retrieve endpoints from environment variables (never hardcode)
REDIS_HOST = os.environ["ELASTICACHE_ENDPOINT"] # e.g., my-cluster.abc123.ng.0001.use1.cache.amazonaws.com
REDIS_PORT = 6379
CACHE_TTL_SECONDS = 300 # 5-minute TTL
RDS_HOST = os.environ["RDS_ENDPOINT"]
RDS_DB = os.environ["RDS_DB_NAME"]
RDS_USER = os.environ["RDS_USER"]
RDS_PASS = os.environ["RDS_PASSWORD"]
# Initialize Redis client (use SSL for production ElastiCache)
cache = redis.Redis(
host=REDIS_HOST,
port=REDIS_PORT,
ssl=True, # Required for ElastiCache in-transit encryption
decode_responses=True
)
def get_db_connection():
return psycopg2.connect(
host=RDS_HOST, dbname=RDS_DB,
user=RDS_USER, password=RDS_PASS
)
# --- Cache-Aside Read ---
def get_product(product_id: int) -> dict | None:
cache_key = f"product:{product_id}"
# 1. Check cache first
cached_value = cache.get(cache_key)
if cached_value:
print(f"[CACHE HIT] {cache_key}")
return json.loads(cached_value)
# 2. Cache miss — query RDS
print(f"[CACHE MISS] {cache_key} — querying RDS")
conn = get_db_connection()
try:
with conn.cursor() as cur:
cur.execute("SELECT id, name, price FROM products WHERE id = %s", (product_id,))
row = cur.fetchone()
if not row:
return None
product = {"id": row[0], "name": row[1], "price": float(row[2])}
finally:
conn.close()
# 3. Populate cache with TTL
cache.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(product))
return product
# --- Invalidate-on-Write (delete, don't update) ---
def update_product(product_id: int, name: str, price: float) -> None:
cache_key = f"product:{product_id}"
# 1. Write to RDS first (source of truth)
conn = get_db_connection()
try:
with conn.cursor() as cur:
cur.execute(
"UPDATE products SET name = %s, price = %s WHERE id = %s",
(name, price, product_id)
)
conn.commit()
finally:
conn.close()
# 2. Invalidate cache AFTER successful DB write
deleted = cache.delete(cache_key)
print(f"[CACHE INVALIDATED] {cache_key} — deleted={deleted}")
IAM & Security: Least Privilege for ElastiCache
ElastiCache Redis access is controlled at the network layer (Security Groups) and, for ElastiCache with AUTH or RBAC enabled, at the application layer. For the AWS control plane (creating/describing clusters), your application's IAM role should follow least privilege.
IAM Policy — Least Privilege for ElastiCache Describe Operations
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ElastiCacheReadOnly",
"Effect": "Allow",
"Action": [
"elasticache:DescribeCacheClusters",
"elasticache:DescribeReplicationGroups"
],
"Resource": "arn:aws:elasticache:us-east-1:123456789012:replicationgroup:my-redis-cluster"
}
]
}
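With those Describe permissions in place, the application can resolve the cache endpoint at startup instead of hardcoding it. A sketch that assumes cluster mode disabled (a single node group) and the replication group ID `my-redis-cluster`; the live call is commented out because it requires AWS credentials:

```python
def primary_endpoint(describe_response: dict) -> str:
    """Extract 'host:port' for the primary node from a
    DescribeReplicationGroups response (cluster mode disabled)."""
    group = describe_response["ReplicationGroups"][0]
    endpoint = group["NodeGroups"][0]["PrimaryEndpoint"]
    return f"{endpoint['Address']}:{endpoint['Port']}"

# Live usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("elasticache")
#   resp = client.describe_replication_groups(ReplicationGroupId="my-redis-cluster")
#   print(primary_endpoint(resp))
```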
Key security practices for ElastiCache Redis in production:
- Enable in-transit encryption (TLS) and at-rest encryption when creating the cluster.
- Enable Redis AUTH or use ElastiCache RBAC (Role-Based Access Control) for user-level authentication.
- Place ElastiCache in a private subnet — never expose port 6379 to the internet.
- Restrict Security Group ingress to only the application's Security Group ID, not a CIDR block.
Choosing the Right TTL Strategy
| Data Type | Recommended TTL | Invalidation Trigger |
|---|---|---|
| Product catalog | 5–15 minutes | On product update event |
| User session / auth token | Match session expiry | On logout or token revocation |
| Leaderboard / aggregation | 30–60 seconds | TTL expiry (acceptable staleness) |
| Configuration / feature flags | 1–5 minutes | On config change deployment |
| Real-time inventory | Very short or no cache | Not suitable for caching |
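One refinement worth applying to any of these TTLs: if many keys are cached at the same moment with identical TTLs, they all expire and miss together, stampeding RDS with simultaneous reloads. Adding random jitter spreads the expiries out. A minimal sketch:

```python
import random

def ttl_with_jitter(base_seconds: int, jitter_fraction: float = 0.1) -> int:
    """Return the base TTL shifted by up to +/- jitter_fraction, so keys
    written at the same moment do not all expire at the same moment."""
    spread = int(base_seconds * jitter_fraction)
    return base_seconds + random.randint(-spread, spread)

# Usage with the earlier cache_aside.py snippet:
#   cache.setex(cache_key, ttl_with_jitter(CACHE_TTL_SECONDS), json.dumps(product))
```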
When ElastiCache Redis Is NOT the Answer
- Write-heavy workloads: If your traffic is predominantly writes, a cache adds invalidation overhead with minimal read benefit.
- Highly dynamic data: Data that changes on every request (e.g., real-time stock prices) will have a near-zero cache hit rate.
- Missing indexes: If your RDS slowness is caused by missing indexes or unoptimized queries, fix the query first. A cache on top of a broken query is a band-aid.
- Strong consistency requirements: If your application cannot tolerate even seconds of stale data, cache-aside with TTL may not be appropriate without a robust invalidation strategy.
Wrap-Up & Next Steps
ElastiCache Redis is the right tool when your RDS pain is caused by repetitive reads of relatively stable data. The Cache-Aside pattern keeps your application in control, your database load low, and your response times in the sub-millisecond range. Start with a single-node cluster for development, then graduate to a Multi-AZ replication group for production resilience.
- 📖 AWS ElastiCache for Redis — Official Documentation
- 📖 ElastiCache Best Practices
- 📖 ElastiCache Encryption at Rest and In-Transit
Glossary
| Term | Definition |
|---|---|
| Cache-Aside (Lazy Loading) | A caching pattern where the application checks the cache before querying the database, and populates the cache on a miss. |
| TTL (Time-To-Live) | An expiry duration set on a cache key after which Redis automatically evicts it, forcing a fresh database read. |
| Cache Hit / Cache Miss | A hit means the requested key was found in Redis. A miss means it was absent and the database must be queried. |
| Cache Invalidation | The process of removing or updating a stale cache entry after the underlying data changes in the database. |
| Replication Group | An ElastiCache construct that manages a primary Redis node and one or more read replicas across Availability Zones for high availability. |