Scaling Reads with RDS Read Replicas: Architecture, Load Distribution & Multi-AZ Comparison

Your application's read traffic has outpaced your primary RDS instance — SELECT queries are piling up, latency is climbing, and vertical scaling is hitting diminishing returns. This is a classic read-heavy workload problem, and RDS Read Replicas are the purpose-built solution. But engineers frequently conflate Read Replicas with Multi-AZ deployments — they look similar on the surface but solve fundamentally different problems.

TL;DR

Dimension | Read Replica | Multi-AZ
Primary Purpose | Scale read throughput | High availability / failover
Replication Type | Asynchronous | Synchronous
Replica is Queryable? | Yes — has its own endpoint | No — standby is passive
Failover Target? | Manual promotion required | Automatic failover
Cross-Region Support | Yes | No (same region, different AZ)
Use Case | Reporting, analytics, read scaling | Production HA, DR within region

The Core Problem: Read-Heavy Workloads

A single RDS primary instance handles both reads and writes on the same compute and I/O resources. When your read-to-write ratio skews heavily toward reads (common in analytics dashboards, reporting engines, and content platforms), the primary instance becomes a bottleneck — not because writes are slow, but because reads are consuming CPU, memory, and IOPS that writes also need.

Analogy: Think of your primary DB like a head chef who both cooks orders (writes) and answers every customer question about the menu (reads). Adding a Read Replica is like hiring a dedicated front-of-house staff member who handles all menu questions — the chef stays focused on cooking. The front-of-house staff gets a copy of the menu (asynchronous replication) and may be a few seconds behind if the menu just changed, but for 95% of questions, they're perfectly accurate.

How RDS Read Replicas Work

When you create a Read Replica, RDS uses the database engine's native asynchronous replication mechanism to stream changes from the primary instance to the replica. The replica maintains its own storage volume and exposes a separate DNS endpoint your application connects to for read traffic.

graph LR
  App["Application Layer"]
  WConn["Write Connection (Primary Endpoint)"]
  RConn["Read Connection (Replica Endpoint)"]
  Primary[("RDS Primary Instance")]
  Replica[("RDS Read Replica")]
  RepStream["Async Replication Stream"]
  App -->|"INSERT / UPDATE / DELETE"| WConn
  App -->|"SELECT queries"| RConn
  WConn --> Primary
  RConn --> Replica
  Primary -->|"Engine-native async replication (binlog / WAL)"| RepStream
  RepStream --> Replica
  1. Application Layer: Your app is configured with two connection strings — one for writes (primary endpoint) and one for reads (replica endpoint).
  2. Primary Instance: Handles all INSERT, UPDATE, DELETE, and DDL operations. Changes are written to its storage and streamed asynchronously to replicas.
  3. Replication Stream: Engine-native async replication (e.g., MySQL binlog, PostgreSQL WAL streaming). There is an inherent replication lag — replicas are eventually consistent, not strongly consistent.
  4. Read Replica: Accepts SELECT queries only. Has its own endpoint, its own storage, and can be in the same AZ, a different AZ, or even a different AWS Region.
  5. Replica Scaling: You can create up to 15 Read Replicas per source instance for MySQL, MariaDB, and PostgreSQL, and up to 5 for Oracle. You can also create replicas of replicas (cascading) for MySQL and MariaDB. A quick way to inspect the resulting replication topology from code is sketched right after this list.
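
If you want to confirm the replication topology from code, the RDS API exposes it on the instance description. The sketch below assumes a primary named myapp-primary (the identifier used in this post's CLI example) and standard boto3 credentials; it is illustrative, not a full management script.

import boto3

# Assumes AWS credentials are available (environment, profile, or instance role)
rds = boto3.client("rds", region_name="us-east-1")

# Describe the primary and read its list of attached replicas
primary = rds.describe_db_instances(DBInstanceIdentifier="myapp-primary")["DBInstances"][0]

print("Replicas of myapp-primary:")
for replica_id in primary.get("ReadReplicaDBInstanceIdentifiers", []):
    replica = rds.describe_db_instances(DBInstanceIdentifier=replica_id)["DBInstances"][0]
    print(f"  {replica_id}: status={replica['DBInstanceStatus']}, "
          f"az={replica.get('AvailabilityZone')}, endpoint={replica['Endpoint']['Address']}")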

Replication Lag: The Critical Caveat

Because replication is asynchronous, replicas are eventually consistent. A write committed on the primary may not be immediately visible on the replica. This lag is typically sub-second under normal load but can grow under heavy write pressure or network conditions.

Practical implication: Never route reads to a replica when the query requires reading data that was just written in the same request lifecycle (e.g., "write a record, then immediately read it back to confirm"). Route those reads to the primary. Use replicas for workloads that tolerate eventual consistency: reporting queries, analytics, search indexes, and read-heavy dashboards.

You can monitor replication lag using the CloudWatch metric ReplicaLag on the Read Replica instance.
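
For a quick programmatic check (for example, inside a health-check script or a deployment gate), the same metric is available through boto3. A minimal sketch, assuming the replica identifier myapp-read-replica-1 used elsewhere in this post:

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Average ReplicaLag (in seconds) over the last 10 minutes, at 60-second resolution
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ReplicaLag",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "myapp-read-replica-1"}],
    StartTime=datetime.now(timezone.utc) - timedelta(minutes=10),
    EndTime=datetime.now(timezone.utc),
    Period=60,
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(f"{point['Timestamp']}: {point['Average']:.1f}s behind primary")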

Read Replica vs. Multi-AZ: Architecture Deep Dive

graph TD
  subgraph MultiAZ ["Multi-AZ Deployment — High Availability"]
    direction LR
    AppHA["Application"]
    ClusterEP["Single DB Endpoint (DNS)"]
    PrimaryHA[("Primary AZ-A")]
    Standby[("Standby AZ-B (Passive)")]
    AppHA --> ClusterEP
    ClusterEP --> PrimaryHA
    PrimaryHA -->|"Synchronous replication"| Standby
    Standby -.->|"Auto-failover (not queryable)"| ClusterEP
  end
  subgraph ReadReplica ["Read Replica — Read Scaling"]
    direction LR
    AppRR["Application"]
    WriteEP["Primary Endpoint"]
    ReadEP["Replica Endpoint"]
    PrimaryRR[("Primary Instance")]
    ReplicaRR[("Read Replica (Active, Queryable)")]
    AppRR -->|"Writes"| WriteEP
    AppRR -->|"Reads"| ReadEP
    WriteEP --> PrimaryRR
    ReadEP --> ReplicaRR
    PrimaryRR -->|"Async replication"| ReplicaRR
  end
  1. Multi-AZ Standby is a passive, synchronous mirror. RDS automatically fails over to it if the primary fails. Your application never directly connects to the standby — it always uses the single DB endpoint, whose DNS record RDS re-points to the standby during failover.
  2. Read Replica is an active, queryable instance. Your application explicitly connects to its endpoint for read traffic. It is not an automatic failover target — promotion to primary is a manual operation (or can be scripted).
  3. You can combine both: a Multi-AZ primary (for HA) with one or more Read Replicas (for read scaling). These are orthogonal features, as the sketch below illustrates.
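
To make point 3 concrete: enabling Multi-AZ on an existing primary is a single modification and does not disturb any Read Replicas already attached to it. A minimal boto3 sketch, using the instance name from this post's examples (the parameters are illustrative, not a hardened production change):

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Convert the existing primary to Multi-AZ (adds a synchronous standby in another AZ).
# Any Read Replicas keep replicating from the primary; the two features are independent.
rds.modify_db_instance(
    DBInstanceIdentifier="myapp-primary",
    MultiAZ=True,
    ApplyImmediately=True,  # otherwise the change waits for the next maintenance window
)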

Decision Framework: Which Feature Do You Need?

graph TD
  Start(["DB Performance Problem"])
  Q1{"Primary symptom?"}
  Q2{"Can reads tolerate eventual consistency?"}
  Q3{"Need cross-region read scaling?"}
  SolA["Enable Multi-AZ for automatic failover"]
  SolB["Create Read Replica(s) and split reads in app"]
  SolC["Create Cross-Region Read Replica"]
  SolD["Route reads to primary or use Aurora with zero-lag replicas"]
  Combine["Combine both: Multi-AZ Primary + Read Replica(s)"]
  Q4{"Also need HA for the primary?"}
  Start --> Q1
  Q1 -->|"Instance failure / auto-failover needed"| SolA
  Q1 -->|"Too many SELECT queries / read latency"| Q2
  Q2 -->|"No — strong consistency required"| SolD
  Q2 -->|"Yes"| Q3
  Q3 -->|"Yes"| SolC
  Q3 -->|"No"| Q4
  Q4 -->|"Yes"| Combine
  Q4 -->|"No"| SolB

Implementation: Creating a Read Replica

Via AWS CLI

# Create a Read Replica in the same region
aws rds create-db-instance-read-replica \
  --db-instance-identifier myapp-read-replica-1 \
  --source-db-instance-identifier myapp-primary \
  --db-instance-class db.r6g.large \
  --availability-zone us-east-1b \
  --no-publicly-accessible \
  --region us-east-1
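
If you provision from Python rather than the CLI, the equivalent boto3 call is sketched below with the same identifiers and settings as the command above; the waiter blocks until the replica is ready to accept connections.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Same operation as the CLI command above
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="myapp-read-replica-1",
    SourceDBInstanceIdentifier="myapp-primary",
    DBInstanceClass="db.r6g.large",
    AvailabilityZone="us-east-1b",
    PubliclyAccessible=False,
)

# Block until the replica reaches the "available" state
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="myapp-read-replica-1")
print("Replica is available and ready for read traffic")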

Application-Level Read/Write Splitting

Creating the replica is only half the work. Your application must be configured to route reads to the replica endpoint. Below is a Python (SQLAlchemy) pattern for explicit read/write splitting:

import os

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from myapp.models import Order, User  # your application's ORM models (adjust the import path)

# Connection strings from environment variables (never hardcode credentials)
PRIMARY_DB_URL = os.environ["PRIMARY_DB_URL"]  # e.g. postgresql://user:pass@myapp-primary.xxxx.us-east-1.rds.amazonaws.com:5432/mydb
REPLICA_DB_URL = os.environ["REPLICA_DB_URL"]  # e.g. postgresql://user:pass@myapp-read-replica-1.xxxx.us-east-1.rds.amazonaws.com:5432/mydb

# Separate engines for primary (writes) and replica (reads)
primary_engine = create_engine(
    PRIMARY_DB_URL,
    pool_size=10,
    max_overflow=20,
    pool_pre_ping=True  # Validates connections before use
)

replica_engine = create_engine(
    REPLICA_DB_URL,
    pool_size=20,       # Larger pool — replicas handle more concurrent reads
    max_overflow=40,
    pool_pre_ping=True
)

WriteSession = sessionmaker(bind=primary_engine)
ReadSession = sessionmaker(bind=replica_engine)

# Usage pattern
def get_user_profile(user_id: int):
    """Read operation — route to replica."""
    with ReadSession() as session:
        return session.query(User).filter(User.id == user_id).first()

def create_order(order_data: dict):
    """Write operation — route to primary."""
    with WriteSession() as session:
        order = Order(**order_data)
        session.add(order)
        session.commit()
        return order
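
One refinement worth adding: as noted in the replication-lag section, any read that must see a write from the same request lifecycle should go to the primary. A small, hypothetical helper on top of the sessions above makes that decision explicit at the call site:

def get_order(order_id: int, require_fresh: bool = False):
    """Fetch an order. Pass require_fresh=True in read-your-own-writes flows
    (e.g. immediately after create_order) to bypass replica lag."""
    session_factory = WriteSession if require_fresh else ReadSession
    with session_factory() as session:
        return session.query(Order).filter(Order.id == order_id).first()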

IAM Policy for Read Replica Management

The following IAM policy grants least-privilege permissions to create and describe Read Replicas. Note that rds:DescribeDBInstances requires a wildcard resource (*) because it is a list/describe action that does not support resource-level permissions — it is separated into its own statement to make this explicit.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadReplicaCreation",
      "Effect": "Allow",
      "Action": [
        "rds:CreateDBInstanceReadReplica",
        "rds:DeleteDBInstance",
        "rds:ModifyDBInstance",
        "rds:PromoteReadReplica"
      ],
      "Resource": [
        "arn:aws:rds:us-east-1:123456789012:db:myapp-primary",
        "arn:aws:rds:us-east-1:123456789012:db:myapp-read-replica-1"
      ]
    },
    {
      "Sid": "AllowDescribeInstances",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances"
      ],
      "Resource": "*"
    }
  ]
}

Monitoring Read Replicas

Key CloudWatch metrics to watch on your Read Replica instance:

  • ReplicaLag — Seconds behind the primary. Alert if this exceeds your application's consistency tolerance (an example alarm is sketched after this list).
  • CPUUtilization — High CPU on the replica indicates it may be under-provisioned for the read load.
  • DatabaseConnections — Track connection pool saturation on the replica endpoint.
  • ReadIOPS — Confirms read traffic is actually hitting the replica as expected.
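
A ReplicaLag alarm is usually the first one worth creating. Below is a minimal boto3 sketch; the alarm name, threshold, and SNS topic ARN are placeholders to adapt to your environment.

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Alarm if the replica falls more than 30 seconds behind for 3 consecutive minutes
cloudwatch.put_metric_alarm(
    AlarmName="myapp-read-replica-1-replica-lag",  # placeholder name
    Namespace="AWS/RDS",
    MetricName="ReplicaLag",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "myapp-read-replica-1"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=30.0,  # seconds; adjust to your consistency tolerance
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:db-alerts"],  # placeholder SNS topic
)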

Key Constraints & Gotchas

  • Automated backups must be enabled on the source instance to create a Read Replica.
  • Replicas are billed separately — each replica is a full RDS instance with its own compute and storage costs.
  • Cross-region replicas incur data transfer costs for the replication traffic.
  • Promotion is irreversible — once you promote a replica to a standalone primary, it cannot be re-attached as a replica (see the promotion sketch after this list).
  • DDL operations on the primary (e.g., ALTER TABLE) replicate to replicas and can cause temporary lag spikes.
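
If you do decide to promote (for example, as part of a DR runbook, or to split off a standalone copy for heavy analytics), the operation is a single API call. A sketch with this post's instance names; BackupRetentionPeriod is set explicitly because a newly promoted primary should carry its own automated backups:

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Promote the replica to a standalone, writable instance. This is irreversible:
# it stops replicating from myapp-primary and cannot be re-attached as a replica.
rds.promote_read_replica(
    DBInstanceIdentifier="myapp-read-replica-1",
    BackupRetentionPeriod=7,  # days of automated backups on the newly standalone instance
)

# Wait for the promotion to complete before routing write traffic to it
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="myapp-read-replica-1")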

Glossary

Term | Definition
Asynchronous Replication | Changes are sent to the replica after the primary commits them. The primary does not wait for replica acknowledgment — introduces eventual consistency.
Synchronous Replication | The primary waits for the standby to confirm the write before acknowledging the commit. Guarantees zero data loss but adds write latency.
Replication Lag | The time delta between a write being committed on the primary and becoming visible on the replica.
Read/Write Splitting | Application-level routing pattern that directs write queries to the primary endpoint and read queries to replica endpoints.
Replica Promotion | The manual process of converting a Read Replica into a standalone, writable RDS primary instance.

Next Steps

  • 📖 AWS Docs: Working with Read Replicas
  • 📖 AWS Docs: Multi-AZ Deployments
  • If your read scaling needs exceed what multiple replicas can handle, evaluate Amazon Aurora — its cluster architecture shares a single storage volume across up to 15 low-latency read replicas, eliminating replication lag as a concern for most workloads.
