DNS Failover with Route 53: Automatically Reroute Traffic to S3 When Your EC2 Goes Down

Your primary EC2 instance just crashed at 2 AM. Without an automated failover strategy, your users hit a dead end until someone wakes up and intervenes. This guide walks you through configuring Route 53 Active-Passive DNS failover so that traffic automatically reroutes to a static backup site on S3 once Route 53 detects that your primary server is unhealthy.

TL;DR

Component | Role | Type
--- | --- | ---
Route 53 Health Check | Monitors EC2 endpoint health | HTTP/HTTPS/TCP probe
Primary DNS Record | Points to EC2 (Elastic IP) | Failover: PRIMARY
Secondary DNS Record | Points to S3 static website | Failover: SECONDARY
S3 Static Website | Serves backup/maintenance page | Passive standby
SNS (optional) | Alerts on health check failure | Notification layer

Architecture Overview

Before diving into configuration, understand the data flow. Route 53 continuously probes your EC2 endpoint. When the health check reports the endpoint unhealthy, Route 53 stops returning the PRIMARY record and automatically answers with the SECONDARY record pointing to S3.

graph LR
    User(["👤 End User"])
    R53["Route 53<br/>DNS Service"]
    HC["Route 53<br/>Health Checker"]
    EC2["🖥️ EC2 Instance (Primary)<br/>Elastic IP: 203.0.113.10"]
    S3["🪣 S3 Static Website<br/>(Secondary / Failover)"]
    CW["CloudWatch Alarm"]
    SNS["SNS Topic<br/>(Email Alert)"]
    User -->|"DNS Query: www.example.com"| R53
    HC -->|"HTTP probe /health every 30s"| EC2
    HC -->|"Health status"| R53
    R53 -->|"✅ EC2 Healthy: Resolve to Elastic IP"| EC2
    R53 -.->|"❌ EC2 Unhealthy: Failover to S3"| S3
    HC -->|"HealthCheckStatus metric"| CW
    CW -->|"Alarm: status < 1"| SNS
    SNS -->|"Email notification"| Oncall(["📧 On-Call Team"])
    style EC2 fill:#d4edda,stroke:#28a745
    style S3 fill:#fff3cd,stroke:#ffc107
    style R53 fill:#cce5ff,stroke:#004085
    style HC fill:#e2d9f3,stroke:#6f42c1
    style CW fill:#f8d7da,stroke:#dc3545
    style SNS fill:#f8d7da,stroke:#dc3545
  1. User DNS Query: The client queries Route 53 for your domain (e.g., www.example.com).
  2. Health Check Probe: Route 53 health checkers (from multiple global locations) continuously poll your EC2 endpoint on the configured protocol/port/path.
  3. Healthy State: Route 53 resolves the domain to the EC2 Elastic IP — normal traffic flow.
  4. Unhealthy State: Once the endpoint fails the configured number of consecutive checks and only 18% or fewer of Route 53's health checkers still report it healthy, Route 53 marks the PRIMARY record unhealthy and stops returning it in DNS responses.
  5. Failover Activated: Route 53 automatically returns the SECONDARY record — the S3 static website endpoint.
  6. SNS Alert: A CloudWatch alarm tied to the health check metric triggers an SNS notification to your on-call team.
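
To make steps 3 to 5 concrete, here is a toy Bash model of the decision Route 53 makes for a failover pair. It assumes the documented rule that an endpoint counts as healthy only when more than 18% of health checkers report it healthy. The function `failover_answer` and its arguments are hypothetical, the checker counts are illustrative, and this is not how Route 53 is implemented internally.

```shell
#!/usr/bin/env bash
# Toy model of how Route 53 answers a failover record pair.
# failover_answer is a hypothetical helper, not an AWS API.
# Args: checkers reporting healthy, total checkers, PRIMARY answer, SECONDARY answer.
failover_answer() {
  local healthy=$1 total=$2 primary=$3 secondary=$4
  # Documented rule: the endpoint is healthy only if MORE than 18% of
  # health checkers report it healthy (integer math, no floating point).
  if (( healthy * 100 > total * 18 )); then
    echo "$primary"    # PRIMARY healthy: answer with the EC2 Elastic IP
  else
    echo "$secondary"  # PRIMARY unhealthy: answer from the SECONDARY (S3 alias) record
  fi
}

# Illustrative numbers: 10 checkers reporting.
failover_answer 2 10 203.0.113.10 s3-website-alias   # → 203.0.113.10
failover_answer 1 10 203.0.113.10 s3-website-alias   # → s3-website-alias
```

Note how lenient the rule is: two healthy reports out of ten keep the PRIMARY record in service, so failover only happens when the endpoint looks down from nearly every vantage point.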
Analogy: Think of Route 53 failover like a hospital's triage system. The primary ER (EC2) handles all patients normally. If the ER goes offline, an automated redirect sends patients to the urgent care clinic (S3). The clinic can't perform surgery, but it keeps patients informed and safe until the ER is back online.

Prerequisites

  • A domain with a public hosted zone in Route 53 (registered through Route 53, or registered elsewhere with its NS records delegated to Route 53).
  • A running EC2 instance with an Elastic IP assigned and a publicly accessible health check endpoint (e.g., /health).
  • An S3 bucket configured for static website hosting with a maintenance/fallback index.html.
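
Before wiring up the health check, it can help to probe the endpoint roughly the way Route 53 will. The sketch below assumes the HTTP criteria in the Route 53 documentation (a TCP connection within 4 seconds, then a 2xx or 3xx status within 2 seconds). The function name `probe_like_route53` is hypothetical, and a probe from your own machine does not prove that Route 53's health checkers can get through your security group.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check that approximates a Route 53 HTTP health check.
# Usage: probe_like_route53 IP PORT PATH HOST
probe_like_route53() {
  local ip=$1 port=$2 path=$3 host=$4 code
  # Route 53 sends FullyQualifiedDomainName as the Host header when an
  # IPAddress is also configured; mirror that with -H.
  # --connect-timeout 4: TCP connect window; --max-time 6: ~2 s more for the status.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 4 --max-time 6 \
    -H "Host: $host" "http://$ip:$port$path")
  case $code in
    2??|3??) echo "healthy ($code)"; return 0 ;;
    *) echo "unhealthy (${code:-no response})"; return 1 ;;
  esac
}
```

For example, `probe_like_route53 203.0.113.10 80 /health www.example.com` should print `healthy (200)` before you create the health check. curl reports `000` when it can't connect at all.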

Step 1: Configure S3 as a Static Website

When you route to an S3 website endpoint with a Route 53 Alias record, the S3 bucket name must exactly match the DNS record name (here, www.example.com).

S3 Bucket Policy for Public Read:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::www.example.com/*"
    }
  ]
}

Enable static website hosting via AWS CLI:

# Create the bucket (name must match your subdomain)
aws s3 mb s3://www.example.com --region us-east-1

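# Enable static website hosting (index.html doubles as the error document,
# so every path on the fallback site serves the maintenance page)
aws s3 website s3://www.example.com \
  --index-document index.html \
  --error-document index.html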

# Upload your fallback page
aws s3 cp ./maintenance/index.html s3://www.example.com/index.html \
  --content-type "text/html"

After enabling, your S3 website endpoint follows this format in us-east-1 (some Regions use a dot instead of a dash, as in s3-website.<region>.amazonaws.com):
http://www.example.com.s3-website-us-east-1.amazonaws.com
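
The bucket policy above only takes effect if S3 Block Public Access is relaxed for this bucket. New buckets have it enabled by default, and `put-bucket-policy` rejects a public policy while it is on. A sketch of the remaining commands, assuming the policy is saved as `bucket-policy.json` (a hypothetical filename); the `run` wrapper and `open_bucket_for_website` are hypothetical helpers that let you review the commands with `DRY_RUN=1` first.

```shell
#!/usr/bin/env bash
# Print commands instead of running them when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: make a static-website bucket publicly readable.
# Args: bucket name, path to the bucket policy JSON (e.g. bucket-policy.json).
open_bucket_for_website() {
  local bucket=$1 policy_file=$2
  # Keep ACL blocking on, but allow a public bucket policy for this bucket
  run aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=false,RestrictPublicBuckets=false"
  # Attach the public-read policy
  run aws s3api put-bucket-policy --bucket "$bucket" --policy "file://$policy_file"
  # Smoke-test the website endpoint (assumes us-east-1's dash-style endpoint; HTTP only)
  run curl -sI "http://$bucket.s3-website-us-east-1.amazonaws.com/"
}
```

Run `DRY_RUN=1 open_bucket_for_website www.example.com bucket-policy.json` to review, then drop `DRY_RUN=1` to apply. If Block Public Access is also enabled at the account level, that setting has to be relaxed as well.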

Step 2: Create a Route 53 Health Check for EC2

The health check is the trigger mechanism. Route 53 probes your EC2 instance and marks the PRIMARY record unhealthy when the probe fails consecutively beyond your threshold.

# Create health check targeting EC2 Elastic IP and capture its ID
# (substitute it wherever YOUR_HEALTH_CHECK_ID appears below)
HEALTH_CHECK_ID=$(aws route53 create-health-check \
  --caller-reference "ec2-primary-$(date +%s)" \
  --health-check-config '{
    "IPAddress": "203.0.113.10",
    "Port": 80,
    "Type": "HTTP",
    "ResourcePath": "/health",
    "FullyQualifiedDomainName": "www.example.com",
    "RequestInterval": 30,
    "FailureThreshold": 3
  }' \
  --query 'HealthCheck.Id' --output text)
echo "$HEALTH_CHECK_ID"

Key parameters explained:

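  • RequestInterval: 10 (fast) or 30 (standard) seconds between probes from each health checker. The fast interval costs extra (check AWS pricing), and the interval can't be changed after the health check is created.
  • FailureThreshold: Number of consecutive checks an endpoint must fail (or, when recovering, pass) before Route 53 changes its status (1–10). A value of 3 with a 30-second interval means roughly 90 seconds to detect a failure, before DNS caching is taken into account.
  • ResourcePath: Your application's dedicated health endpoint. Route 53 treats any 2xx or 3xx response as healthy, so the endpoint should return an error status (e.g., 503) when the application can't serve traffic.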
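
FailureThreshold works in both directions, which is easy to miss. The toy single-checker model below shows the streak logic; `simulate_checker` is hypothetical, it assumes the checker starts in the healthy state, and real Route 53 status also aggregates results from many checkers.

```shell
#!/usr/bin/env bash
# Toy model of ONE health checker applying FailureThreshold in both directions.
# simulate_checker is hypothetical; real status also aggregates many checkers.
# Args: threshold, then probe results ("ok" or "fail"). Starts healthy (assumption).
simulate_checker() {
  local threshold=$1; shift
  local status=healthy streak=0 result
  local -a history=()
  for result in "$@"; do
    # Count consecutive probes that contradict the current status
    if [[ ($status == healthy && $result == fail) || ($status == unhealthy && $result == ok) ]]; then
      (( streak += 1 ))
    else
      streak=0
    fi
    # Flip the status once the streak reaches the threshold
    if (( streak >= threshold )); then
      if [[ $status == healthy ]]; then status=unhealthy; else status=healthy; fi
      streak=0
    fi
    history+=("$status")
  done
  echo "${history[*]}"
}

simulate_checker 3 ok fail fail fail ok ok ok
# → healthy healthy healthy unhealthy unhealthy unhealthy healthy
```

A single successful probe in the middle of a failure streak resets the count, so a flapping endpoint may never reach the threshold.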

Step 3: Create Failover DNS Records in Route 53

You need exactly two records for the same name and type — one PRIMARY, one SECONDARY. Route 53 uses the Failover routing policy to manage which record is active.

graph TD
    subgraph Route53["Route 53 Hosted Zone: example.com"]
        REC_P["A Record: www.example.com<br/>SetIdentifier: Primary-EC2<br/>Failover: PRIMARY<br/>TTL: 60<br/>HealthCheckId: attached"]
        REC_S["A Record (Alias): www.example.com<br/>SetIdentifier: Secondary-S3<br/>Failover: SECONDARY<br/>Target: S3 Website Endpoint"]
    end
    HC2["Route 53 Health Check<br/>HTTP → 203.0.113.10:80/health<br/>Interval: 30s | Threshold: 3"]
    EC2_P["EC2 Instance<br/>Elastic IP: 203.0.113.10"]
    S3_B["S3 Bucket: www.example.com<br/>Static Website Hosting Enabled"]
    HC2 -->|"monitors"| EC2_P
    HC2 -->|"status drives"| REC_P
    REC_P -->|"healthy: resolves to"| EC2_P
    REC_S -->|"failover: resolves to"| S3_B
    style REC_P fill:#d4edda,stroke:#28a745
    style REC_S fill:#fff3cd,stroke:#ffc107
    style HC2 fill:#e2d9f3,stroke:#6f42c1
    style EC2_P fill:#cce5ff,stroke:#004085
    style S3_B fill:#ffeeba,stroke:#856404
Full Route 53 Change Batch JSON (Primary + Secondary), saved as failover-records.json:
{
  "Comment": "Failover records for www.example.com",
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "A",
        "SetIdentifier": "Primary-EC2",
        "Failover": "PRIMARY",
        "TTL": 60,
        "ResourceRecords": [
          { "Value": "203.0.113.10" }
        ],
        "HealthCheckId": "YOUR_HEALTH_CHECK_ID"
      }
    },
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "A",
        "SetIdentifier": "Secondary-S3",
        "Failover": "SECONDARY",
        "AliasTarget": {
          "HostedZoneId": "Z3AQBSTGFYJSTF",
          "DNSName": "www.example.com.s3-website-us-east-1.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
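# Apply the change batch and capture the change ID
CHANGE_ID=$(aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch file://failover-records.json \
  --query 'ChangeInfo.Id' --output text)

# Block until Route 53 reports the change as INSYNC on its name servers
aws route53 wait resource-record-sets-changed --id "$CHANGE_ID"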

Critical notes on the configuration:

  • Primary record: Uses a standard A record with the EC2 Elastic IP and a HealthCheckId attached. The health check association is what enables automatic failover.
  • Secondary record: Uses an Alias record pointing to the S3 website endpoint, with EvaluateTargetHealth set to false so Route 53 treats the fallback as always available.
  • S3 Hosted Zone ID: The HostedZoneId in the Alias target is the S3 website endpoint's hosted zone ID, which is region-specific (e.g., Z3AQBSTGFYJSTF for us-east-1). Always verify the correct ID for your region in the AWS S3 endpoints documentation.
  • TTL: Set a low TTL (60 seconds) on the PRIMARY record so resolvers don't keep handing out the EC2 IP long after failover. Alias records have no TTL setting of their own; Route 53 uses the default TTL of the target AWS resource.
  • No health check on SECONDARY: The secondary record intentionally has no health check. Route 53 always returns the secondary when the primary is unhealthy, regardless of the secondary's state.
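
Rather than editing the JSON by hand, you can render it from shell variables. A minimal sketch using a heredoc; `render_failover_batch` and its argument order are hypothetical, and the S3 defaults assume a us-east-1 bucket named after the record (hosted zone ID Z3AQBSTGFYJSTF, as above).

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints a failover change batch for one record name.
# Args: record name, EC2 Elastic IP, health check ID,
#       [S3 website endpoint], [S3 website hosted zone ID].
render_failover_batch() {
  local name=$1 ip=$2 hc_id=$3
  local s3_endpoint=${4:-$name.s3-website-us-east-1.amazonaws.com}
  local s3_zone=${5:-Z3AQBSTGFYJSTF}
  # UPSERT (instead of CREATE) lets you re-apply the batch without errors
  cat <<EOF
{
  "Comment": "Failover records for $name",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$name",
        "Type": "A",
        "SetIdentifier": "Primary-EC2",
        "Failover": "PRIMARY",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "$ip" }],
        "HealthCheckId": "$hc_id"
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$name",
        "Type": "A",
        "SetIdentifier": "Secondary-S3",
        "Failover": "SECONDARY",
        "AliasTarget": {
          "HostedZoneId": "$s3_zone",
          "DNSName": "$s3_endpoint",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
EOF
}
```

Usage: `render_failover_batch www.example.com 203.0.113.10 YOUR_HEALTH_CHECK_ID > failover-records.json`, then apply it with `change-resource-record-sets` as before.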

Step 4: Set Up CloudWatch Alarm + SNS Notification (Optional but Recommended)

Silent failovers are dangerous. Wire up an SNS alert so your team knows the moment EC2 goes unhealthy.

# Create SNS topic for alerts and capture its ARN
TOPIC_ARN=$(aws sns create-topic --name ec2-failover-alerts --region us-east-1 \
  --query 'TopicArn' --output text)

# Subscribe your email (alerts are delivered only after the recipient confirms the subscription)
aws sns subscribe \
  --topic-arn "$TOPIC_ARN" \
  --protocol email \
  --notification-endpoint oncall@example.com \
  --region us-east-1

# Create CloudWatch alarm on the Route 53 health check metric
# Note: Route 53 health check metrics are published to us-east-1 regardless of your region
aws cloudwatch put-metric-alarm \
  --alarm-name "EC2-HealthCheck-Failed" \
  --namespace "AWS/Route53" \
  --metric-name "HealthCheckStatus" \
  --dimensions Name=HealthCheckId,Value=YOUR_HEALTH_CHECK_ID \
  --statistic Minimum \
  --period 60 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions "$TOPIC_ARN" \
  --region us-east-1

Important: Route 53 health check metrics (AWS/Route53 namespace) are always published to us-east-1, regardless of where your resources reside. Always create the CloudWatch alarm in us-east-1 for Route 53 health check metrics.
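
Email subscriptions stay in PendingConfirmation until the recipient clicks the confirmation link, and an alarm you have never seen fire is easy to misconfigure. A sketch of two checks; the function names are hypothetical, the AWS CLI calls are standard, and they need sns:ListSubscriptionsByTopic and cloudwatch:SetAlarmState, which the minimum IAM policy below does not include.

```shell
#!/usr/bin/env bash
# List email endpoints that have not yet confirmed their SNS subscription.
pending_subscriptions() {
  local topic_arn=$1
  aws sns list-subscriptions-by-topic --topic-arn "$topic_arn" --region us-east-1 \
    --query "Subscriptions[?SubscriptionArn=='PendingConfirmation'].Endpoint" \
    --output text
}

# Force the alarm into ALARM to test the notification path end to end.
# CloudWatch moves it back to its real state at the next evaluation.
fire_test_alarm() {
  aws cloudwatch set-alarm-state \
    --alarm-name "EC2-HealthCheck-Failed" \
    --state-value ALARM \
    --state-reason "Manual test of failover notifications" \
    --region us-east-1
}
```

An empty result from `pending_subscriptions arn:aws:sns:us-east-1:123456789012:ec2-failover-alerts` means every subscriber has confirmed; `fire_test_alarm` should then produce an email within a minute or two.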

Step 5: Validate the Failover End-to-End

Never trust a failover configuration you haven't tested. Use the following sequence to simulate and verify:

sequenceDiagram
    participant Tester as 🧪 Tester
    participant EC2 as EC2 Instance
    participant HC as Route 53 Health Checker
    participant R53 as Route 53 DNS
    participant S3 as S3 Static Site
    participant User as 👤 End User
    Tester->>EC2: Stop instance (simulate failure)
    loop Every 30s × 3 failures
        HC->>EC2: HTTP GET /health
        EC2-->>HC: Connection refused / timeout
    end
    HC->>R53: Mark PRIMARY record UNHEALTHY
    User->>R53: DNS query: www.example.com
    R53-->>User: Return S3 website IPs (SECONDARY alias)
    User->>S3: HTTP GET /
    S3-->>User: 200 OK, maintenance page served
    Tester->>EC2: Start instance (recovery)
    loop Every 30s × 3 successes
        HC->>EC2: HTTP GET /health
        EC2-->>HC: 200 OK
    end
    HC->>R53: Mark PRIMARY record HEALTHY
    User->>R53: DNS query: www.example.com
    R53-->>User: Return EC2 Elastic IP (PRIMARY restored)
# 1. Verify current DNS resolution (should return EC2 IP)
dig www.example.com +short

# 2. Check health check status via CLI
aws route53 get-health-check-status \
  --health-check-id YOUR_HEALTH_CHECK_ID \
  --query 'HealthCheckObservations[*].{Region:Region,Status:StatusReport.Status}' \
  --output table

# 3. Simulate failure: stop the EC2 instance or block port 80
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# 4. Poll DNS resolution every 30 seconds to observe the switch
watch -n 30 "dig www.example.com +short"

# 5. After confirming the failover, start the EC2 instance again and verify failback
aws ec2 start-instances --instance-ids i-0123456789abcdef0
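
Instead of eyeballing `watch` output, you can poll until the answer changes and record how long it took. A sketch with a hypothetical `wait_for_dns` helper; passing one of your hosted zone's Route 53 name servers (from its NS record) as the optional last argument bypasses resolver caches, so you see the authoritative answer. Because the secondary is an alias, the failover answer is a set of S3 IP addresses rather than the endpoint hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll DNS until NAME's first A record matches (or stops matching) IP.
# Usage: wait_for_dns NAME IP until-match|until-differ TIMEOUT_S [INTERVAL_S] [NAMESERVER]
wait_for_dns() {
  local name=$1 ip=$2 mode=$3 timeout=$4 interval=${5:-10} ns=${6:-}
  local waited=0 answer=""
  while (( waited <= timeout )); do
    # +short prints one record per line; keep the first A record
    answer=$(dig ${ns:+@"$ns"} +short "$name" A | head -n 1)
    if [[ $mode == until-match && $answer == "$ip" ]] ||
       [[ $mode == until-differ && -n $answer && $answer != "$ip" ]]; then
      echo "after ${waited}s: $name -> $answer"
      return 0
    fi
    sleep "$interval"
    (( waited += interval ))
  done
  echo "timed out after ${timeout}s; last answer: ${answer:-none}" >&2
  return 1
}
```

For failover, `wait_for_dns www.example.com 203.0.113.10 until-differ 300 15`; for failback, switch the mode to `until-match`.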

Failover Timing: What to Expect

Phase | Duration | Notes
--- | --- | ---
Health check detection | ~90 sec (30 s interval × 3 failures) | Faster with the 10 s interval (additional cost)
Resolver and client DNS cache expiry | Up to the PRIMARY record TTL (60 sec) | Caches keep returning the EC2 IP until the TTL expires; some clients cache longer than the TTL, so a low TTL mitigates but can't eliminate this
Total worst case | ~3–5 minutes | With the standard 30 s interval and a 60 s TTL
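
The two knobs you control combine additively, which a short calculation makes explicit. A sketch with a hypothetical `estimate_failover_seconds` helper that also enforces the allowed RequestInterval and FailureThreshold values; its result is a nominal figure, and the table's 3–5 minute worst case leaves room for clients that cache beyond the TTL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: nominal failover time from the knobs you control.
# detection = RequestInterval x FailureThreshold; caching adds up to the PRIMARY TTL.
# Usage: estimate_failover_seconds INTERVAL THRESHOLD TTL
estimate_failover_seconds() {
  local interval=$1 threshold=$2 ttl=$3
  if [[ $interval != 10 && $interval != 30 ]]; then
    echo "RequestInterval must be 10 or 30" >&2
    return 1
  fi
  if (( threshold < 1 || threshold > 10 )); then
    echo "FailureThreshold must be between 1 and 10" >&2
    return 1
  fi
  echo $(( interval * threshold + ttl ))
}

estimate_failover_seconds 30 3 60   # → 150
estimate_failover_seconds 10 3 60   # → 90
```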

Common Pitfalls

  • Bucket name mismatch: The S3 bucket name must exactly match the DNS record name for Alias routing to work.
  • Health check on wrong IP: Always use the Elastic IP, not an auto-assigned public IPv4 address, which changes when the instance is stopped and started.
  • High TTL on primary record: A TTL of 300+ seconds means clients cache the old EC2 IP for up to 5 minutes after failover — keep it at 60 seconds or lower.
  • Security group blocking health checkers: Route 53 health checkers probe from specific IP ranges. Ensure your EC2 security group allows inbound traffic on the health check port from those ranges (the ROUTE53_HEALTHCHECKS entries in AWS's published ip-ranges.json file).
  • HTTPS health checks without a TLS listener: If you use HTTPS health checks, Route 53 must be able to complete a TLS handshake on the health check port. It does not validate the certificate, so an expired or self-signed certificate won't fail the check, but a missing TLS listener or a failed handshake will.
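
Two of these pitfalls can be caught before you touch Route 53. A sketch with a hypothetical `lint_failover_config` helper that compares the record name with the bucket name (ignoring case and a trailing dot, since DNS names are case-insensitive and bucket names are lowercase) and flags a PRIMARY TTL above 60 seconds; it returns the number of problems found.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight lint for the bucket-name and TTL pitfalls.
# Args: DNS record name, S3 bucket name, PRIMARY record TTL in seconds.
lint_failover_config() {
  local record bucket=$2 ttl=$3 problems=0
  # Normalize: drop a trailing dot and lowercase the record name
  record=$(printf '%s' "${1%.}" | tr '[:upper:]' '[:lower:]')
  if [[ $record != "$bucket" ]]; then
    echo "bucket '$bucket' does not match record name '$record'"
    (( problems += 1 ))
  fi
  if (( ttl > 60 )); then
    echo "PRIMARY TTL of ${ttl}s is above 60s; clients may keep the EC2 IP after failover"
    (( problems += 1 ))
  fi
  if (( problems == 0 )); then
    echo "ok"
  fi
  return "$problems"
}
```

For example, `lint_failover_config www.example.com www.example.com 60` prints `ok`.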

IAM Permissions Required

Follow least privilege. The IAM entity managing this setup needs the following minimum permissions:

Minimum IAM Policy:
{
  "Version": "2012-10-17",
  "Statement": [
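    {
      "Sid": "Route53HealthChecks",
      "Effect": "Allow",
      "Action": [
        "route53:CreateHealthCheck",
        "route53:GetHealthCheck",
        "route53:GetHealthCheckStatus"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Route53FailoverRecords",
      "Effect": "Allow",
      "Action": [
        "route53:ChangeResourceRecordSets",
        "route53:ListResourceRecordSets"
      ],
      "Resource": "arn:aws:route53:::hostedzone/YOUR_HOSTED_ZONE_ID"
    },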
    {
      "Sid": "S3WebsiteSetup",
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutBucketWebsite",
        "s3:PutBucketPolicy",
        "s3:PutBucketPublicAccessBlock",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::www.example.com",
        "arn:aws:s3:::www.example.com/*"
      ]
    },
    {
      "Sid": "CloudWatchAlarmSetup",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricAlarm"
      ],
      "Resource": "arn:aws:cloudwatch:us-east-1:123456789012:alarm:EC2-HealthCheck-Failed"
    },
    {
      "Sid": "SNSTopicSetup",
      "Effect": "Allow",
      "Action": [
        "sns:CreateTopic",
        "sns:Subscribe"
      ],
      "Resource": "arn:aws:sns:us-east-1:123456789012:ec2-failover-alerts"
    }
  ]
}
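
Before running the setup, you can ask the IAM policy simulator whether the principal you plan to use has these permissions. A sketch with a hypothetical `check_setup_permissions` helper; the role ARN in the usage line is also hypothetical. Without `--resource-arns` the simulator evaluates against `*`, so results for statements scoped to specific ARNs (the alarm, the topic, the hosted zone) are worth re-checking with those ARNs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the IAM policy simulator whether a principal
# is allowed the actions this guide uses.
# Arg: ARN of the IAM user or role that will run the setup.
check_setup_permissions() {
  local principal_arn=$1
  aws iam simulate-principal-policy \
    --policy-source-arn "$principal_arn" \
    --action-names \
      route53:CreateHealthCheck route53:GetHealthCheckStatus \
      route53:ChangeResourceRecordSets s3:PutBucketWebsite s3:PutBucketPolicy \
      cloudwatch:PutMetricAlarm sns:CreateTopic sns:Subscribe \
    --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
    --output text
}
```

Usage: `check_setup_permissions arn:aws:iam::123456789012:role/failover-admin`; every line should end in `allowed`.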

Glossary

Term | Definition
--- | ---
Failover Routing Policy | A Route 53 routing policy that routes traffic to a primary resource when healthy, and to a secondary resource when the primary is unhealthy.
Health Check | A Route 53 mechanism that periodically sends requests to an endpoint and evaluates the response to determine if the resource is healthy.
Alias Record | A Route 53-specific DNS extension that maps a domain name to an AWS resource (like S3, CloudFront, or ELB) without incurring standard DNS query charges for the alias resolution.
Elastic IP (EIP) | A static public IPv4 address allocated to your AWS account that can be associated with an EC2 instance, persisting across stop/start cycles.
TTL (Time to Live) | The duration (in seconds) that DNS resolvers cache a DNS record before querying Route 53 again. Lower TTL = faster failover propagation.

Next Steps

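  • S3 website endpoints serve HTTP only, so visitors who reach your site over https:// can't get the fallback page from S3 directly. For production workloads, consider putting CloudFront in front of the S3 fallback so the backup page is served over HTTPS.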
  • Explore Route 53 Application Recovery Controller for more advanced multi-region active-active or active-passive architectures with readiness checks.
  • Review the official documentation: Configuring DNS Failover — AWS Route 53 Developer Guide.
