DNS Failover with Route 53: Automatically Reroute Traffic to S3 When Your EC2 Goes Down
Your primary EC2 instance just crashed at 2 AM. Without an automated failover strategy, your users hit a dead end until you wake up and intervene. This guide walks you through configuring Route 53 Active-Passive DNS failover so that traffic automatically reroutes to a static backup site on S3 once Route 53 health checks mark your primary server unhealthy.
TL;DR
| Component | Role | Type |
|---|---|---|
| Route 53 Health Check | Monitors EC2 endpoint health | HTTP/HTTPS/TCP probe |
| Primary DNS Record | Points to EC2 (Elastic IP) | Failover: PRIMARY |
| Secondary DNS Record | Points to S3 static website | Failover: SECONDARY |
| S3 Static Website | Serves backup/maintenance page | Passive standby |
| SNS (optional) | Alerts on health check failure | Notification layer |
Architecture Overview
Before diving into configuration, understand the data flow. Route 53 continuously probes your EC2 endpoint. When the health check fails, Route 53 stops resolving DNS to the PRIMARY record and automatically serves the SECONDARY record pointing to S3.
- User DNS Query: The client queries Route 53 for your domain (e.g., www.example.com).
- Health Check Probe: Route 53 health checkers (from multiple global locations) continuously poll your EC2 endpoint on the configured protocol/port/path.
- Healthy State: Route 53 resolves the domain to the EC2 Elastic IP — normal traffic flow.
- Unhealthy State: After the threshold of consecutive failures is met, Route 53 marks the PRIMARY record unhealthy and stops returning it in DNS responses.
- Failover Activated: Route 53 automatically returns the SECONDARY record — the S3 static website endpoint.
- SNS Alert: A CloudWatch alarm tied to the health check metric triggers an SNS notification to your on-call team.
Analogy: Think of Route 53 failover like a hospital's triage system. The primary ER (EC2) handles all patients normally. If the ER goes offline, an automated redirect sends patients to the urgent care clinic (S3). The clinic can't perform surgery, but it keeps patients informed and safe until the ER is back online.
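The resolution behaviour described in this section can be modelled as a tiny shell function. This is only a sketch of the decision, not how Route 53 is implemented; the function name `failover_answer` and the status labels are hypothetical, and the two targets are the placeholder values used later in this guide.

```shell
# Model of Active-Passive failover: which target answers a DNS query?
# Both values are placeholders taken from the examples in this guide.
PRIMARY_IP="203.0.113.10"
SECONDARY_TARGET="www.example.com.s3-website-us-east-1.amazonaws.com"

failover_answer() {
  local primary_status="$1"   # "healthy" or "unhealthy" (hypothetical labels)
  if [ "$primary_status" = "healthy" ]; then
    echo "$PRIMARY_IP"          # normal traffic flow to EC2
  else
    echo "$SECONDARY_TARGET"    # failover to the S3 static site
  fi
}
```

Calling `failover_answer unhealthy` prints the S3 target, which is the state the rest of this guide sets out to automate.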
Prerequisites
- A registered domain managed in Route 53 (or a hosted zone with NS records delegated to Route 53).
- A running EC2 instance with an Elastic IP assigned and a publicly accessible health check endpoint (e.g., /health).
- An S3 bucket configured for static website hosting with a maintenance/fallback index.html.
Step 1: Configure S3 as a Static Website
The S3 bucket name must exactly match your domain name when using Route 53 Alias records for S3 website endpoints.
S3 bucket policy for public read:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::www.example.com/*"
}
]
}
Enable static website hosting via AWS CLI:
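# Create the bucket (name must match the DNS record name)
aws s3 mb s3://www.example.com --region us-east-1
# New buckets block public bucket policies by default; allow one on this bucket
# (requires the s3:PutBucketPublicAccessBlock permission)
aws s3api put-public-access-block --bucket www.example.com \
--public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=false,RestrictPublicBuckets=false"
# Attach the public-read policy shown above (assumes it is saved as bucket-policy.json)
aws s3api put-bucket-policy --bucket www.example.com --policy file://bucket-policy.json
# Enable static website hosting
aws s3 website s3://www.example.com \
--index-document index.html \
--error-document error.html
# Upload your fallback page
aws s3 cp ./maintenance/index.html s3://www.example.com/index.html \
--content-type "text/html"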
After enabling, your S3 website endpoint will follow the format:
http://www.example.com.s3-website-us-east-1.amazonaws.com
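It is worth confirming that the fallback page is actually reachable before wiring it into DNS. The sketch below builds the endpoint URL and checks it with `curl`. The function names are hypothetical, and the URL builder assumes the dash-style `s3-website-<region>` form shown above; some regions use a dot (`s3-website.<region>`) instead, so adjust it for your region.

```shell
# Build the S3 website endpoint URL for a bucket and region.
# Assumes the dash-style "s3-website-<region>" form used for us-east-1.
s3_website_url() {
  local bucket="$1" region="$2"
  echo "http://${bucket}.s3-website-${region}.amazonaws.com"
}

# Succeed only if the website endpoint answers with HTTP 200.
check_fallback_page() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    "$(s3_website_url "$1" "$2")")
  [ "$code" = "200" ]
}
```

For example, `check_fallback_page www.example.com us-east-1 && echo "fallback page is live"` would confirm the bucket policy and website configuration are both in effect.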
Step 2: Create a Route 53 Health Check for EC2
The health check is the trigger mechanism. Route 53 probes your EC2 instance and marks the PRIMARY record unhealthy when the probe fails consecutively beyond your threshold.
# Create health check targeting EC2 Elastic IP
aws route53 create-health-check \
--caller-reference "ec2-primary-$(date +%s)" \
--health-check-config '{
"IPAddress": "203.0.113.10",
"Port": 80,
"Type": "HTTP",
"ResourcePath": "/health",
"FullyQualifiedDomainName": "www.example.com",
"RequestInterval": 30,
"FailureThreshold": 3
}'
Key parameters explained:
- RequestInterval: 10 (fast) or 30 (standard) seconds between probes. The fast interval incurs additional cost; check AWS pricing.
- FailureThreshold: Number of consecutive failures before the endpoint is marked unhealthy (1–10). A value of 3 with a 30-second interval means ~90 seconds to detect a failure.
- ResourcePath: Your application's dedicated health endpoint. It should return an HTTP 2xx or 3xx status when healthy.
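Before relying on the health check, it can help to probe the endpoint by hand in roughly the way a checker would. The following is a minimal sketch assuming `curl` is installed; the function names are hypothetical, and the IP, host, and path are the placeholder values from the health check config above. Route 53 treats 2xx and 3xx responses to HTTP checks as healthy, which is what the classifier encodes.

```shell
# Classify an HTTP status code the way an HTTP health check would:
# 2xx and 3xx count as healthy, everything else as unhealthy.
is_healthy_status() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Probe the endpoint by IP while sending the domain as the Host header,
# mirroring the IPAddress + FullyQualifiedDomainName settings.
# Prints the status code (000 if the connection failed).
probe_endpoint() {
  local ip="$1" host="$2" path="$3" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 4 \
    -H "Host: ${host}" "http://${ip}${path}")
  echo "$code"
  is_healthy_status "$code"
}
```

A call such as `probe_endpoint 203.0.113.10 www.example.com /health` prints the status code and returns non-zero when the response would not count as healthy.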
Step 3: Create Failover DNS Records in Route 53
You need exactly two records for the same name and type — one PRIMARY, one SECONDARY. Route 53 uses the Failover routing policy to manage which record is active.
Full Route 53 change batch JSON (primary + secondary), saved as failover-records.json:
{
"Comment": "Failover records for www.example.com",
"Changes": [
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "www.example.com",
"Type": "A",
"SetIdentifier": "Primary-EC2",
"Failover": "PRIMARY",
"TTL": 60,
"ResourceRecords": [
{ "Value": "203.0.113.10" }
],
"HealthCheckId": "YOUR_HEALTH_CHECK_ID"
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "www.example.com",
"Type": "A",
"SetIdentifier": "Secondary-S3",
"Failover": "SECONDARY",
"AliasTarget": {
"HostedZoneId": "Z3AQBSTGFYJSTF",
"DNSName": "s3-website-us-east-1.amazonaws.com",
"EvaluateTargetHealth": false
}
}
}
]
}
# Apply the change batch
aws route53 change-resource-record-sets \
--hosted-zone-id YOUR_HOSTED_ZONE_ID \
--change-batch file://failover-records.json
Critical notes on the configuration:
- Primary record: Uses a standard A record with the EC2 Elastic IP and a HealthCheckId attached. The health check association is what enables automatic failover.
- Secondary record: Uses an Alias record pointing to the S3 website endpoint. EvaluateTargetHealth is set to false here so that the backup page is returned whenever the primary is unhealthy.
- S3 Hosted Zone ID: The HostedZoneId in the Alias target is the S3 website endpoint's hosted zone ID, which is region-specific (e.g., Z3AQBSTGFYJSTF for us-east-1). Always verify the correct ID for your region in the AWS S3 endpoints documentation.
- TTL: Set a low TTL (60 seconds) on the PRIMARY record so resolvers drop the cached EC2 address quickly after failover. Alias records take no TTL of their own; Route 53 uses the TTL of the alias target.
- No health check on SECONDARY: The secondary record intentionally has no health check. Route 53 always returns the secondary when the primary is unhealthy, regardless of the secondary's state.
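The change batch only works once the `YOUR_HEALTH_CHECK_ID` placeholder holds a real ID. One way to avoid hand-editing is to keep the JSON as a template and substitute the ID with `sed`. This is a sketch; the function name and the template file name `failover-records.template.json` are hypothetical.

```shell
# Print a copy of the change batch with the placeholder replaced.
# Usage: fill_health_check_id <template.json> <health-check-id>
fill_health_check_id() {
  local template="$1" health_check_id="$2"
  sed "s/YOUR_HEALTH_CHECK_ID/${health_check_id}/g" "$template"
}
```

Assuming the ID from Step 2 is stored in `$HC_ID`, `fill_health_check_id failover-records.template.json "$HC_ID" > failover-records.json` produces the file that `change-resource-record-sets` expects. If the ID was not captured at creation time, `aws route53 list-health-checks` lists existing checks with their IDs.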
Step 4: Set Up CloudWatch Alarm + SNS Notification (Optional but Recommended)
Silent failovers are dangerous. Wire up an SNS alert so your team knows the moment EC2 goes unhealthy.
# Create SNS topic for alerts
aws sns create-topic --name ec2-failover-alerts --region us-east-1
# Subscribe your email
aws sns subscribe \
--topic-arn arn:aws:sns:us-east-1:123456789012:ec2-failover-alerts \
--protocol email \
--notification-endpoint oncall@example.com
# Create CloudWatch alarm on the Route 53 health check metric
# Note: Route 53 health check metrics are published to us-east-1 regardless of your region
aws cloudwatch put-metric-alarm \
--alarm-name "EC2-HealthCheck-Failed" \
--namespace "AWS/Route53" \
--metric-name "HealthCheckStatus" \
--dimensions Name=HealthCheckId,Value=YOUR_HEALTH_CHECK_ID \
--statistic Minimum \
--period 60 \
--threshold 1 \
--comparison-operator LessThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:ec2-failover-alerts \
--region us-east-1
Important: Route 53 health check metrics (AWS/Route53 namespace) are always published to us-east-1, regardless of where your resources reside. Always create the CloudWatch alarm in us-east-1 for Route 53 health check metrics.
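To confirm the alarm exists and see its current state, a small wrapper around `describe-alarms` is enough. This is a sketch with a hypothetical function name; it pins the region to us-east-1 for the reason given above.

```shell
# Print the current state of a CloudWatch alarm
# (OK, ALARM, or INSUFFICIENT_DATA).
# The region is pinned to us-east-1, where Route 53 metrics are published.
alarm_state() {
  local alarm_name="$1"
  aws cloudwatch describe-alarms \
    --alarm-names "$alarm_name" \
    --query 'MetricAlarms[0].StateValue' \
    --output text \
    --region us-east-1
}
```

Running `alarm_state "EC2-HealthCheck-Failed"` right after creation will likely report INSUFFICIENT_DATA until the first datapoints arrive. To exercise the SNS path without taking the server down, `aws cloudwatch set-alarm-state` can temporarily force the alarm into ALARM.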
Step 5: Validate the Failover End-to-End
Never trust a failover configuration you haven't tested. Use the following sequence to simulate and verify:
# 1. Verify current DNS resolution (should return EC2 IP)
dig www.example.com +short
# 2. Check health check status via CLI
aws route53 get-health-check-status \
--health-check-id YOUR_HEALTH_CHECK_ID \
--query 'HealthCheckObservations[*].{Region:Region,Status:StatusReport.Status}' \
--output table
# 3. Simulate failure: stop the EC2 instance or block port 80
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# 4. Poll DNS resolution every 30 seconds to observe the switch
watch -n 30 "dig www.example.com +short"
# 5. After failover confirmed, restart EC2 and verify failback
aws ec2 start-instances --instance-ids i-0123456789abcdef0
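Instead of watching `dig` output by eye, the switch can be timed with a small polling loop. The sketch below assumes `dig` is installed; the function name and its defaults are hypothetical. Because it queries through your local resolver, the measured time includes resolver caching.

```shell
# Poll DNS until the answer no longer contains the primary IP.
# Prints the approximate seconds waited; returns 1 if the limit is reached.
wait_for_failover() {
  local name="$1" primary_ip="$2" interval="${3:-10}" max_tries="${4:-60}"
  local tries=0 answer
  while [ "$tries" -lt "$max_tries" ]; do
    answer=$(dig "$name" +short)
    case "$answer" in
      ""|*"$primary_ip"*) ;;                       # empty or still primary: keep waiting
      *) echo $((tries * interval)); return 0 ;;  # answer changed: failover observed
    esac
    tries=$((tries + 1))
    sleep "$interval"
  done
  return 1
}
```

Started right after step 3, `wait_for_failover www.example.com 203.0.113.10` gives a rough measurement of end-to-end failover time that can be compared with the estimates in the next section.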
Failover Timing: What to Expect
| Phase | Duration | Notes |
|---|---|---|
| Health check detection | ~90 sec (30s interval × 3 failures) | Faster with 10s interval (additional cost) |
| Route 53 DNS propagation | ~60 sec | Controlled by your record TTL |
| Client DNS cache expiry | Up to TTL value | Clients may cache old IP; low TTL mitigates this |
| Total worst-case | ~3–5 minutes | With standard 30s interval and 60s TTL |
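The arithmetic behind the table can be written out as a short helper. This is a rough upper-bound estimate rather than a measurement: it follows the table in counting one TTL for propagation and one more for client cache expiry, and the function name is hypothetical.

```shell
# Sum the three phases from the table:
# detection (interval x threshold) + propagation (TTL) + client cache (TTL).
failover_budget_seconds() {
  local interval="$1" threshold="$2" ttl="$3"
  local detection=$((interval * threshold))
  echo $((detection + ttl + ttl))
}
```

With the standard settings, `failover_budget_seconds 30 3 60` prints 210 (about 3.5 minutes), while the fast interval, `failover_budget_seconds 10 3 60`, brings it down to 150.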
Common Pitfalls
- Bucket name mismatch: The S3 bucket name must exactly match the DNS record name for Alias routing to work.
- Health check on wrong IP: Always use the Elastic IP, not the auto-assigned public IP. An auto-assigned address changes when the instance is stopped and started.
- High TTL on primary record: A TTL of 300 seconds or more means clients may keep the old EC2 IP cached for 5 minutes or longer after failover. Keep it at 60 seconds or lower.
- Security group blocking health checkers: Route 53 health checkers probe from specific IP ranges. Ensure your EC2 security group allows inbound traffic from Route 53 health checker IP ranges (published in the AWS IP ranges JSON file).
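- HTTPS health checks and certificates: Route 53 HTTPS health checks do not validate the SSL/TLS certificate, so an expired or self-signed certificate will not fail the check by itself. The endpoint must still complete a TLS handshake on the configured port, and a passing check says nothing about whether browsers will trust the certificate.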
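For the security group pitfall above, the health checker ranges can be fetched with the CLI and turned into ingress rules. This is a sketch under assumptions: the function name is hypothetical, the security group ID is a placeholder, and the range list may be long enough to matter for security group rule limits, so treat it as a starting point.

```shell
# Allow inbound TCP on the given port from every Route 53 health checker range.
# Usage: allow_health_checkers <security-group-id> [port]
allow_health_checkers() {
  local sg_id="$1" port="${2:-80}" cidr
  for cidr in $(aws route53 get-checker-ip-ranges \
      --query 'CheckerIpRanges[]' --output text); do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" \
      --protocol tcp \
      --port "$port" \
      --cidr "$cidr"
  done
}
```

A call like `allow_health_checkers sg-0123456789abcdef0 80` adds one rule per range. Running it needs `route53:GetCheckerIpRanges` and `ec2:AuthorizeSecurityGroupIngress` in addition to the permissions listed below.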
IAM Permissions Required
Follow least privilege. The IAM entity managing this setup needs the following minimum permissions:
Minimum IAM policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Route53FailoverManagement",
"Effect": "Allow",
"Action": [
"route53:CreateHealthCheck",
"route53:GetHealthCheck",
"route53:GetHealthCheckStatus",
"route53:ChangeResourceRecordSets",
"route53:ListResourceRecordSets"
],
"Resource": "*"
},
{
"Sid": "S3WebsiteSetup",
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:PutBucketWebsite",
"s3:PutBucketPolicy",
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::www.example.com",
"arn:aws:s3:::www.example.com/*"
]
},
{
"Sid": "CloudWatchAlarmSetup",
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricAlarm"
],
"Resource": "arn:aws:cloudwatch:us-east-1:123456789012:alarm:EC2-HealthCheck-Failed"
},
{
"Sid": "SNSTopicSetup",
"Effect": "Allow",
"Action": [
"sns:CreateTopic",
"sns:Subscribe"
],
"Resource": "arn:aws:sns:us-east-1:123456789012:ec2-failover-alerts"
}
]
}
Glossary
| Term | Definition |
|---|---|
| Failover Routing Policy | A Route 53 routing policy that routes traffic to a primary resource when healthy, and to a secondary resource when the primary is unhealthy. |
| Health Check | A Route 53 mechanism that periodically sends requests to an endpoint and evaluates the response to determine if the resource is healthy. |
| Alias Record | A Route 53-specific DNS extension that maps a domain name to an AWS resource (like S3, CloudFront, or ELB) without incurring standard DNS query charges for the alias resolution. |
| Elastic IP (EIP) | A static public IPv4 address allocated to your AWS account that can be associated with an EC2 instance, persisting across stop/start cycles. |
| TTL (Time to Live) | The duration (in seconds) that DNS resolvers cache a DNS record before querying Route 53 again. Lower TTL = faster failover propagation. |
Next Steps
- For production workloads, consider replacing the S3 static fallback with a CloudFront + S3 combination to serve the backup page over HTTPS.
- Explore Route 53 Application Recovery Controller for more advanced multi-region active-active or active-passive architectures with readiness checks.
- Review the official documentation: Configuring DNS Failover — AWS Route 53 Developer Guide.