NAT Gateway vs NAT Instance: Which Should You Use for Private Subnet Internet Access?
When private EC2 instances need outbound internet access — to pull OS updates, download dependencies, or reach external APIs — you need a NAT solution in your VPC. The choice between a managed NAT Gateway and a self-managed NAT Instance on EC2 has real operational and cost implications that aren't obvious from the AWS console alone.
TL;DR: NAT Gateway vs NAT Instance
NAT Gateway is the recommended default for production workloads. A NAT Instance makes sense only when you need fine-grained traffic control or port forwarding, or when you want to minimize cost in low-traffic dev environments.
| Dimension | NAT Gateway | NAT Instance |
|---|---|---|
| Management overhead | None (fully managed) | High (patching, HA, failover) |
| High availability | Built-in per AZ | Manual (Auto Scaling + scripts) |
| Bandwidth | Scales automatically | Limited by instance type |
| Security Groups | Not supported | Supported |
| Port forwarding | Not supported | Supported via iptables |
| Cost model | Hourly + per-GB data processing | EC2 instance hours (standard data transfer still applies) |
| Source/dest check | Handled automatically | Must be disabled manually |
How NAT Works in a VPC: The Core Model
Both solutions perform the same fundamental job: they translate the private IP of an outbound packet to a public IP so the internet can route a response back. The difference is who manages that translation layer.
In a standard VPC layout, private subnets have no route to an Internet Gateway. A NAT device sits in a public subnet, holds an Elastic IP, and acts as the exit point. The private subnet's route table sends 0.0.0.0/0 traffic to the NAT device, which forwards it to the Internet Gateway.
Diagram (outbound): Private Instance (10.0.1.x) → Route Table (0.0.0.0/0 → NAT) → NAT Device in public subnet (EIP), source IP rewritten to EIP → Internet Gateway → Internet.
Diagram (return): Internet → Internet Gateway → NAT Device, translated back to 10.0.1.x → Private Instance.
- The private instance initiates an outbound request (e.g., yum update).
- The private subnet route table forwards 0.0.0.0/0 to the NAT device.
- The NAT device (Gateway or Instance) translates the source IP to its Elastic IP and forwards to the Internet Gateway.
- The Internet Gateway routes the packet to the internet.
- Return traffic follows the reverse path: the NAT device maps the response back to the originating private IP.
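One way to confirm the path above end to end is to compare the public IP that the internet sees with the Elastic IP attached to the NAT device. The sketch below is a minimal illustration, not a definitive check: the function names are made up for this example, 203.0.113.10 is a documentation-range placeholder for your EIP, and it assumes curl is installed on the private instance. checkip.amazonaws.com returns the caller's public IP as plain text.

```shell
# Compare the observed egress IP with the NAT device's Elastic IP.
# ASSUMPTION: 203.0.113.10 is a placeholder; substitute your own EIP.

# egress_matches OBSERVED EXPECTED
# Prints a verdict and returns 0 on match, 1 on mismatch.
egress_matches() {
  if [ "$1" = "$2" ]; then
    echo "match: egress IP is $1"
  else
    echo "mismatch: saw $1, expected $2"
    return 1
  fi
}

# check_egress EXPECTED_EIP
# Run this ON a private instance; it needs working outbound HTTPS.
check_egress() {
  observed=$(curl -s --max-time 5 https://checkip.amazonaws.com) || return 2
  egress_matches "$observed" "$1"
}

# Offline demonstration of the comparison logic only (no network call):
egress_matches 203.0.113.10 203.0.113.10
```

On a private instance you would call check_egress with your own EIP. A timeout (return code 2) suggests the route or NAT device is not forwarding at all, while a mismatch suggests traffic is leaving through a different exit than the one you expected.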
NAT Gateway: The Managed Path
NAT Gateway is an AWS-managed service. You provision it into a public subnet, associate an Elastic IP, and update your route table. AWS handles availability, patching, and scaling. There is no OS to manage, no source/destination check to disable, and no Security Group to configure on the gateway itself.
Provisioning a NAT Gateway (CLI)
# Step 1: Allocate an Elastic IP
aws ec2 allocate-address \
--domain vpc \
--region us-east-1
# Step 2: Create the NAT Gateway in your public subnet
aws ec2 create-nat-gateway \
--subnet-id subnet-0abc12345def67890 \
--allocation-id eipalloc-0abc12345def67890 \
--region us-east-1
# Step 3: Wait for the gateway to become available
aws ec2 wait nat-gateway-available \
--nat-gateway-ids nat-0abc12345def67890 \
--region us-east-1
# Step 4: Update the private subnet route table
aws ec2 create-route \
--route-table-id rtb-0abc12345def67890 \
--destination-cidr-block 0.0.0.0/0 \
--nat-gateway-id nat-0abc12345def67890 \
--region us-east-1
One architectural point that catches teams off guard: a NAT Gateway lives in a single Availability Zone and is redundant only within that zone. For multi-AZ resilience, deploy one NAT Gateway per AZ and point each AZ's private subnet route table at its local NAT Gateway. Routing all AZs through a single NAT Gateway introduces a cross-AZ dependency: if that AZ degrades, every private subnet loses outbound access at the same time.
Think of it like a building's fire exits: an exit on every floor is safer than sending everyone to the ground-floor door. Routing through a NAT Gateway in another AZ also incurs inter-AZ data transfer charges on top of the NAT Gateway's own per-GB processing charge.
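The per-AZ layout can be sketched as a small planning script. This is an illustration under stated assumptions, not a deployment tool: the function names and the three-column input format are invented for this example, every ID is a placeholder, and the script only prints the create-route commands rather than running them.

```shell
# Input format, one line per AZ:
#   <az> <private-route-table-id> <nat-gateway-id>
# ASSUMPTION: all IDs below are made-up placeholders.

# Print (do not execute) the default-route command for each AZ.
plan_az_routes() {
  while read -r az rtb nat; do
    [ -n "$nat" ] || continue   # skip blank or incomplete lines
    printf 'aws ec2 create-route --route-table-id %s --destination-cidr-block 0.0.0.0/0 --nat-gateway-id %s\n' "$rtb" "$nat"
  done
}

# Print any NAT Gateway ID that more than one AZ routes through,
# which is the cross-AZ dependency described above.
shared_nat_gateways() {
  awk 'NF >= 3 { seen[$3]++ } END { for (id in seen) if (seen[id] > 1) print id }' | sort
}

layout='us-east-1a rtb-aaaa1111 nat-aaaa1111
us-east-1b rtb-bbbb2222 nat-bbbb2222'

printf '%s\n' "$layout" | plan_az_routes
printf '%s\n' "$layout" | shared_nat_gateways   # prints nothing here: no sharing
```

Keeping the mapping in one place makes the shared-gateway check cheap to run in review, before any route is changed.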
NAT Instance: The Self-Managed Path
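A NAT Instance is a standard EC2 instance running a NAT-capable AMI, placed in a public subnet. AWS provides community AMIs prefixed with amzn-ami-vpc-nat, but you are responsible for selecting, launching, patching, and ensuring availability of this instance. Note that these AMIs are built on the original Amazon Linux AMI (2018.03), which is past its end of support; AWS documentation recommends building your own NAT AMI from a current Amazon Linux release if you take this route.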
The single most common failure mode: engineers launch the NAT Instance and update the route table, but forget to disable the source/destination check. By default, EC2 drops packets where the instance is neither the source nor the destination. A NAT instance is explicitly forwarding other instances' traffic — so this check must be disabled.
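To make concrete what a "NAT-capable" instance does at the OS level, and what the iptables port forwarding in the comparison table looks like, here is a minimal sketch. It defaults to a dry run that only prints commands. The helper names (run, configure_nat, forward_port) are invented for this example, and the interface name eth0, the VPC CIDR 10.0.0.0/16, and the target 10.0.1.10:22 are assumptions; check your interface name with ip link, since many current instance types use a different one.

```shell
# OS-level NAT setup, dry run by default.
# Set DRY_RUN=0 and run as root to actually apply the commands.
# ASSUMPTIONS: eth0, 10.0.0.0/16 and 10.0.1.10:22 are placeholders.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# configure_nat IFACE VPC_CIDR
configure_nat() {
  # Let the kernel forward packets that are not addressed to this host.
  run sysctl -w net.ipv4.ip_forward=1
  # Rewrite the source address of VPC traffic as it leaves IFACE.
  run iptables -t nat -A POSTROUTING -o "$1" -s "$2" -j MASQUERADE
}

# forward_port IFACE EXTERNAL_PORT TARGET_IP TARGET_PORT
forward_port() {
  # Redirect inbound TCP on EXTERNAL_PORT to a private instance.
  run iptables -t nat -A PREROUTING -i "$1" -p tcp --dport "$2" \
    -j DNAT --to-destination "$3:$4"
}

configure_nat eth0 10.0.0.0/16
forward_port eth0 2222 10.0.1.10 22
```

Neither rule survives a reboot unless you persist it, and a port forward also needs a matching inbound rule in the instance's Security Group.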
Launching and Configuring a NAT Instance (CLI)
# Step 1: Launch the NAT instance into a public subnet
# Use an AMI ID from the amzn-ami-vpc-nat family for your region
aws ec2 run-instances \
--image-id ami-0123456789abcdef0 \
--instance-type t3.small \
--subnet-id subnet-0abc12345def67890 \
--associate-public-ip-address \
--key-name my-key-pair \
--region us-east-1
# Step 2: Disable source/destination check — REQUIRED for NAT to function
aws ec2 modify-instance-attribute \
--instance-id i-0abc12345def67890 \
--source-dest-check "Value=false" \
--region us-east-1
# Step 3: Verify the source/dest check is disabled
aws ec2 describe-instance-attribute \
--instance-id i-0abc12345def67890 \
--attribute sourceDestCheck \
--region us-east-1
# Step 4: Update the private subnet route table to point to the NAT instance
aws ec2 create-route \
--route-table-id rtb-0abc12345def67890 \
--destination-cidr-block 0.0.0.0/0 \
--instance-id i-0abc12345def67890 \
--region us-east-1
Security Group Configuration for the NAT Instance
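Unlike NAT Gateway, a NAT Instance is a real EC2 instance with a Security Group. You must explicitly allow inbound traffic from your private subnets; outbound traffic is allowed by default on a newly created group. Attach the group to the instance, either at launch with --security-group-ids or afterwards with modify-instance-attribute --groups, or the rules will have no effect on NAT traffic.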
# Create a security group for the NAT instance
aws ec2 create-security-group \
--group-name nat-instance-sg \
--description "Security group for NAT instance" \
--vpc-id vpc-0abc12345def67890 \
--region us-east-1
# Allow inbound HTTP from the VPC CIDR (narrow to your private subnet ranges where possible)
aws ec2 authorize-security-group-ingress \
--group-id sg-0abc12345def67890 \
--protocol tcp \
--port 80 \
--cidr 10.0.0.0/16 \
--region us-east-1
# Allow inbound HTTPS from the VPC CIDR
aws ec2 authorize-security-group-ingress \
--group-id sg-0abc12345def67890 \
--protocol tcp \
--port 443 \
--cidr 10.0.0.0/16 \
--region us-east-1
# Allow all outbound traffic
# A newly created security group already has this rule, so run this only if
# the default egress rule was removed (re-adding it fails with InvalidPermission.Duplicate)
aws ec2 authorize-security-group-egress \
--group-id sg-0abc12345def67890 \
--protocol all \
--cidr 0.0.0.0/0 \
--region us-east-1
The Security Group on a NAT Instance is also its biggest operational advantage over NAT Gateway: you can restrict which protocols and ports private instances are allowed to use for outbound traffic, something NAT Gateway cannot enforce at the gateway level.
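Because each inbound rule on the NAT Instance decides which destination ports private instances can reach, it can help to keep the allow-list in one place. The sketch below is a hypothetical helper (print_ingress_rules is not an AWS command) that prints one ingress command per permitted port without executing anything. The group ID and CIDR are the placeholder values used earlier in this article.

```shell
# Print one ingress rule per allowed outbound port.
# ASSUMPTIONS: the group ID, CIDR and port list are placeholders.

# print_ingress_rules GROUP_ID SOURCE_CIDR PORT [PORT...]
print_ingress_rules() {
  group_id=$1
  cidr=$2
  shift 2
  for port in "$@"; do
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port %s --cidr %s\n' \
      "$group_id" "$port" "$cidr"
  done
}

# Permit only HTTP and HTTPS from the VPC range.
print_ingress_rules sg-0abc12345def67890 10.0.0.0/16 80 443
```

Anything not on the list, such as outbound SMTP on port 25, is then dropped at the NAT Instance rather than relying on every private instance's own configuration.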
Decision Guide: Which NAT Solution Fits Your Scenario?
Decision flow (diagram): Start (… from private subnet?) → Production workload or HA required?
Yes → ✅ Use NAT Gateway (deploy one per AZ).
No → Need port forwarding or SG-level filtering? Yes → ⚙️ Use NAT Instance (disable src/dst check).
No → Strict cost constraint on a low-traffic dev env? Yes → ⚙️ Use NAT Instance. No → ✅ Use NAT Gateway.
- Start by assessing whether this is a production workload requiring high availability.
- If yes, NAT Gateway is the operationally sound choice — managed HA, no patching burden.
- If cost is the primary constraint on a low-traffic dev environment, a NAT Instance on a small instance type may be cheaper.
- If you need port forwarding or gateway-level traffic filtering, only a NAT Instance supports those capabilities.
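The cost branch of this decision comes down to simple arithmetic, sketched below. Every rate in the example is a hypothetical placeholder, not an AWS price; look up current pricing for your region before drawing conclusions. The function names are invented for this illustration, 730 is used as an approximate number of hours in a month, and standard data transfer charges, which apply to both options, are left out.

```shell
# Rough monthly cost comparison.
# ASSUMPTION: all rates passed in below are made-up placeholders.

HOURS_PER_MONTH=730

# nat_gateway_cost HOURLY_RATE PER_GB_RATE GB_PROCESSED
nat_gateway_cost() {
  awk -v h="$1" -v g="$2" -v gb="$3" -v hrs="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", h * hrs + g * gb }'
}

# nat_instance_cost HOURLY_RATE
nat_instance_cost() {
  awk -v h="$1" -v hrs="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", h * hrs }'
}

# Example: placeholder rates, 100 GB processed per month.
gw=$(nat_gateway_cost 0.05 0.05 100)
inst=$(nat_instance_cost 0.02)
printf 'NAT Gateway:  $%s/month\n' "$gw"     # 41.50 with these inputs
printf 'NAT Instance: $%s/month\n' "$inst"   # 14.60 with these inputs
```

The per-GB term is what changes the picture as traffic grows, and the instance figure does not account for the engineering time spent on patching and failover.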
Experience Signal: The Silent Routing Failure
Symptom: Private instances cannot reach the internet after NAT Instance setup. curl https://example.com hangs indefinitely. No errors in the application logs.
Misdiagnosis: Most engineers immediately check the route table and confirm the 0.0.0.0/0 route points to the NAT instance. The route looks correct. They then check the NAT instance's Security Group and confirm port 443 is open. Still no resolution.
Actual cause: The source/destination check was never disabled on the NAT instance. The instance is silently dropping forwarded packets because EC2's virtualization layer rejects traffic where the instance is neither the source nor the destination IP. The route table is correct. The Security Group is correct. The packets are being dropped at the hypervisor level before the NAT instance's OS ever sees them.
Fix: Run describe-instance-attribute --attribute sourceDestCheck as shown above. If the value is true, disable it with modify-instance-attribute --source-dest-check "Value=false". Traffic will flow immediately — no reboot required.
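Since this failure is silent, it may be worth scripting the check. The sketch below is one possible shape, with function names invented for this example. It assumes the AWS CLI's --query and --output text options are used to reduce the response to a bare true/false value, and it compares case-insensitively so the exact capitalization of the text output does not matter.

```shell
# Turn the sourceDestCheck attribute into a pass/fail verdict.

# interpret_source_dest_check VALUE
# Returns 0 if the check is disabled, 1 if still enabled, 2 otherwise.
interpret_source_dest_check() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    false)
      echo "ok: source/destination check is disabled"
      ;;
    true)
      echo "fix: run modify-instance-attribute --source-dest-check Value=false"
      return 1
      ;;
    *)
      echo "unknown value: $1"
      return 2
      ;;
  esac
}

# check_nat_instance INSTANCE_ID
# Needs the AWS CLI and credentials; it is defined here but not called.
check_nat_instance() {
  value=$(aws ec2 describe-instance-attribute \
    --instance-id "$1" \
    --attribute sourceDestCheck \
    --query 'SourceDestCheck.Value' \
    --output text) || return 3
  interpret_source_dest_check "$value"
}

# Offline demonstration of the verdict logic only:
interpret_source_dest_check False
```

Running check_nat_instance with your instance ID right after launch, or as part of a deployment pipeline, catches the problem before anyone starts debugging route tables.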
Depth Signal: NAT Gateway and Asymmetric Routing
A non-obvious behavior: if a private instance sends traffic through a NAT Gateway in AZ-A, but the return traffic is routed through a different path (for example, due to a misconfigured Transit Gateway or VPC peering route), the return packets cannot be translated back and are dropped. NAT Gateway tracks connection state per flow, and only the gateway that translated the outbound packet can map the reply to the originating private IP. This typically manifests as TCP connections that never finish the handshake: the SYN leaves, but the SYN-ACK never reaches the instance, so the client hangs until it times out. The fix is to restore symmetric routing, so that the same NAT Gateway that handled the outbound flow also receives the return traffic for that connection.
IAM Permissions for NAT Management Operations
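Engineers managing NAT resources via CLI or automation need at least the following permissions for the NAT-specific operations. Launching the instance and creating or editing Security Groups, as in the walkthrough above, additionally require actions such as ec2:RunInstances, ec2:CreateSecurityGroup, and ec2:AuthorizeSecurityGroupIngress. Apply these to a role, not directly to a user.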
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "NATGatewayManagement",
"Effect": "Allow",
"Action": [
"ec2:CreateNatGateway",
"ec2:DeleteNatGateway",
"ec2:DescribeNatGateways",
"ec2:AllocateAddress",
"ec2:ReleaseAddress",
"ec2:DescribeAddresses"
],
"Resource": "*"
},
{
"Sid": "NATInstanceManagement",
"Effect": "Allow",
"Action": [
"ec2:ModifyInstanceAttribute",
"ec2:DescribeInstanceAttribute",
"ec2:CreateRoute",
"ec2:ReplaceRoute",
"ec2:DescribeRouteTables"
],
"Resource": "*"
}
]
}
Note: DescribeNatGateways, DescribeAddresses, DescribeInstanceAttribute, and DescribeRouteTables are read/list actions that require "Resource": "*" — resource-level restrictions are not supported for these actions per the AWS Service Authorization Reference.
Wrap-Up: NAT Gateway vs NAT Instance — Making the Right Call
For the vast majority of production workloads, NAT Gateway is the correct answer. It eliminates operational burden, scales automatically, and provides built-in availability within an AZ. The cost premium over a NAT Instance is justified by the elimination of patching, failover scripting, and the operational risk of a misconfigured source/destination check.
Reserve NAT Instance for specific scenarios: strict cost constraints on low-traffic environments, requirements for port forwarding, or cases where you need Security Group-level control over outbound traffic at the gateway itself.
Whichever path you choose, deploy one NAT resource per Availability Zone and configure each AZ's private route table independently. Single-AZ NAT is a hidden availability risk that only surfaces during an incident.
Glossary
| Term | Definition |
|---|---|
| NAT (Network Address Translation) | The process of rewriting a packet's source IP address so that return traffic can be routed back through the same device. |
| Source/Destination Check | An EC2 hypervisor-level control that drops packets where the instance is neither the source nor the destination. Must be disabled on NAT Instances. |
| Elastic IP (EIP) | A static public IPv4 address allocated to your AWS account, associated with a NAT device to provide a consistent public exit point. |
| Route Table | A VPC construct that defines where subnet traffic is directed. Private subnets must have a 0.0.0.0/0 route pointing to the NAT device. |
| Internet Gateway (IGW) | The VPC component that enables communication between the VPC and the internet. NAT devices forward traffic to the IGW, not directly to the internet. |