NAT Gateway vs NAT Instance: Choosing the Right Outbound Internet Path for Private Subnets
Your private EC2 instances need to pull OS updates and patches from the internet, but they must never be directly reachable from it. This is the classic egress-only problem in AWS VPC design, and your choice of NAT solution has real consequences for cost, operational burden, and throughput.
TL;DR — Quick Comparison
| Dimension | NAT Gateway (Managed) | NAT Instance (Self-Managed EC2) |
|---|---|---|
| Management | Fully managed by AWS | You own patching, HA, failover |
| Availability | Built-in redundancy within AZ | Single EC2 — SPOF unless you build HA |
| Bandwidth | Scales automatically (check AWS docs for current limits) | Capped by EC2 instance type network performance |
| Security Groups | Not supported (uses NACLs only) | Fully supported |
| Port Forwarding / Bastion Hybrid | Not possible | Possible via iptables rules |
| Cost Model | Hourly + per-GB data processing fee | EC2 instance hours + data transfer |
| Source/Dest Check | Handled automatically | Must be manually disabled |
| Operational Overhead | Near-zero | High — AMI updates, monitoring, failover scripts |
How NAT Works: The Core Mechanism
Both solutions implement Network Address Translation (NAT) — specifically IP masquerading. A private instance (e.g., 10.0.1.50) sends a packet destined for 185.125.190.36 (Ubuntu mirrors). The NAT device rewrites the source IP to its own public/Elastic IP, forwards the packet, and maintains a connection tracking table to route the response back to the originating private instance. The internet never sees the private RFC-1918 address.
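The bookkeeping described above can be illustrated with a toy model in plain shell. This is only a sketch of the idea, not how the Linux kernel or NAT Gateway implements it; the public IP (taken from a documentation range), the starting port, and the function names are all assumptions made for illustration.

```shell
#!/bin/sh
# Toy model of IP masquerading. It illustrates the bookkeeping only;
# it is not how the Linux kernel or NAT Gateway implements NAT.

NAT_PUBLIC_IP="203.0.113.10"  # hypothetical Elastic IP (documentation range)
NEXT_PORT=40000               # hypothetical first translated source port
CONNTRACK=""                  # one "public_port private_ip:port" entry per line

# Outbound: rewrite the source to the NAT's public IP and record the mapping.
# The rewritten packet is stored in OUT_PACKET.
translate_outbound() {
    CONNTRACK="${CONNTRACK}${NEXT_PORT} $1
"
    OUT_PACKET="src=${NAT_PUBLIC_IP}:${NEXT_PORT} dst=$2"
    NEXT_PORT=$((NEXT_PORT + 1))
}

# Inbound: look up the public port. An empty IN_DEST means no tracked
# connection exists, so the packet would be dropped.
translate_inbound() {
    IN_DEST=$(printf '%s' "$CONNTRACK" | awk -v port="$1" '$1 == port { print $2 }')
}

translate_outbound "10.0.1.50:51514" "185.125.190.36:80"
echo "$OUT_PACKET"

translate_inbound 40000
echo "reply forwarded to: $IN_DEST"

translate_inbound 40999
echo "unsolicited packet maps to: '${IN_DEST}'"
```

The last call shows why the private instance is not reachable from outside: a packet that does not match a tracked connection has no private destination to be translated to.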
Traffic Flow Architecture
[Diagram: Private Instance (10.0.1.50) → NAT Device (Gateway or Instance, public subnet) → Internet Gateway (IGW) → Internet (e.g. Ubuntu mirrors). Outbound, the packet leaves the instance as src 10.0.1.50 / dst 185.125.190.36 and leaves the NAT device as src 54.x.x.x (EIP) / dst 185.125.190.36. The response returns through the IGW to the NAT device, where connection tracking translates it back to 10.0.1.50.]
- Private Instance initiates an outbound request (e.g., apt-get update). Its route table has a default route (0.0.0.0/0) pointing to the NAT device.
- NAT Device (Gateway or Instance) sits in a public subnet. It rewrites the source IP to its Elastic IP and forwards the packet.
- Internet Gateway (IGW) is attached to the VPC and handles the actual egress to the public internet. The NAT device's public subnet route table points 0.0.0.0/0 to the IGW.
- Response packets return to the IGW and are forwarded to the NAT device, which uses its connection tracking table to translate the destination back to the private instance IP.
- The private instance receives the response. It never had a public IP, and the internet never knew its real address.
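One way to check the two default routes described in the steps above is to query each route table with the AWS CLI. The sketch below assumes the CLI is installed and configured; the function name and the route table IDs in the usage comment are placeholders, not values from your account.

```shell
#!/bin/sh
# Print the target of the default route (0.0.0.0/0) for one route table.
# Columns: NatGatewayId, NetworkInterfaceId, GatewayId ("None" when unset).
default_route_target() {
    aws ec2 describe-route-tables \
        --route-table-ids "$1" \
        --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'] | [0].[NatGatewayId, NetworkInterfaceId, GatewayId]" \
        --output text
}

# Usage (both IDs are placeholders):
#   default_route_target rtb-0abc12345def67890   # private table: expect nat-... or eni-...
#   default_route_target rtb-0fedcba9876543210   # public table:  expect igw-...
```

If the private table shows an igw- target, or the public table shows a nat- target, the routing does not match the flow described here.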
Deep Dive: NAT Gateway
NAT Gateway is an AWS-managed, horizontally scaled NAT service. You provision it into a public subnet and associate an Elastic IP. AWS handles all underlying infrastructure, software patching, and within-AZ redundancy.
Key Behavioral Facts
- AZ-scoped: A NAT Gateway is deployed in a single AZ. For true HA, deploy one NAT Gateway per AZ and configure each AZ's private subnet route table to use the NAT Gateway in the same AZ. This avoids cross-AZ data transfer charges and eliminates AZ-level SPOF.
- No Security Groups: You control access via Network ACLs on the subnets. The NAT Gateway itself cannot have a security group attached.
- Connection tracking: NAT Gateway tracks connections and supports TCP, UDP, and ICMP.
- Private NAT Gateway: AWS also offers a private NAT Gateway variant (no EIP, no IGW required) for routing between overlapping CIDRs or to Transit Gateway — useful in complex multi-VPC architectures.
Provisioning via AWS CLI
# 1. Allocate an Elastic IP (note the AllocationId in the output)
aws ec2 allocate-address --domain vpc
# 2. Create the NAT Gateway in your public subnet (note the NatGatewayId in the output)
aws ec2 create-nat-gateway \
  --subnet-id subnet-0abc12345def67890 \
  --allocation-id eipalloc-0abc12345def67890
# 3. Wait for it to become available
aws ec2 wait nat-gateway-available \
  --nat-gateway-ids nat-0abc12345def67890
# 4. Point the private subnet route table's default route at the NAT Gateway
aws ec2 create-route \
  --route-table-id rtb-0abc12345def67890 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0abc12345def67890
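For the one-gateway-per-AZ layout recommended earlier, the four steps above could be wrapped in a function and called once per AZ. This is a minimal sketch rather than a hardened script: the function name is hypothetical, error handling is limited to stopping on the first failure, and create-route will fail if the route table already has a default route.

```shell
#!/bin/sh
# Create a NAT Gateway in one public subnet and point one private route
# table at it. Call once per AZ with that AZ's own subnet and route table.
# Prints the new NAT Gateway ID on success.
create_nat_for_az() {
    public_subnet_id="$1"
    private_rtb_id="$2"

    # Capture the IDs that the manual steps ask you to copy by hand.
    alloc_id=$(aws ec2 allocate-address --domain vpc \
        --query 'AllocationId' --output text) || return 1

    nat_id=$(aws ec2 create-nat-gateway \
        --subnet-id "$public_subnet_id" \
        --allocation-id "$alloc_id" \
        --query 'NatGateway.NatGatewayId' --output text) || return 1

    aws ec2 wait nat-gateway-available --nat-gateway-ids "$nat_id" || return 1

    aws ec2 create-route \
        --route-table-id "$private_rtb_id" \
        --destination-cidr-block 0.0.0.0/0 \
        --nat-gateway-id "$nat_id" >/dev/null || return 1

    echo "$nat_id"
}

# Usage (all IDs are placeholders):
#   create_nat_for_az subnet-public-az-a rtb-private-az-a
#   create_nat_for_az subnet-public-az-b rtb-private-az-b
```

Keeping each private route table paired with the gateway in its own AZ is what avoids the cross-AZ charges and the AZ-level SPOF.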
Deep Dive: NAT Instance
A NAT Instance is a standard EC2 instance configured to perform IP masquerading using the Linux kernel's iptables NAT table. You are responsible for the AMI, the iptables rules, persistence across reboots, and high availability.
Critical Configuration Requirements
- Disable Source/Destination Check: By default, an EC2 instance drops any packet for which it is neither the source nor the destination. A NAT instance forwards traffic on behalf of other hosts, so this check must be disabled on its network interface.
- Place in Public Subnet: The instance needs a public IP or Elastic IP and must reside in a subnet with an IGW route.
- Security Group: Allow inbound traffic from the private subnet CIDR on relevant ports (e.g., 80, 443) and allow outbound to the internet.
- Route Table: Private subnet's default route must point to the NAT instance's ENI (not the instance ID — use the ENI ID for reliability).
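The route table requirement can be sketched with the AWS CLI as follows. The function names are hypothetical, and the lookup assumes the NAT instance's primary interface is the one attached at device index 0.

```shell
#!/bin/sh
# Print the ID of the primary ENI (device index 0) of an instance.
nat_instance_eni() {
    aws ec2 describe-instances --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].NetworkInterfaces[?Attachment.DeviceIndex==`0`].NetworkInterfaceId | [0]' \
        --output text
}

# Add a default route to a private route table that targets the NAT
# instance's ENI rather than its instance ID.
route_via_nat_instance() {
    private_rtb_id="$1"
    nat_instance_id="$2"

    eni_id=$(nat_instance_eni "$nat_instance_id") || return 1
    aws ec2 create-route \
        --route-table-id "$private_rtb_id" \
        --destination-cidr-block 0.0.0.0/0 \
        --network-interface-id "$eni_id"
}

# Usage (placeholder IDs):
#   route_via_nat_instance rtb-0abc12345def67890 i-0abc12345def67890
```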
User Data Bootstrap Script (Amazon Linux 2)
#!/bin/bash
# -----------------------------------------------
# NAT Instance Bootstrap — Amazon Linux 2
# -----------------------------------------------
# 1. Enable IP forwarding persistently
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/30-ipforward.conf
sudo sysctl -p /etc/sysctl.d/30-ipforward.conf
# 2. Install iptables persistence service
# (Amazon Linux 2 does not include iptables-services by default)
sudo yum install -y iptables-services
# 3. Set up NAT masquerade rule on the primary interface
# eth0 is the outbound interface facing the IGW
sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# 4. Save the current iptables rules to /etc/sysconfig/iptables
# so they are restored on reboot by the iptables service
sudo service iptables save
# 5. Enable and start the iptables service for persistence
sudo systemctl enable iptables
sudo systemctl start iptables
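After the instance boots, it can be worth confirming that the bootstrap took effect. The sketch below is one possible check, run as root on the NAT instance. It also looks up the interface that carries the default route instead of assuming eth0, since some AMIs and instance types name the primary interface differently. Both function names are hypothetical.

```shell
#!/bin/sh
# Print the name of the interface that carries the default route
# (for example eth0 or ens5).
default_iface() {
    ip -4 route show default |
        awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Return 0 only if IP forwarding is enabled and a MASQUERADE rule exists
# for the default-route interface.
nat_ready() {
    iface=$(default_iface)
    [ -n "$iface" ] || return 1
    [ "$(sysctl -n net.ipv4.ip_forward)" = "1" ] || return 1
    iptables -t nat -S POSTROUTING | grep -q -e "-o $iface -j MASQUERADE"
}

# Usage (as root on the NAT instance):
#   nat_ready && echo "NAT configured on $(default_iface)" || echo "NAT not ready"
```

If default_iface prints something other than eth0, the masquerade rule in the bootstrap script would need the same interface name.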
Disable Source/Destination Check (CLI)
# Disable source/dest check on the NAT instance's primary ENI
aws ec2 modify-instance-attribute \
  --instance-id i-0abc12345def67890 \
  --no-source-dest-check
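To confirm the change, the attribute can be read back. This is a small sketch with hypothetical function names; the output is lower-cased so the comparison does not depend on how the CLI capitalises booleans in text output.

```shell
#!/bin/sh
# Print "true" or "false" for the instance's source/destination check.
source_dest_check() {
    aws ec2 describe-instance-attribute \
        --instance-id "$1" \
        --attribute sourceDestCheck \
        --query 'SourceDestCheck.Value' \
        --output text | tr '[:upper:]' '[:lower:]'
}

# Return 0 if the check is disabled, which NAT forwarding requires.
can_forward_nat_traffic() {
    [ "$(source_dest_check "$1")" = "false" ]
}

# Usage (placeholder instance ID):
#   can_forward_nat_traffic i-0abc12345def67890 || echo "source/dest check is still enabled"
```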
Security Group for NAT Instance (Least Privilege)
# Inbound: Allow HTTP/HTTPS from inside the VPC only.
# 10.0.0.0/16 covers the whole VPC in this example; narrow it to your
# private subnet CIDRs where possible.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc12345def67890 \
  --protocol tcp --port 80 \
  --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress \
  --group-id sg-0abc12345def67890 \
  --protocol tcp --port 443 \
  --cidr 10.0.0.0/16
# Outbound: Allow all outbound (NAT needs to forward to internet).
# A newly created security group already allows all outbound traffic,
# so this is only needed if that default rule has been removed.
aws ec2 authorize-security-group-egress \
  --group-id sg-0abc12345def67890 \
  --protocol -1 --port -1 \
  --cidr 0.0.0.0/0
High Availability: The NAT Instance Problem
[Diagram: two layouts side by side. Left, single NAT (AZ-A only): the AZ-A and AZ-B private subnets both route to one NAT device in AZ-A, with AZ-B sending cross-AZ traffic, and that device routes to the IGW. Right, HA NAT (one per AZ): the AZ-A private subnet routes to a NAT device in AZ-A and the AZ-B private subnet routes to a NAT device in AZ-B; both NAT devices route to the IGW.]
- Single NAT Instance (left): One EC2 in AZ-A serves both AZ-A and AZ-B private subnets. If the instance or AZ-A fails, all outbound traffic from both AZs is broken. This is a hard SPOF.
- HA NAT Instance (right): One NAT Instance per AZ, each serving only its local private subnet. Requires a health-check Lambda or Auto Scaling Group with a lifecycle hook to update route tables on failure — significant operational complexity you must build and maintain yourself.
- NAT Gateway (implicit): Deploy one per AZ. AWS manages redundancy within each AZ automatically. Your only responsibility is the per-AZ route table configuration.
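The route-swap step that a NAT Instance failover mechanism has to perform could look roughly like the sketch below. It is deliberately minimal and makes several assumptions: a standby NAT instance already exists and is configured, EC2 status checks are an acceptable health signal, and something else (a scheduled job, a Lambda, or a monitoring hook) calls the function. The function names are hypothetical.

```shell
#!/bin/sh
# Print the EC2 instance status check result ("ok", "impaired", ...).
instance_status() {
    aws ec2 describe-instance-status --instance-ids "$1" \
        --include-all-instances \
        --query 'InstanceStatuses[0].InstanceStatus.Status' --output text
}

# If the primary NAT instance is not "ok", repoint the default route of
# the given route table at the standby NAT instance's ENI.
failover_if_unhealthy() {
    primary_instance_id="$1"
    route_table_id="$2"
    standby_eni_id="$3"

    if [ "$(instance_status "$primary_instance_id")" = "ok" ]; then
        echo "primary healthy; route unchanged"
        return 0
    fi

    aws ec2 replace-route \
        --route-table-id "$route_table_id" \
        --destination-cidr-block 0.0.0.0/0 \
        --network-interface-id "$standby_eni_id" || return 1
    echo "default route moved to $standby_eni_id"
}

# Usage (placeholder IDs):
#   failover_if_unhealthy i-0abc12345def67890 rtb-0abc12345def67890 eni-0abc12345def67890
```

A real implementation would also need failback, alerting, and protection against flapping, which is part of the operational cost described above.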
Analogy: A NAT Gateway is like a managed toll booth operated by the highway authority — it scales automatically, is always staffed, and you just pay per use. A NAT Instance is like building your own toll booth: you buy the materials, hire the staff, handle sick days, and rebuild it if a truck crashes into it. For most teams, the managed booth is the right call unless you have very specific customization needs that justify the operational cost.
Decision Framework
Decision flow for outbound internet access:
- Q1: Need custom iptables, port forwarding, or bastion hybrid? Yes: ⚙️ Use NAT Instance (self-managed EC2). No: go to Q2.
- Q2: Extremely cost-constrained with very low traffic? Yes: ⚙️ Use NAT Instance (self-managed EC2). No: go to Q3.
- Q3: Production workload requiring HA? Yes or No: ✅ Use NAT Gateway (managed, per-AZ).
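The same decision logic can be written as a small shell function, which may be handy in a provisioning script or a design checklist. The function and argument names are hypothetical; each argument is a "yes" or "no" answer to the three questions in order.

```shell
#!/bin/sh
# Encode the decision framework. Arguments are "yes" or "no" answers to:
#   1. Need custom iptables, port forwarding, or a bastion hybrid?
#   2. Extremely cost-constrained with very low traffic?
#   3. Production workload requiring HA?
choose_nat() {
    needs_custom_iptables="$1"
    cost_constrained_low_traffic="$2"
    production_ha="$3"  # kept for completeness; both answers lead to NAT Gateway

    if [ "$needs_custom_iptables" = "yes" ] || [ "$cost_constrained_low_traffic" = "yes" ]; then
        echo "NAT Instance (self-managed EC2)"
    else
        echo "NAT Gateway (managed, per-AZ)"
    fi
}

choose_nat no no yes
choose_nat yes no yes
```

Note that the third answer never changes the result: once the first two questions are answered "no", NAT Gateway is the recommendation either way.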
Cost Considerations
Pricing and exact limits vary by region and change over time, so always verify current figures on the official AWS VPC pricing page. The general model:
- NAT Gateway: Charged per hour the gateway exists plus a per-GB data processing fee. For high-throughput workloads (e.g., large software downloads), the data processing charge can accumulate significantly.
- NAT Instance: Charged at standard EC2 On-Demand or Reserved rates. There is no per-GB NAT processing fee, though standard EC2 data transfer charges still apply, and you pay for the instance for as long as it runs, even when idle. A Reserved Instance can reduce cost for steady-state workloads.
- Cross-AZ traffic: If your NAT device is in a different AZ than the originating private instance, you incur cross-AZ data transfer charges. This is a strong reason to deploy NAT resources per-AZ.
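The cost model above reduces to simple arithmetic. The sketch below shows the shape of the calculation only: the rates passed in are illustrative placeholders, not AWS prices, and data transfer charges (which apply to both options) are left out. The function names and the 730-hour default month are assumptions.

```shell
#!/bin/sh
# Monthly NAT Gateway cost: hourly charge plus per-GB processing charge.
# Args: hourly_rate per_gb_rate gb_processed [hours_per_month, default 730]
nat_gateway_monthly_cost() {
    awk -v hourly="$1" -v per_gb="$2" -v gb="$3" -v hours="${4:-730}" \
        'BEGIN { printf "%.2f\n", hourly * hours + per_gb * gb }'
}

# Monthly NAT Instance cost: instance hours only, no per-GB processing fee.
# Args: hourly_rate [hours_per_month, default 730]
nat_instance_monthly_cost() {
    awk -v hourly="$1" -v hours="${2:-730}" \
        'BEGIN { printf "%.2f\n", hourly * hours }'
}

# Placeholder rates for illustration; substitute the current figures for
# your region from the AWS pricing pages.
echo "gateway:  $(nat_gateway_monthly_cost 0.05 0.05 1000)"
echo "instance: $(nat_instance_monthly_cost 0.01)"
```

Because the gateway's second term grows with traffic while the instance's cost stays flat, the gap widens as data volume increases, which is the effect the high-throughput caveat above describes.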
When to Choose Each
| Choose NAT Gateway when… | Choose NAT Instance when… |
|---|---|
| You want zero operational overhead | You need port forwarding or custom iptables rules |
| Production workloads require high availability | You need a combined NAT + Bastion host to reduce costs |
| Your team lacks Linux networking expertise | Budget is extremely constrained and traffic volume is low |
| You need automatic bandwidth scaling | You require deep packet inspection or custom routing logic |
Wrap-Up & Next Steps
For the vast majority of production workloads — including private instances downloading OS updates — NAT Gateway is the correct default choice. The operational simplicity, built-in availability, and automatic scaling outweigh the per-GB cost for most use cases. Reserve NAT Instances for niche scenarios where custom network behavior is a hard requirement.
- 📖 AWS Docs: NAT Gateways
- 📖 AWS Docs: NAT Instances
- 📖 AWS VPC Pricing
- 🔒 Apply least-privilege IAM policies when using Lambda or SSM automation to manage NAT Instance route table failover.
Glossary
| Term | Definition |
|---|---|
| NAT (Network Address Translation) | Rewrites packet source/destination IPs to allow private hosts to communicate with public networks without exposing their private addresses. |
| Elastic IP (EIP) | A static public IPv4 address allocated to your AWS account, associated with a NAT Gateway or EC2 instance for consistent public addressing. |
| Source/Destination Check | An EC2 network interface attribute that drops packets where the instance is not the source or destination. Must be disabled for NAT Instances. |
| IP Masquerading | A form of NAT where multiple private IPs share a single public IP; the NAT device tracks connections to demultiplex return traffic. |
| Internet Gateway (IGW) | A horizontally scaled, redundant VPC component that enables communication between instances in a VPC and the internet. Required for both NAT solutions. |