Designing Multi-AZ Networks
A single data center can fail. Power, cooling, networking, or a bad software push can take it offline, and if all of your servers live in that one place, your application goes down with it. AWS addresses this with Availability Zones (AZs): isolated groups of one or more data centers within the same Region, close enough for fast networking but far enough apart to fail independently. This page shows you how to spread a VPC (Virtual Private Cloud, your own private network inside AWS) across multiple AZs so that a single AZ failure does not take your whole system down.
Why spread across Availability Zones
Putting everything in one AZ creates a large blast radius — the amount of your system that breaks when one thing fails. If that AZ has an outage, 100% of your servers are gone. If you split your servers evenly across three AZs, a single AZ outage takes out only about a third of your capacity, and load balancers and auto scaling can route around the failure.
AWS designs AZs to be physically isolated (different buildings, power, and cooling), so they almost never fail at the same time. Most AWS Regions offer at least three AZs. For production workloads, use two AZs at a minimum and three when you can afford it — three AZs let you survive one failure and still have two healthy zones sharing the load.
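As a rough illustration of the arithmetic above, the following shell sketch prints how much capacity a single AZ outage removes and how much of the total load each surviving zone then has to carry. It assumes load is spread evenly across zones; the function name `az_failure_impact` is made up for this example.

```shell
# Impact of losing one AZ when load is spread evenly across N zones.
# Percentages use integer division, so they are rounded down.
az_failure_impact() {
  local n="$1"
  if [ "$n" -lt 2 ]; then
    echo "azs=${n} lost=100% (no surviving AZ)"
    return 0
  fi
  local lost=$((100 / n))
  local per_survivor=$((100 / (n - 1)))
  echo "azs=${n} lost=${lost}% load_per_surviving_az=${per_survivor}%"
}

for n in 1 2 3; do
  az_failure_impact "$n"
done
```

With three AZs, each zone normally carries about a third of the load and has to absorb half of it after a failure, so size each zone with that headroom in mind.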
Tip: Many AWS managed services (RDS Multi-AZ, ELB, EKS) require or strongly prefer subnets in at least two AZs. Designing for multi-AZ from day one saves a painful re-architecture later.
The reference layout
A clean, resilient VPC uses paired subnets per AZ: one public subnet and one private subnet in each AZ. Here is a layout for a 10.0.0.0/16 VPC across three AZs.
| AZ | Public subnet | Private subnet | NAT Gateway | Private route table |
|---|---|---|---|---|
| us-east-1a | 10.0.0.0/24 | 10.0.10.0/24 | nat-0a1b2c3d (in 1a) | rtb-private-1a |
| us-east-1b | 10.0.1.0/24 | 10.0.11.0/24 | nat-0e4f5a6b (in 1b) | rtb-private-1b |
| us-east-1c | 10.0.2.0/24 | 10.0.12.0/24 | nat-0c7d8e9f (in 1c) | rtb-private-1c |
A public subnet has a route to an Internet Gateway (IGW — the VPC component that connects your network to the public internet), so resources there can be reached from the internet. A private subnet has no direct internet route; its instances reach out through a NAT Gateway. A NAT Gateway (Network Address Translation Gateway) lets private instances start outbound connections (for software updates, API calls) while blocking inbound connections from the internet.
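The addressing scheme in the table follows a simple pattern: public subnets take third octets 0, 1, 2 and private subnets take 10, 11, 12. The sketch below reproduces that pattern for any list of AZs. The function name `print_subnet_layout` is hypothetical, and the 10.0.0.0/16 range is hard-coded to match the example VPC.

```shell
# Print the paired /24 subnets for each AZ in a 10.0.0.0/16 VPC.
# Public subnets use third octet 0, 1, 2...; private subnets use 10, 11, 12...
print_subnet_layout() {
  local i=0
  local az
  for az in "$@"; do
    echo "${az} public=10.0.${i}.0/24 private=10.0.$((10 + i)).0/24"
    i=$((i + 1))
  done
}

print_subnet_layout us-east-1a us-east-1b us-east-1c
```

Keeping the public and private ranges in separate blocks leaves room to add more AZs later without renumbering, although this particular offset only leaves room for ten public subnets.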
When to use this (and when not to)
Use paired public/private subnets per AZ for any real application: internet-facing load balancers (and bastion hosts, if you use them) go in public subnets, while application servers and databases go in private subnets. Do not put databases in public subnets; there is no reason to expose them, and it is a common security mistake. If your private subnets only need to reach AWS services such as S3 or DynamoDB and never the wider internet, you can skip NAT Gateways entirely and use VPC endpoints instead, which is usually cheaper.
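For private subnets that only talk to S3 and DynamoDB, gateway endpoints can stand in for a NAT Gateway. The sketch below wraps `aws ec2 create-vpc-endpoint` in a hypothetical helper, `create_gateway_endpoints`; the VPC ID, Region, and route table IDs in the usage comment are placeholders, not values from a real account.

```shell
# Create S3 and DynamoDB gateway endpoints and attach them to the given
# private route tables. Prints one endpoint ID per service.
# Args: VPC ID, Region, then one or more route table IDs.
create_gateway_endpoints() {
  local vpc_id="$1"
  local region="$2"
  shift 2
  local svc
  for svc in s3 dynamodb; do
    aws ec2 create-vpc-endpoint \
      --vpc-id "$vpc_id" \
      --vpc-endpoint-type Gateway \
      --service-name "com.amazonaws.${region}.${svc}" \
      --route-table-ids "$@" \
      --query 'VpcEndpoint.VpcEndpointId' --output text
  done
}

# Example call (placeholder IDs):
# create_gateway_endpoints vpc-0a1b2c3d us-east-1 rtb-0a1b2c3d
```

Gateway endpoints add routes to the listed route tables, so pass every private route table that should reach S3 or DynamoDB without going through a NAT Gateway.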
The NAT Gateway gotcha
This is the single most important decision in a multi-AZ design.
Warning: A NAT Gateway lives in exactly one AZ. If you create only one NAT Gateway and route every private subnet through it, you have re-introduced a single point of failure — when that AZ fails, every private subnet in every AZ loses internet access. You also pay cross-AZ data transfer charges (around $0.01/GB each way) for traffic that hops from a private subnet in one AZ to a NAT Gateway in another.
The fix is simple: deploy one NAT Gateway per AZ, and route each private subnet to the NAT Gateway in its own AZ. This gives you both resilience (one AZ failing only affects that AZ’s private subnet) and lower cost (no cross-AZ hops for NAT traffic).
A NAT Gateway costs roughly $0.045/hour (~$32/month) plus data processing per GB. Three NAT Gateways cost about $96/month before data — the price of resilience. For dev environments, a single NAT Gateway is a reasonable cost saving.
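The cost figures above can be reproduced with a little arithmetic. The sketch below uses the rates quoted in the text ($0.045 per hour, $0.01 per GB in each direction) and assumes a 720-hour (30-day) month, which is why its results are slightly above the rounded numbers in the paragraph. Both function names are hypothetical.

```shell
# Monthly cost of N NAT Gateways, before per-GB data processing.
# Args: gateway count, hourly rate in USD (default 0.045), hours per month (default 720).
nat_monthly_cost() {
  awk -v n="$1" -v rate="${2:-0.045}" -v hours="${3:-720}" \
    'BEGIN { printf "%.2f\n", n * rate * hours }'
}

# Cross-AZ transfer cost for traffic that hops to a NAT Gateway in another AZ.
# Args: GB transferred, per-GB rate in USD charged in each direction (default 0.01).
cross_az_cost() {
  awk -v gb="$1" -v rate="${2:-0.01}" \
    'BEGIN { printf "%.2f\n", gb * rate * 2 }'
}

nat_monthly_cost 1   # 32.40
nat_monthly_cost 3   # 97.20
cross_az_cost 500    # 10.00
```

Check current pricing for your Region before relying on these numbers; the rates are passed as arguments so they are easy to change.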
Building it with the console
- Open the VPC console and choose Your VPCs > Create VPC.
- Select VPC and more. This wizard creates subnets, route tables, and NAT Gateways together.
- Set Number of Availability Zones to 3 and Number of public subnets and private subnets to 3 each.
- Under NAT gateways, choose 1 per AZ (not “In 1 AZ”).
- Set VPC endpoints as needed, then choose Create VPC.
The wizard automatically creates one route table per private subnet, each pointing to that AZ’s NAT Gateway.
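One way to confirm the result afterwards is to count the NAT Gateways in the new VPC from the CLI. This is a minimal sketch, not part of the wizard: `count_nat_gateways` and `check_nat_per_az` are hypothetical helper names, and the VPC ID in the usage comment is a placeholder.

```shell
# Count NAT Gateways in a VPC that are in the "available" state.
count_nat_gateways() {
  aws ec2 describe-nat-gateways \
    --filter "Name=vpc-id,Values=$1" "Name=state,Values=available" \
    --query 'length(NatGateways)' --output text
}

# Succeed only if the VPC has the expected number of NAT Gateways.
# Args: VPC ID, expected count (one per AZ).
check_nat_per_az() {
  local vpc_id="$1"
  local expected="$2"
  local actual
  actual=$(count_nat_gateways "$vpc_id")
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: ${actual} NAT Gateways"
  else
    echo "MISMATCH: expected ${expected}, found ${actual}"
    return 1
  fi
}

# Example call (placeholder ID):
# check_nat_per_az vpc-0a1b2c3d 3
```

A matching count does not prove that each gateway sits in a different AZ, so it is still worth checking the subnet of each gateway in the console.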
Building it with the AWS CLI
Below are the key steps for us-east-1a. They assume the VPC vpc-0a1b2c3d already exists with an Internet Gateway attached, and that its public and private subnets have already been created.
# Allocate an Elastic IP (a permanent public IP address) for each NAT Gateway
aws ec2 allocate-address --domain vpc
Output:
{
  "AllocationId": "eipalloc-0a1b2c3d",
  "PublicIp": "52.10.20.30",
  "Domain": "vpc"
}
# Create a NAT Gateway in the public subnet of AZ us-east-1a
aws ec2 create-nat-gateway \
--subnet-id subnet-0a1b2c3d \
--allocation-id eipalloc-0a1b2c3d \
--tag-specifications 'ResourceType=natgateway,Tags=[{Key=Name,Value=nat-1a}]'
Output:
{
  "NatGateway": {
    "NatGatewayId": "nat-0a1b2c3d",
    "SubnetId": "subnet-0a1b2c3d",
    "State": "pending"
  }
}
# Create a route table for the private subnet in AZ us-east-1a
aws ec2 create-route-table --vpc-id vpc-0a1b2c3d \
--tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=rtb-private-1a}]'
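# create-route-table returns the new RouteTableId (rtb-0a1b2c3d in this example)
# Wait until the NAT Gateway is available before routing traffic through it
aws ec2 wait nat-gateway-available --nat-gateway-ids nat-0a1b2c3d
# Route all outbound internet traffic to THIS AZ's NAT Gateway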
aws ec2 create-route \
--route-table-id rtb-0a1b2c3d \
--destination-cidr-block 0.0.0.0/0 \
--nat-gateway-id nat-0a1b2c3d
# Associate the route table with the private subnet in the same AZ
aws ec2 associate-route-table \
--route-table-id rtb-0a1b2c3d \
--subnet-id subnet-0e4f5a6b
Repeat the NAT Gateway, route table, and association steps for us-east-1b and us-east-1c, each time pointing the route at that AZ’s own NAT Gateway.
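To avoid copying IDs by hand for each AZ, the same steps can be wrapped in a function and called once per zone. The sketch below is one possible shape, not a definitive script: `create_az_nat_path` is a hypothetical helper, and every subnet ID in the usage comment is a placeholder for IDs from your own VPC.

```shell
set -eu

# Build the NAT path for one AZ: Elastic IP -> NAT Gateway -> private route table.
# Args: VPC ID, public subnet ID, private subnet ID, AZ suffix used in Name tags.
create_az_nat_path() {
  local vpc_id="$1"
  local public_subnet="$2"
  local private_subnet="$3"
  local suffix="$4"
  local eip_id nat_id rtb_id

  eip_id=$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text)

  nat_id=$(aws ec2 create-nat-gateway \
    --subnet-id "$public_subnet" --allocation-id "$eip_id" \
    --tag-specifications "ResourceType=natgateway,Tags=[{Key=Name,Value=nat-${suffix}}]" \
    --query 'NatGateway.NatGatewayId' --output text)

  # Wait until the NAT Gateway is available before routing traffic through it.
  aws ec2 wait nat-gateway-available --nat-gateway-ids "$nat_id"

  rtb_id=$(aws ec2 create-route-table --vpc-id "$vpc_id" \
    --tag-specifications "ResourceType=route-table,Tags=[{Key=Name,Value=rtb-private-${suffix}}]" \
    --query 'RouteTable.RouteTableId' --output text)

  # Default route for this AZ's private subnet goes to this AZ's NAT Gateway.
  aws ec2 create-route --route-table-id "$rtb_id" \
    --destination-cidr-block 0.0.0.0/0 --nat-gateway-id "$nat_id" >/dev/null

  aws ec2 associate-route-table --route-table-id "$rtb_id" \
    --subnet-id "$private_subnet" >/dev/null

  echo "${suffix}: ${nat_id} -> ${rtb_id}"
}

# Example calls (placeholder subnet IDs, one public/private pair per AZ):
# create_az_nat_path vpc-0a1b2c3d subnet-0a1b2c3d subnet-0e4f5a6b 1a
# create_az_nat_path vpc-0a1b2c3d subnet-0b2c3d4e subnet-0f5a6b7c 1b
# create_az_nat_path vpc-0a1b2c3d subnet-0c3d4e5f subnet-0a6b7c8d 1c
```

Passing the public and private subnet of the same AZ in each call is what keeps NAT traffic from crossing zones; the function itself does not verify that the two subnets share an AZ.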
Terraform for repeatable multi-AZ layouts
Because the pattern repeats per AZ, Infrastructure as Code keeps it consistent. This snippet creates one NAT Gateway and matching private route per AZ.
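variable "azs" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
# One Elastic IP per NAT Gateway
resource "aws_eip" "nat" {
  count  = length(var.azs)
  domain = "vpc"
}
# Assumes aws_subnet.public is defined elsewhere with count, in the same order as var.azs
resource "aws_nat_gateway" "this" {
  count         = length(var.azs)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}
# One private route table per AZ (aws_vpc.main is assumed to be defined elsewhere)
resource "aws_route_table" "private" {
  count  = length(var.azs)
  vpc_id = aws_vpc.main.id
}
# Default route of each private route table goes to the NAT Gateway at the same index
resource "aws_route" "private_nat" {
  count                  = length(var.azs)
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this[count.index].id
}
# Assumes aws_subnet.private is defined elsewhere with count, in the same order as var.azs
resource "aws_route_table_association" "private" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}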
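Because every resource uses the same count.index, the private route table at index N always points to the NAT Gateway at index N. As long as the public and private subnets are declared in the same AZ order and each private subnet is associated with the route table at its own index, every private subnet stays bound to the NAT Gateway in its own AZ, which is exactly the resilient, cost-aware pattern.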
Best Practices
- Use at least two AZs for production, three when budget allows, to survive a single AZ failure with capacity to spare.
- Deploy one NAT Gateway per AZ and route each private subnet to its own AZ’s NAT — never funnel all AZs through one NAT.
- Keep databases and app servers in private subnets; expose only load balancers and bastions in public subnets.
- Size subnets with room to grow: a /24 (251 usable IPs) per subnet is a sensible default; AWS reserves 5 IPs per subnet.
- For dev or test, a single NAT Gateway is a fair trade to save ~$64/month; just accept that it is not highly available.
- Use VPC endpoints for S3 and DynamoDB so that traffic to those services skips NAT Gateways entirely, cutting data charges.
- Spread your Auto Scaling groups and load balancer subnets across all the AZs you defined so failover is automatic.
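The subnet sizing figure in the list above comes from simple arithmetic: a prefix of length p contains 2^(32-p) addresses, and AWS reserves 5 of them in every subnet. A small sketch, with the hypothetical helper name `usable_ips`, makes it easy to compare sizes; it rejects prefixes outside /16 to /28, the range AWS allows for IPv4 subnets.

```shell
# Usable IPv4 addresses in an AWS subnet of the given prefix length.
# AWS reserves 5 addresses per subnet.
usable_ips() {
  local prefix="$1"
  if [ "$prefix" -lt 16 ] || [ "$prefix" -gt 28 ]; then
    echo "prefix must be between 16 and 28" >&2
    return 1
  fi
  echo $(( (1 << (32 - prefix)) - 5 ))
}

usable_ips 24   # 251
usable_ips 28   # 11
```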