Project: Scalable Web App (EC2 + ALB + ASG + RDS)
This project builds the classic “highly available” web application on AWS: one that stays up when a single server fails. You will run web servers on EC2 (virtual machines in the cloud), spread traffic across them with an Application Load Balancer (ALB), let an Auto Scaling Group (ASG) add and remove servers automatically, and store data in a managed RDS database that fails over to a standby copy. Everything sits in a VPC (Virtual Private Cloud, your own private network) spread across two Availability Zones (AZs, each made up of one or more physically separate data centers), so the loss of a single server or a single AZ does not take the application down. This is one of the most common production patterns on AWS, and learning it teaches security and scaling habits that carry over to other architectures.
The architecture at a glance
The whole design is about layering and privacy. Public things face the internet; everything else hides in private subnets and only talks to its neighbor through tightly scoped security groups (virtual firewalls attached to resources).
| Layer | Where it lives | Who can reach it |
|---|---|---|
| ALB | Public subnets (2 AZs) | The internet, on ports 80/443 |
| EC2 (in ASG) | Private subnets (2 AZs) | Only the ALB’s security group |
| RDS (Multi-AZ) | Isolated DB subnets (2 AZs) | Only the EC2 security group |
| NAT Gateway | Public subnets (one per AZ) | Lets private servers reach the internet outbound |
The single most important rule: never open a database or app server to a CIDR range like 0.0.0.0/0. Reference the upstream security group instead. That way “who can connect” follows the resource, not a fragile IP range.
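One way to audit for this rule, as a minimal sketch: the `ip-permission.cidr` filter of `describe-security-groups` returns every group that has an inbound rule for a given range. The function name `list_world_open_groups` is hypothetical, and the check only covers IPv4 rules.

```shell
# Hypothetical helper: list security groups that have any inbound rule
# open to 0.0.0.0/0. In this design only the ALB's group should show up.
list_world_open_groups() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[].[GroupId,GroupName]' \
    --output text
}

# Usage (needs AWS credentials and a default region):
#   list_world_open_groups
```

If an app or database group appears in the output, replace its CIDR rule with a rule that references the upstream group.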
When to use this pattern (and when not to)
Use EC2 + ALB + ASG + RDS when you have a long-running web server (a Node, Java, Python, or PHP app) that needs to stay up 24/7, handle variable traffic, and survive a server or data-center failure. It is the default choice for traditional web apps.
Do not reach for it if your workload is bursty and event-driven (use a serverless API instead — you pay nothing when idle), if you are serving only static files (use a static website on S3 + CloudFront), or if you have already containerized everything (use ECS/Fargate). EC2 means you patch and manage operating systems; serverless and containers hide much of that.
Step 1: Build the VPC and subnets
You need a VPC with six subnets: two public (for the ALB and NAT), two private (for EC2), and two isolated (for RDS), one of each per AZ. The detailed walkthrough lives in the three-tier VPC project; here is the fast path.
Console steps:
- Open the VPC console and choose Create VPC.
- Select VPC and more. This wizard creates subnets, route tables, and gateways for you.
- Set Number of Availability Zones to 2, Number of public subnets to 2, and Number of private subnets to 4 (we will treat two as app, two as DB).
- For NAT gateways, choose 1 per AZ so a single AZ outage does not break outbound traffic.
- Click Create VPC.
CLI equivalent (the core call):
aws ec2 create-vpc \
--cidr-block 10.0.0.0/16 \
--tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=scalable-app-vpc}]'
Output:
{
"Vpc": {
"VpcId": "vpc-0a1b2c3d4e5f6a7b8",
"State": "pending",
"CidrBlock": "10.0.0.0/16"
}
}
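The `create-vpc` call alone produces an empty VPC. As a rough sketch of the next step, the six subnets could be created in a loop like the one below. The /24 ranges, the subnet names, the us-east-1 AZ names, and the function name `create_subnets` are all illustrative assumptions, not values from this project.

```shell
# Sketch: create the six subnets by hand inside an existing VPC.
# The /24 ranges and the us-east-1 AZ names are illustrative assumptions.
create_subnets() {
  local vpc_id="$1"
  local name cidr az
  while read -r name cidr az; do
    # </dev/null keeps the CLI from consuming the loop's input
    aws ec2 create-subnet \
      --vpc-id "$vpc_id" \
      --cidr-block "$cidr" \
      --availability-zone "$az" \
      --tag-specifications "ResourceType=subnet,Tags=[{Key=Name,Value=$name}]" </dev/null
  done <<'EOF'
public-a 10.0.0.0/24 us-east-1a
public-b 10.0.1.0/24 us-east-1b
app-a 10.0.10.0/24 us-east-1a
app-b 10.0.11.0/24 us-east-1b
db-a 10.0.20.0/24 us-east-1a
db-b 10.0.21.0/24 us-east-1b
EOF
}

# Usage:
#   create_subnets vpc-0a1b2c3d4e5f6a7b8
```

This only creates the subnets. The internet gateway, route tables, and NAT gateways still have to be created and attached, which is the work the console wizard does for you.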
Cost note: each NAT Gateway costs about $0.045/hour (~$33/month) in us-east-1, plus a per-GB data processing charge; rates vary by region. Two NAT Gateways come to roughly $66/month before traffic. For a learning environment, one NAT is fine; for production, one per AZ is the resilient choice.
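The arithmetic behind those figures can be checked with a short sketch. It assumes the $0.045 hourly rate quoted above and a 730-hour month; the function name `nat_monthly_cost` is hypothetical.

```shell
# Rough fixed monthly cost of NAT gateways, before data processing charges.
# Assumes a $0.045 hourly rate and a 730-hour month.
nat_monthly_cost() {
  awk -v n="$1" -v rate=0.045 -v hours=730 \
    'BEGIN { printf "%.2f\n", n * rate * hours }'
}

nat_monthly_cost 1   # one NAT gateway
nat_monthly_cost 2   # one per AZ across two AZs
```

The two calls print 32.85 and 65.70, which is where the rounded $33 and $66 come from.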
Step 2: Create the chained security groups
Create three security groups and wire them together by reference. Notice no IP ranges appear except on the public ALB.
# ALB SG: open to the world on HTTP
aws ec2 create-security-group --group-name alb-sg \
--description "ALB inbound from internet" --vpc-id vpc-0a1b2c3d4e5f6a7b8
aws ec2 authorize-security-group-ingress --group-id sg-0a1b2c3d \
--protocol tcp --port 80 --cidr 0.0.0.0/0
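# App SG: only the ALB SG may reach it on port 8080
# (app-sg and db-sg are illustrative names; the sg-... IDs stand in for the IDs each create call returns)
aws ec2 create-security-group --group-name app-sg \
--description "App inbound from ALB only" --vpc-id vpc-0a1b2c3d4e5f6a7b8
aws ec2 authorize-security-group-ingress --group-id sg-0appaaaa \
--protocol tcp --port 8080 --source-group sg-0a1b2c3d
# DB SG: only the App SG may reach MySQL on port 3306 (use 5432 for PostgreSQL)
aws ec2 create-security-group --group-name db-sg \
--description "DB inbound from app only" --vpc-id vpc-0a1b2c3d4e5f6a7b8
aws ec2 authorize-security-group-ingress --group-id sg-0dbbbbbb \
--protocol tcp --port 3306 --source-group sg-0appaaaa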
The --source-group flag is what makes the firewall reference another security group instead of an IP. New servers launched into the App SG are instantly allowed without touching any rule.
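To confirm the chain is wired by reference, a small sketch can list which groups a given security group accepts traffic from. The function name `allowed_source_groups` is hypothetical; the query reads the `UserIdGroupPairs` entries that `describe-security-groups` returns for group-based rules.

```shell
# Hypothetical helper: print the security groups that a given group
# accepts inbound traffic from (its referenced source groups).
allowed_source_groups() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[0].IpPermissions[].UserIdGroupPairs[].GroupId' \
    --output text
}

# Usage, with the placeholder IDs from this step:
#   allowed_source_groups sg-0appaaaa   # should list the ALB group
#   allowed_source_groups sg-0dbbbbbb   # should list the app group
```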
Step 3: Launch template and user data
An ASG launches identical servers from a launch template. The user data (a startup script) installs your app so every new instance is ready to serve traffic on boot.
aws ec2 create-launch-template \
--launch-template-name web-app-lt \
--launch-template-data '{
"ImageId": "ami-0abcdef1234567890",
"InstanceType": "t3.micro",
"SecurityGroupIds": ["sg-0appaaaa"],
"UserData": "'"$(base64 -w0 user-data.sh)"'"
}'
A minimal user-data.sh for an Amazon Linux 2023 box:
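#!/bin/bash
dnf install -y httpd
# Serve on 8080 so the server matches the App SG rule and the target group port
sed -i 's/^Listen 80$/Listen 8080/' /etc/httpd/conf/httpd.conf
echo "Hello from $(hostname -f)" > /var/www/html/index.html
systemctl enable --now httpd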
Gotcha — keep session state OFF the instance. When the ASG scales in, it terminates a server and any local session data dies with it. Store sessions in ElastiCache (managed Redis) or DynamoDB so users are not logged out mid-request.
Step 4: ALB, target group, and the ASG
The ALB routes requests to a target group; the ASG registers its instances into that target group and uses the ALB’s health check to decide if a server is healthy.
Console steps:
- In EC2 > Load Balancers, create an Application Load Balancer, place it in the two public subnets, and attach alb-sg.
- Create a target group of type Instances on port 8080 with health check path /.
- In EC2 > Auto Scaling Groups, create a group from web-app-lt, select the two private subnets, and attach the target group.
- Set Health check type to ELB (not just EC2) so the ASG replaces instances the load balancer marks unhealthy.
- Set desired/min/max to 2 / 2 / 6 and add a target-tracking policy at 60% average CPU.
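The load balancer and target group steps above can also be sketched with the `elbv2` commands. The names web-alb and web-tg, the function name `create_alb_stack`, and the argument order are illustrative assumptions; the listener here is plain HTTP on port 80 only.

```shell
# Sketch: CLI version of the ALB and target group console steps.
# Arguments: VPC ID, ALB security group ID, two public subnet IDs.
create_alb_stack() {
  local vpc_id="$1" alb_sg="$2" subnet_a="$3" subnet_b="$4"
  local alb_arn tg_arn

  alb_arn=$(aws elbv2 create-load-balancer \
    --name web-alb --type application --scheme internet-facing \
    --subnets "$subnet_a" "$subnet_b" \
    --security-groups "$alb_sg" \
    --query 'LoadBalancers[0].LoadBalancerArn' --output text)

  tg_arn=$(aws elbv2 create-target-group \
    --name web-tg --protocol HTTP --port 8080 \
    --vpc-id "$vpc_id" --target-type instance \
    --health-check-path / \
    --query 'TargetGroups[0].TargetGroupArn' --output text)

  # Forward plain HTTP on port 80 to the target group.
  aws elbv2 create-listener \
    --load-balancer-arn "$alb_arn" \
    --protocol HTTP --port 80 \
    --default-actions "Type=forward,TargetGroupArn=$tg_arn" >/dev/null

  # Print the target group ARN for use when creating the ASG.
  printf '%s\n' "$tg_arn"
}

# Usage:
#   create_alb_stack vpc-0a1b2c3d4e5f6a7b8 sg-0a1b2c3d subnet-0pub1111 subnet-0pub2222
```

The printed ARN is the value the ASG expects in `--target-group-arns`.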
CLI equivalent for the ASG:
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name web-app-asg \
--launch-template LaunchTemplateName=web-app-lt \
--min-size 2 --max-size 6 --desired-capacity 2 \
--vpc-zone-identifier "subnet-0app1111,subnet-0app2222" \
--target-group-arns arn:aws:elasticloadbalancing:...:targetgroup/web-tg/abc123 \
--health-check-type ELB --health-check-grace-period 120
Using ELB health checks instead of the default EC2 checks is critical: an EC2 check only knows the VM is running, not that your web server actually answers requests.
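The 60% CPU target-tracking policy from the console steps has a CLI form as well. This is a minimal sketch; the policy name cpu-60-target and the function name `add_cpu_target_tracking` are illustrative assumptions.

```shell
# Sketch: attach a target-tracking policy that holds average CPU near 60%.
# The policy name cpu-60-target is an illustrative assumption.
add_cpu_target_tracking() {
  aws autoscaling put-scaling-policy \
    --auto-scaling-group-name "$1" \
    --policy-name cpu-60-target \
    --policy-type TargetTrackingScaling \
    --target-tracking-configuration '{
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
      },
      "TargetValue": 60.0
    }'
}

# Usage:
#   add_cpu_target_tracking web-app-asg
```

With target tracking, the ASG adds instances when average CPU runs above the target and removes them when it stays below, within the min and max sizes set on the group.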
Step 5: RDS Multi-AZ database
RDS Multi-AZ keeps a synchronously replicated standby in the second AZ and fails over to it automatically, typically within one to two minutes.
aws rds create-db-instance \
--db-instance-identifier app-db \
--engine mysql --db-instance-class db.t3.small \
--allocated-storage 20 --multi-az \
--master-username admin --manage-master-user-password \
--vpc-security-group-ids sg-0dbbbbbb \
--db-subnet-group-name app-db-subnets
--manage-master-user-password stores the password in Secrets Manager automatically, so no secret ever sits in your script.
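The `create-db-instance` call refers to a DB subnet group named app-db-subnets, which has to exist first. A minimal sketch of creating it, and of finding the secret that RDS generates, might look like this. The description text and both function names are illustrative assumptions.

```shell
# Sketch: create the DB subnet group that create-db-instance refers to.
# Arguments: the IDs of the two isolated DB subnets.
create_db_subnet_group() {
  aws rds create-db-subnet-group \
    --db-subnet-group-name app-db-subnets \
    --db-subnet-group-description "Isolated subnets for app-db" \
    --subnet-ids "$1" "$2"
}

# Look up the ARN of the Secrets Manager secret that RDS created
# for the master password of the given DB instance.
master_secret_arn() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].MasterUserSecret.SecretArn' \
    --output text
}

# Usage:
#   create_db_subnet_group subnet-0db11111 subnet-0db22222
#   master_secret_arn app-db
```

The application can then be granted read access to that one secret instead of receiving the password directly.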
Step 6: Cleanup
Delete in reverse order to avoid dependency errors, then confirm nothing is left billing.
aws autoscaling delete-auto-scaling-group --auto-scaling-group-name web-app-asg --force-delete
aws rds delete-db-instance --db-instance-identifier app-db --skip-final-snapshot
aws elbv2 delete-load-balancer --load-balancer-arn arn:aws:...:loadbalancer/app/web-alb/abc
# NAT gateways (and their Elastic IPs) are the most commonly forgotten charges; the VPC itself is free:
aws ec2 describe-nat-gateways --query 'NatGateways[].NatGatewayId'
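Listing the NAT gateways does not remove them. As a rough sketch, the remaining gateways in one VPC could be deleted with a loop like this; the function name `delete_nat_gateways` is hypothetical.

```shell
# Sketch: delete every available NAT gateway in one VPC.
delete_nat_gateways() {
  local vpc_id="$1" nat_id
  for nat_id in $(aws ec2 describe-nat-gateways \
      --filter "Name=vpc-id,Values=$vpc_id" "Name=state,Values=available" \
      --query 'NatGateways[].NatGatewayId' --output text); do
    aws ec2 delete-nat-gateway --nat-gateway-id "$nat_id"
  done
}

# Usage:
#   delete_nat_gateways vpc-0a1b2c3d4e5f6a7b8
```

Deleting a NAT gateway does not release its Elastic IP. Once the gateways reach the deleted state, release each address with `aws ec2 release-address --allocation-id`, then delete the VPC.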
Best Practices
- Keep app and DB tiers private; expose only the ALB, and chain security groups by reference rather than CIDR.
- Use ELB health checks on the ASG so broken-but-running instances get replaced.
- Run one NAT Gateway per AZ in production so an AZ outage cannot sever outbound traffic.
- Store session state in ElastiCache or DynamoDB so scale-in never loses user data.
- Enable RDS Multi-AZ and let Secrets Manager hold the password; never bake credentials into user data.
- Bake your app into a custom AMI (machine image) for the fastest boot, or install it in user data, so new instances pass health checks soon after launch.
- Tag everything and set CloudWatch alarms on CPU and unhealthy host count to catch trouble early.
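As an example of the last point, an alarm on unhealthy targets could be sketched as follows. The alarm name and the function name `create_unhealthy_host_alarm` are illustrative assumptions, and no notification action is attached, so the alarm only changes state until an SNS topic is added.

```shell
# Sketch: alarm when the ALB reports any unhealthy target for two
# consecutive one-minute periods. The two arguments are the trailing
# parts of the target group ARN and the load balancer ARN.
create_unhealthy_host_alarm() {
  local tg_dim="$1" lb_dim="$2"
  aws cloudwatch put-metric-alarm \
    --alarm-name web-app-unhealthy-hosts \
    --namespace AWS/ApplicationELB \
    --metric-name UnHealthyHostCount \
    --dimensions "Name=TargetGroup,Value=$tg_dim" "Name=LoadBalancer,Value=$lb_dim" \
    --statistic Maximum --period 60 --evaluation-periods 2 \
    --threshold 0 --comparison-operator GreaterThanThreshold
}

# Usage, with the placeholder ARN suffixes used earlier:
#   create_unhealthy_host_alarm targetgroup/web-tg/abc123 app/web-alb/abc
```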