What is Auto Scaling?

Imagine your app gets a rush of traffic at lunchtime and almost none at 3 a.m. If you run a fixed number of servers, you either pay for capacity you don’t use at night or fall over when the rush hits. EC2 Auto Scaling solves this by automatically adding servers when demand goes up and removing them when demand goes down. It also quietly replaces any server that dies, so your app heals itself without you waking up at 3 a.m.

In AWS, EC2 means Elastic Compute Cloud, which is Amazon’s service for renting virtual servers (called “instances”). Auto Scaling watches a group of these instances and keeps the right number running for you.

What an Auto Scaling Group does

An Auto Scaling Group (ASG) is a logical group of EC2 instances that AWS manages together as one unit. You tell the ASG how many instances you want, and it makes sure that many are always running and healthy.

The ASG does three big jobs:

Maintains a desired count. You ask for, say, 4 instances. The ASG keeps exactly 4 running. If one crashes or you manually terminate it, the ASG launches a fresh replacement automatically. This is called self-healing.
Scales out and in on demand. When a metric like CPU usage climbs past the target set in a scaling policy, the ASG adds instances (“scale out”). When demand drops, it removes them (“scale in”). This is elasticity: you pay only for what you actually need at any moment.
Spreads instances across Availability Zones. An Availability Zone (AZ) is an isolated location within a region, made up of one or more data centers. The ASG balances instances across the AZs you pick, so losing one AZ doesn’t take down your app.
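
The first and third jobs can be pictured as a small reconciliation step. The sketch below is a simplified model in plain Python, not how AWS implements the service. The helper names plan_replacements and spread_across_azs, the short instance IDs, and the AZ names are all made up for illustration.

```python
def plan_replacements(desired: int, healthy_instances: list[str]) -> int:
    """Return how many instances to launch (positive) or terminate (negative)."""
    return desired - len(healthy_instances)


def spread_across_azs(count: int, azs: list[str]) -> dict[str, int]:
    """Distribute `count` instances as evenly as possible across `azs`."""
    if not azs:
        raise ValueError("at least one Availability Zone is required")
    base, extra = divmod(count, len(azs))
    # The first `extra` zones each take one additional instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


if __name__ == "__main__":
    # One of four instances has crashed, so one replacement is needed.
    print(plan_replacements(4, ["i-1", "i-2", "i-3"]))
    # Four instances over three zones: one zone ends up with two.
    print(spread_across_azs(4, ["us-east-1a", "us-east-1b", "us-east-1c"]))
```

A positive result from plan_replacements corresponds to self-healing (launch replacements), and the even split is why losing one zone removes only a fraction of the fleet.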

To launch new instances, the ASG uses a launch template, which is a saved recipe describing the Amazon Machine Image (AMI, a server image), the instance type, security groups, and startup script. Every new instance is built from this same recipe, so they are interchangeable.

Min, desired, and max

Every ASG has three numbers that control its size. Understanding these is the whole game.

Setting	Meaning	Example
Minimum	The fewest instances the ASG will ever run, even at idle.	`2`
Desired	The number the ASG actively tries to keep right now. Scaling changes this.	`4`
Maximum	The most instances the ASG is allowed to run, even under heavy load.	`10`

The ASG always keeps the running count between min and max. Scaling policies adjust the desired value up and down within those bounds. The maximum is your safety cap: it protects your wallet from a runaway scale-out (for example, a traffic spike from a bot attack that could otherwise launch hundreds of instances).
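
The effect of the bounds on policy-driven scaling can be sketched as a simple clamp. This is a minimal model assuming the example values from the table (min 2, max 10); clamp_desired is a hypothetical helper name, not an AWS API.

```python
def clamp_desired(requested: int, minimum: int, maximum: int) -> int:
    """Keep a requested capacity inside the [minimum, maximum] range."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(requested, maximum))


if __name__ == "__main__":
    # A runaway scale-out asking for 250 instances is capped at the maximum.
    print(clamp_desired(250, minimum=2, maximum=10))
    # Scaling in at idle never drops below the minimum.
    print(clamp_desired(0, minimum=2, maximum=10))
    # A request inside the bounds passes through unchanged.
    print(clamp_desired(6, minimum=2, maximum=10))
```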

Tip: Set your minimum to at least 2 and spread across two or more AZs for production. That way a single instance failure or a whole AZ outage never leaves you with zero servers.

When to use Auto Scaling (and when not to)

Use it when:

Your traffic changes over time (daily peaks, marketing spikes, weekend lulls).
Your instances are stateless and replaceable — any instance can serve any request, and losing one loses nothing important.
You want self-healing so failed instances get replaced automatically.

Do NOT rely on it when:

Your instance holds important data locally (a database file, an upload folder, an in-memory user session). A scale-in event will terminate that instance and the data is gone.
You actually need a bigger machine, not more machines. Auto Scaling does horizontal scaling (more instances of the same size), not vertical scaling (resizing one instance to more CPU/RAM).

Gotcha: Auto Scaling will happily terminate an instance during a scale-in, and it does not know or care what local state lives on it. If a user uploaded a file to local disk, or a session is stored in instance memory, that work vanishes when the instance is removed. Keep state off the instances: use Amazon S3 for files, Amazon RDS or DynamoDB for data, and a shared store such as ElastiCache for sessions. Sticky sessions on the load balancer do not solve this, because the session still lives on a single instance and is lost when that instance is terminated. Treat every instance as disposable.

Creating a simple ASG

Console steps

Open the EC2 console and go to Auto Scaling Groups in the left menu.
Click Create Auto Scaling group, give it a name, and select a launch template.
Choose your VPC (your private network) and pick two or more subnets in different Availability Zones.
On the group size page, set Desired = 4, Minimum = 2, Maximum = 10.
(Optional) Attach a load balancer target group so traffic is shared across instances.
Review and click Create Auto Scaling group.

AWS CLI

This assumes you have already created a launch template named web-template and are using AWS CLI v2. The subnet IDs below are placeholders; substitute subnets from your own VPC.

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --launch-template "LaunchTemplateName=web-template,Version=1" \
  --min-size 2 \
  --max-size 10 \
  --desired-capacity 4 \
  --vpc-zone-identifier "subnet-0a1b2c3d,subnet-0e4f5a6b"

Check what the group is doing:

aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names web-asg \
  --query "AutoScalingGroups[0].{Min:MinSize,Desired:DesiredCapacity,Max:MaxSize,Instances:Instances[].InstanceId}"

Output:

{
    "Min": 2,
    "Desired": 4,
    "Max": 10,
    "Instances": [
        "i-0a1b2c3d4e5f00001",
        "i-0a1b2c3d4e5f00002",
        "i-0a1b2c3d4e5f00003",
        "i-0a1b2c3d4e5f00004"
    ]
}
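
If you want to sanity-check that output in a script, a short Python sketch along these lines may help. It assumes the JSON shape produced by the --query expression above; check_group is a hypothetical helper, and the sample text is copied from the output shown.

```python
import json

# Sample output copied from the describe-auto-scaling-groups call above.
SAMPLE_OUTPUT = """
{
    "Min": 2,
    "Desired": 4,
    "Max": 10,
    "Instances": [
        "i-0a1b2c3d4e5f00001",
        "i-0a1b2c3d4e5f00002",
        "i-0a1b2c3d4e5f00003",
        "i-0a1b2c3d4e5f00004"
    ]
}
"""


def check_group(summary: dict) -> list[str]:
    """Return a list of human-readable problems; empty means all looks fine."""
    problems = []
    if not summary["Min"] <= summary["Desired"] <= summary["Max"]:
        problems.append("desired capacity is outside the min/max bounds")
    if len(summary["Instances"]) != summary["Desired"]:
        # Can be temporarily true while a scale-out or replacement is in progress.
        problems.append("instance count does not match desired capacity")
    return problems


if __name__ == "__main__":
    print(check_group(json.loads(SAMPLE_OUTPUT)))  # prints []
```

A mismatch between the instance count and the desired capacity is not necessarily an error; it usually means the group is still launching or replacing instances.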

CloudFormation

Resources:
  WebAsg:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: web-asg
      MinSize: "2"
      MaxSize: "10"
      DesiredCapacity: "4"
      VPCZoneIdentifier:
        - subnet-0a1b2c3d
        - subnet-0e4f5a6b
      LaunchTemplate:
        LaunchTemplateId: lt-0a1b2c3d4e5f00001
        Version: "1"

A note on cost

There is no additional charge for the ASG itself; you pay only for the EC2 instances it runs (and resources attached to them, such as EBS volumes) while they run. That is the point: scaling in at night to 2 instances instead of 10 directly cuts your bill. For example, eight t3.medium instances at roughly $0.042/hour (an approximate On-Demand rate that varies by region) add up to about $0.34/hour; removing them for the 12 quiet hours each day saves around $120 a month. Set a sensible maximum so a sudden spike (or a runaway loop) can’t quietly launch dozens of instances and surprise you on the invoice.
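
The savings arithmetic above can be reproduced in a few lines of Python. This is a rough sketch: the $0.042 hourly rate is the approximate figure from the text, a 30-day month is assumed, and monthly_savings is a hypothetical helper name.

```python
HOURLY_RATE_USD = 0.042  # rough t3.medium rate from the text; check current pricing


def monthly_savings(instances_removed: int, quiet_hours_per_day: float,
                    hourly_rate: float = HOURLY_RATE_USD, days: int = 30) -> float:
    """Estimate the monthly saving from not running instances during quiet hours."""
    return instances_removed * hourly_rate * quiet_hours_per_day * days


if __name__ == "__main__":
    # Scaling from 10 down to 2 removes 8 instances for 12 hours a day.
    saving = monthly_savings(instances_removed=8, quiet_hours_per_day=12)
    print(f"${saving:.2f} per month")  # prints $120.96 per month
```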

Best practices

Keep instances stateless — store files in S3 and data in RDS or DynamoDB so scale-in never loses anything.
Run a minimum of 2 instances across 2+ Availability Zones for resilience.
Always set a maximum to cap cost during traffic spikes.
Use a launch template (not the older launch configuration, which AWS no longer recommends and which lacks support for newer EC2 features) so every instance is built identically.
Turn on ELB health checks for the ASG in addition to the default EC2 status checks, so it replaces instances that are running but unhealthy at the app level.
Bake startup work into the AMI or a fast user-data script so new instances become ready quickly during a scale-out.
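
To see why the ELB health check practice matters, here is a deliberately simplified Python model of the replacement decision. It ignores details such as the health check grace period; should_replace and its parameters are hypothetical names, not part of any AWS API.

```python
def should_replace(ec2_status_ok: bool, elb_healthy: bool,
                   elb_checks_enabled: bool) -> bool:
    """Decide whether an instance would be marked unhealthy and replaced."""
    if not ec2_status_ok:
        # A failed EC2 status check always triggers replacement.
        return True
    # App-level failures only count when ELB health checks are enabled.
    return elb_checks_enabled and not elb_healthy


if __name__ == "__main__":
    # The instance is running, but the app on it is failing load balancer checks.
    print(should_replace(ec2_status_ok=True, elb_healthy=False, elb_checks_enabled=False))
    print(should_replace(ec2_status_ok=True, elb_healthy=False, elb_checks_enabled=True))
```

With EC2 status checks alone, the first case leaves a broken app in service; enabling ELB health checks turns it into a replacement.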