Using Spot Instances

Spot Instances let you run on spare Amazon EC2 capacity (Elastic Compute Cloud, AWS’s virtual servers) at a steep discount, up to 90% off the On-Demand price. The catch: AWS can reclaim that capacity whenever it needs it, with only a 2-minute warning. That trade-off makes Spot a good fit for work that can be paused, retried, or spread across many machines, and a poor fit for anything that must stay up on a single server. This page explains how Spot pricing works, when Spot is a good fit, how to request Spot capacity, and how to handle interruptions gracefully.

How Spot pricing works

AWS sells its unused EC2 capacity as Spot Instances. The price floats based on supply and demand for each instance type in each Availability Zone (AZ — an isolated data center location within a Region). Prices change gradually, not in wild spikes, and you always pay the current Spot price — never more than the On-Demand price.

You no longer bid for capacity as you did under the original Spot model. Today you request Spot capacity, optionally set a maximum price you’re willing to pay (the default is the On-Demand price), and AWS fulfills the request from available pools. A “pool” is one combination of instance type and AZ (for example, m5.large in us-east-1a).
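To make the pool definition concrete, here is a minimal Python sketch that enumerates pools as instance type and AZ pairs. The instance types and AZ names are illustrative inputs, and spot_pools is a hypothetical helper, not part of any AWS SDK.

```python
from itertools import product

# Illustrative inputs only; substitute the types and AZs you actually use.
instance_types = ["c6i.2xlarge", "c6a.2xlarge", "c5.2xlarge"]
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]


def spot_pools(types, zones):
    """Return every (instance type, AZ) pair; each pair is one Spot pool."""
    return list(product(types, zones))


pools = spot_pools(instance_types, availability_zones)
print(len(pools))  # 9
print(pools[0])    # ('c6i.2xlarge', 'us-east-1a')
```

Three instance types across three AZs give nine pools, so a shortage in any one pool affects only a fraction of the capacity you can draw on.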

Cost note: A c6i.2xlarge that costs about $0.34/hour On-Demand can run for roughly $0.10/hour on Spot. At those rates the difference is $0.24/hour, or about $2,100/year per instance if it runs continuously. Multiply that across a fleet and the savings are large.
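The arithmetic behind the cost note can be checked with a short Python sketch. The hourly rates are the approximate figures quoted above, not live prices, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_savings(on_demand_hourly, spot_hourly, instances=1):
    """Yearly savings in dollars for instances that run around the clock."""
    return (on_demand_hourly - spot_hourly) * HOURS_PER_YEAR * instances


def discount_percent(on_demand_hourly, spot_hourly):
    """Spot discount relative to the On-Demand price, as a percentage."""
    return (1 - spot_hourly / on_demand_hourly) * 100


print(f"{annual_savings(0.34, 0.10):.2f}")    # 2102.40
print(f"{discount_percent(0.34, 0.10):.1f}")  # 70.6
```

Actual Spot prices vary by pool and over time, so treat the result as an estimate rather than a quote.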

When to use Spot (and when not to)

Spot shines for fault-tolerant, flexible, and stateless workloads. It is wrong for anything that loses data or breaks when a single server vanishes.

Workload	Spot fit?	Why
Batch / data processing (e.g. video encoding)	Excellent	Jobs can be re-queued and retried
CI/CD build runners	Excellent	Builds are short and repeatable
Stateless web tier behind an Auto Scaling group	Good	ASG replaces lost instances automatically
Big data (Spark, EMR, Hadoop)	Good	Frameworks already tolerate node loss
Stateful single server (database primary)	Never	A reclaim means data loss / downtime
Long-running job with no checkpointing	Risky	All progress lost on interruption

An Auto Scaling group (ASG) is an AWS service that automatically launches and replaces EC2 instances to maintain a target count.

The 2-minute interruption notice

When AWS needs your Spot capacity back, it sends an interruption notice 2 minutes before stopping or terminating the instance. Your job is to detect that notice and react: finish or checkpoint in-flight work, stop accepting new requests, and exit cleanly.

You read the notice from the Instance Metadata Service (a built-in HTTP endpoint at 169.254.169.254 that every instance can query about itself). With IMDSv2 (the secure, token-based version), you first fetch a token, then poll for the interruption action.

# 1. Get a short-lived metadata token (IMDSv2)
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")

# 2. Check for a pending Spot interruption notice
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/spot/instance-action

Output:

{"action": "terminate", "time": "2026-06-15T14:32:00Z"}

If the path returns HTTP 404, there is no pending interruption, so keep working. When it returns JSON like the above, you have until the timestamp in the time field (UTC) to drain. A typical handler script polls every 5 seconds in the background, requesting a fresh token whenever the old one expires (the token above lasts 60 seconds), and, on a hit, runs your cleanup: deregister from the load balancer, flush buffers, and upload a checkpoint to Amazon S3 (Simple Storage Service, AWS’s object storage).
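The polling loop described above might look like the following Python sketch, which uses only the standard library. The function names (fetch_token, fetch_instance_action, parse_notice, watch) are hypothetical, and fetching a new token on every pass is a simplification chosen here so that token expiry never needs special handling.

```python
import json
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_token(ttl_seconds=60):
    """Request a short-lived IMDSv2 token."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def fetch_instance_action(token):
    """Return (status, body) for spot/instance-action; 404 means no notice."""
    req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, ""


def parse_notice(status, body):
    """Return (action, deadline) for a pending interruption, else None."""
    if status != 200:
        return None
    notice = json.loads(body)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return notice["action"], deadline


def watch(on_interrupt, interval=5):
    """Poll until a notice appears, then call on_interrupt(action, deadline)."""
    while True:
        token = fetch_token()  # assumption: fresh token each pass, for simplicity
        notice = parse_notice(*fetch_instance_action(token))
        if notice:
            on_interrupt(*notice)
            return
        time.sleep(interval)
```

On an EC2 instance you would start it in a background thread or a sidecar process, for example watch(drain), where drain is your own cleanup function. Outside EC2 the metadata endpoint is unreachable, so the network calls raise an error.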

Requesting Spot Instances

There are two common ways to get Spot capacity: launch a one-off Spot Instance, or (recommended) let an Auto Scaling group manage a mixed-instances fleet across multiple pools.

Single Spot Instance — Console

Open the EC2 console and choose Launch instances.
Pick an Amazon Machine Image (AMI — a template containing the OS and software), such as ami-0abcdef1234567890, and an instance type like c6i.2xlarge.
Expand Advanced details.
Set Purchasing option to Spot instances.
(Optional) Set a Maximum price — leave blank to cap at the On-Demand price.
Leave Interruption behavior as Terminate (or choose Stop or Hibernate if you want to resume later; both require a persistent request type).
Click Launch instance.

Single Spot Instance — CLI

aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type c6i.2xlarge \
  --key-name my-keypair \
  --security-group-ids sg-0a1b2c3d \
  --subnet-id subnet-0a1b2c3d \
  --instance-market-options '{"MarketType":"spot"}'

Output:

{
    "Instances": [
        {
            "InstanceId": "i-0a1b2c3d4e5f",
            "InstanceLifecycle": "spot",
            "InstanceType": "c6i.2xlarge",
            "State": { "Name": "pending" }
        }
    ]
}

The "InstanceLifecycle": "spot" field confirms it launched as Spot.
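If you script launches, the same check can be automated. This Python sketch parses the JSON that run-instances prints and confirms that every instance reports the spot lifecycle. The helper name launched_as_spot is hypothetical, and the sample document is the trimmed output shown above.

```python
import json


def launched_as_spot(run_instances_output):
    """True if the output lists at least one instance and all are Spot."""
    instances = json.loads(run_instances_output).get("Instances", [])
    return bool(instances) and all(
        inst.get("InstanceLifecycle") == "spot" for inst in instances
    )


sample = """
{
    "Instances": [
        {
            "InstanceId": "i-0a1b2c3d4e5f",
            "InstanceLifecycle": "spot",
            "InstanceType": "c6i.2xlarge",
            "State": { "Name": "pending" }
        }
    ]
}
"""
print(launched_as_spot(sample))  # True
```

An instance without the InstanceLifecycle field is treated as not Spot, so a launch that fell back to On-Demand fails the check.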

Resilient fleet — mixed instances with an ASG

The single biggest mistake with Spot is depending on one instance type in one AZ. If that one pool runs short, your whole fleet dies at once. Instead, ask for many instance types across many AZs so AWS can fulfill capacity from whichever pool is healthy. This Terraform (an infrastructure-as-code tool) snippet creates a mixed-instances ASG:

resource "aws_autoscaling_group" "workers" {
  desired_capacity    = 6
  min_size            = 2
  max_size            = 20
  vpc_zone_identifier = [aws_subnet.a.id, aws_subnet.b.id, aws_subnet.c.id]

  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.worker.id
      }
      # Multiple types = multiple Spot pools
      override { instance_type = "c6i.2xlarge" }
      override { instance_type = "c6a.2xlarge" }
      override { instance_type = "c5.2xlarge" }
    }

    instances_distribution {
      on_demand_base_capacity                  = 2     # keep 2 stable On-Demand
      on_demand_percentage_above_base_capacity = 0     # rest is Spot
      spot_allocation_strategy                 = "price-capacity-optimized"
    }
  }
}

The price-capacity-optimized strategy is the one AWS currently recommends for most Spot workloads: it favors pools that are both low-priced and have ample spare capacity, which lowers your interruption rate. Keeping a small On-Demand base means your service survives even if every Spot pool is exhausted at once.
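To see how the two On-Demand settings divide a fleet, the following Python sketch computes the split for a given desired capacity. The function name split_capacity is hypothetical, and rounding a fractional On-Demand share upward is an assumption of this sketch; it does not affect the 0% configuration used above.

```python
import math


def split_capacity(desired, on_demand_base, on_demand_pct_above_base):
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)  # the base is filled first
    above_base = desired - base
    # Assumption: a fractional On-Demand share is rounded up.
    extra_on_demand = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + extra_on_demand
    return on_demand, desired - on_demand


print(split_capacity(6, 2, 0))   # (2, 4)
print(split_capacity(20, 2, 0))  # (2, 18)
print(split_capacity(2, 2, 0))   # (2, 0)
```

With the settings in the Terraform snippet, the two On-Demand instances stay constant while everything the group adds as it scales toward max_size is Spot.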

Stopping a Spot request

To stop a one-off Spot Instance request, cancel it. Canceling does not terminate instances that are already running, so terminate those separately:

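# Cancel the request; any instance it launched keeps running
aws ec2 cancel-spot-instance-requests \
  --spot-instance-request-ids sir-0a1b2c3d

# Terminate the running instance separately
aws ec2 terminate-instances --instance-ids i-0a1b2c3d4e5f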

Best practices

Design for interruption. Checkpoint long jobs to S3 frequently and make every task safe to retry from scratch.
Drain gracefully on the 2-minute notice. Poll instance-action, deregister from the load balancer, and finish in-flight requests before exit.
Never run a stateful single instance on Spot — no database primary, no single point of failure. Use On-Demand or Reserved capacity for those.
Diversify across pools. List several instance types and AZs so one capacity shortage cannot wipe out your whole fleet.
Use price-capacity-optimized allocation rather than pure lowest-price allocation to reduce interruptions.
Keep an On-Demand base in the ASG for critical baseline capacity, and let Spot scale the cheap, flexible bulk on top.
Check interruption frequency per instance type in the Spot Instance Advisor, and use Spot placement scores (which rate how likely a request is to be fulfilled) and Spot price history to pick stable pools.