Placement Groups

By default, AWS decides where your EC2 (Elastic Compute Cloud) instances physically run inside a data center, spreading them around to balance its own hardware. A placement group lets you take control of that decision. You tell AWS whether you want your instances packed tightly together for speed, or pulled far apart for safety. This single choice can make a real difference to network latency, throughput, and how well your application survives a hardware failure.

What a placement group is

A placement group is a logical grouping you create for instances launched in the same AWS Region. It does not cost anything by itself — you only pay for the instances inside it. What you are really choosing is a placement strategy, which is a rule AWS follows when deciding which physical server (and which rack of servers) each instance lands on.

A rack is a physical cabinet holding many servers; instances on the same rack share network switches and power, so they talk to each other very fast but can also fail together. An Availability Zone (AZ) is one or more data centers that are isolated from other AZs in the same Region. These two ideas — racks and AZs — are the core of every placement decision.

There are three strategies, and you pick exactly one per group.

The three strategies

Cluster — pack instances together for speed

A cluster placement group packs your instances onto hardware that is physically close together inside a single AZ, often on the same rack. Because the instances sit in the same high-bandwidth segment of the network, packets travel a very short distance, giving you the lowest latency and the highest throughput EC2 can offer: single-flow traffic between instances in the group can reach up to 10 Gbps on supported instance types, and the Elastic Fabric Adapter (EFA) can cut latency further for tightly coupled workloads.

When to use this: tightly coupled workloads where machines constantly chat with each other — High Performance Computing (HPC), scientific simulations, machine-learning training clusters, or financial risk modelling. When NOT to use it: anything where staying online matters more than raw speed.

Gotcha: A cluster group concentrates failure risk. Because everything sits on the same rack/AZ, a single hardware or power failure can take down the entire group at once. You trade fault isolation for latency. Mitigate by snapshotting data and keeping a recovery plan in another AZ.

Spread — push instances apart for safety

A spread placement group does the opposite. It places each instance on distinct underlying hardware — different racks, each with its own network and power. A failure on one rack affects at most one of your instances.

When to use this: a small number of critical instances that must never fail together — for example, the primary and standby nodes of a database, or a handful of important application servers.

Hard limit: A spread placement group allows a maximum of seven running instances per Availability Zone per group. If you need more, span multiple AZs (seven in each) or choose a different strategy. The limit exists because AWS must guarantee a separate rack for every instance.
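The seven-per-AZ cap turns capacity planning into simple arithmetic. As a minimal sketch, the function below computes how many AZs one spread group needs for a given number of instances. The name min_azs_for_spread is a hypothetical helper, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an AWS CLI command): smallest number of AZs
# needed to hold N instances in one spread placement group, given the
# limit of 7 running instances per AZ per group.
SPREAD_LIMIT_PER_AZ=7

min_azs_for_spread() {
  local instances=$1
  # Integer ceiling division: (n + limit - 1) / limit
  echo $(( (instances + SPREAD_LIMIT_PER_AZ - 1) / SPREAD_LIMIT_PER_AZ ))
}

min_azs_for_spread 15   # prints 3
```

The result still has to fit the number of AZs your Region actually offers; if it does not, a partition group is likely the better fit.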

Partition — isolate groups for large distributed systems

A partition placement group splits your instances into logical groups called partitions (up to seven per AZ). Each partition runs on its own set of racks that does not share hardware with other partitions. Instances within a partition may share hardware, but a failure in one partition cannot affect another.

When to use this: large distributed and replicated systems that already understand the idea of failure domains — Hadoop HDFS (the Hadoop Distributed File System), Apache Kafka, Cassandra, or HBase. You map each replica or broker set to a different partition so that one rack failure never takes out all copies of your data.
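One simple way to map brokers or replicas onto partitions is round-robin. The sketch below assumes brokers are identified by a zero-based index; partition_for_broker is a hypothetical helper name, and real deployments may prefer a different assignment scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helper: round-robin a zero-based broker index onto
# partition numbers 1..partition_count (partition numbers start at 1).
partition_for_broker() {
  local broker_index=$1 partition_count=$2
  echo $(( broker_index % partition_count + 1 ))
}

# Six brokers spread over three partitions.
for broker in 0 1 2 3 4 5; do
  echo "broker-$broker -> partition $(partition_for_broker "$broker" 3)"
done
```

With three partitions, brokers 0 and 3 share partition 1, so replicas of the same data should be assigned to brokers in different partitions.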

Cluster vs spread vs partition — when to use which

Aspect	Cluster	Spread	Partition
Goal	Lowest latency, highest throughput	Maximum fault isolation	Isolate failure domains at scale
Physical placement	Same rack / close together	Each instance on separate hardware	Grouped into isolated partitions
Instance limit	Limited by capacity (no fixed cap)	7 per AZ	7 partitions per AZ
Typical use	HPC, ML training, low-latency clusters	Small set of critical instances	HDFS, Kafka, Cassandra
AZ usage	Single AZ	Single or multiple AZs	Single or multiple AZs
Risk profile	High blast radius	Lowest blast radius	Controlled blast radius

Create a placement group

Using the AWS Management Console

Open the EC2 console and choose Placement groups under the Network & Security menu in the left sidebar.
Choose Create placement group.
Enter a Name (for example, hpc-cluster-1).
Choose a Placement strategy: Cluster, Spread, or Partition.
If you chose Partition, set the Number of partitions (1–7).
Optionally add tags, then choose Create group.

Using the AWS CLI

aws ec2 create-placement-group \
  --group-name hpc-cluster-1 \
  --strategy cluster

Output:

{
    "PlacementGroup": {
        "GroupName": "hpc-cluster-1",
        "State": "available",
        "Strategy": "cluster",
        "GroupId": "pg-0a1b2c3d4e5f6a7b8",
        "GroupArn": "arn:aws:ec2:us-east-1:111122223333:placement-group/hpc-cluster-1"
    }
}

For a partition group, add the partition count:

aws ec2 create-placement-group \
  --group-name kafka-pg \
  --strategy partition \
  --partition-count 3
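Creating a group whose name is already taken fails, so provisioning scripts often check first. The wrapper below is a minimal sketch of that pattern; ensure_placement_group is a hypothetical function name, and it assumes that describe-placement-groups exits with a non-zero status when the named group does not exist.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: create the placement group only if it is missing.
# Assumption: describe-placement-groups exits non-zero for an unknown name.
ensure_placement_group() {
  local name=$1 strategy=$2
  if aws ec2 describe-placement-groups --group-names "$name" >/dev/null 2>&1; then
    echo "placement group $name already exists"
  else
    aws ec2 create-placement-group --group-name "$name" --strategy "$strategy"
  fi
}

# Usage (needs AWS credentials and a default Region):
#   ensure_placement_group hpc-cluster-1 cluster
```

Note that any describe failure, including missing credentials, falls through to the create call, which then reports the underlying error itself.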

Launch instances into a placement group

You attach the group at launch time with the --placement option, passing the group name as GroupName in the Placement structure.

aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type c7g.4xlarge \
  --key-name my-key \
  --count 4 \
  --placement "GroupName=hpc-cluster-1" \
  --subnet-id subnet-0a1b2c3d

Output (abbreviated to the first two of the four instances):

{
    "Instances": [
        { "InstanceId": "i-0a1b2c3d4e5f60001", "InstanceType": "c7g.4xlarge" },
        { "InstanceId": "i-0a1b2c3d4e5f60002", "InstanceType": "c7g.4xlarge" }
    ]
}

For a partition group, specify which partition each instance joins with PartitionNumber:

aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type r7g.2xlarge \
  --count 1 \
  --placement "GroupName=kafka-pg,PartitionNumber=2" \
  --subnet-id subnet-0a1b2c3d
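A partition number outside the group's range makes the launch fail, so it can help to validate it before calling run-instances. The sketch below builds the --placement value and rejects out-of-range numbers; placement_arg is a hypothetical helper, and the partition count is passed in by the caller rather than looked up from AWS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --placement value for a partition group,
# rejecting partition numbers outside 1..partition_count.
placement_arg() {
  local group=$1 partition=$2 partition_count=$3
  if [ "$partition" -lt 1 ] || [ "$partition" -gt "$partition_count" ]; then
    echo "partition $partition is outside 1..$partition_count" >&2
    return 1
  fi
  echo "GroupName=$group,PartitionNumber=$partition"
}

placement_arg kafka-pg 2 3   # prints GroupName=kafka-pg,PartitionNumber=2
```

The result can be passed straight through, for example --placement "$(placement_arg kafka-pg 2 3)". If you leave PartitionNumber out entirely, EC2 tries to distribute the instances evenly across the partitions for you.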

Infrastructure as Code

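The same cluster strategy in CloudFormation. The AWS::EC2::PlacementGroup resource does not take a name property, so CloudFormation generates the group name: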

Resources:
  HpcCluster:
    Type: AWS::EC2::PlacementGroup
    Properties:
      Strategy: cluster

Or in Terraform:

resource "aws_placement_group" "hpc_cluster" {
  name     = "hpc-cluster-1"
  strategy = "cluster"
}

Cost note

Placement groups themselves are free: there is no charge for creating or using one. You pay only the normal rate for the instances inside it. The one indirect cost to watch is that cluster groups work best with a single instance type and benefit from larger, network-optimized instances, which cost more per hour. Choosing between several small instances in a spread group and a few large ones in a cluster group is a cost-and-resilience trade-off, not a placement-group fee.

Best Practices

Pick your strategy by intent: optimize for latency (cluster), fault isolation (spread), or scaled failure domains (partition) — never try to do all three at once.
For cluster groups, launch all instances in one request and use the same instance type so AWS can find contiguous capacity; if you later get an InsufficientInstanceCapacity error when adding instances, stop and start all instances in the group, then retry the launch.
Keep spread groups within the 7-per-AZ limit; if you need more critical instances, add more AZs rather than overflowing one.
Map each Kafka broker set or HDFS replica to a separate partition so a single rack failure never loses a quorum.
Never put your only copy of data in a cluster group — back it up with EBS snapshots and a cross-AZ recovery plan.
Use launch templates to bake the placement group into a repeatable launch configuration for Auto Scaling.
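The stop-and-start recovery for cluster groups mentioned above can be scripted. The following is a rough sketch rather than a hardened tool: restart_placement_group is a hypothetical function name, and it assumes EBS-backed instances, since stopping an instance discards any instance store data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop and restart every running instance in a
# placement group so EC2 can try to place them together again.
# Assumption: instances are EBS-backed and safe to stop.
restart_placement_group() {
  local group=$1 ids
  ids=$(aws ec2 describe-instances \
    --filters "Name=placement-group-name,Values=$group" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text)
  [ -n "$ids" ] || { echo "no running instances in $group" >&2; return 1; }

  # $ids is intentionally unquoted so each ID becomes its own argument.
  aws ec2 stop-instances --instance-ids $ids >/dev/null
  aws ec2 wait instance-stopped --instance-ids $ids
  aws ec2 start-instances --instance-ids $ids >/dev/null
  echo "restarted: $ids"
}

# Usage (needs AWS credentials and a default Region):
#   restart_placement_group hpc-cluster-1
```

After the restart completes, retry the launch that originally failed.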