ECS Task Definitions & Services
Amazon ECS (Elastic Container Service, a fully managed service for running containers) needs two things to run your app: a description of what to run, and instructions for how many copies to keep alive. The first is a task definition, the second is a service. Getting these right is the difference between a container that starts once and dies and one that stays healthy, scales, and sits behind a load balancer. This page walks through both, plus one of the most common ECS mistakes: confusing the two IAM roles.
Task definitions: the blueprint
A task definition is a JSON document that describes one or more containers that belong together. Think of it like a recipe. It does not run anything by itself; it just declares the ingredients. Each task definition holds:
| Field | What it means |
|---|---|
| image | The container image to pull, e.g. an Amazon ECR (Elastic Container Registry) URI like 123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest. |
| cpu / memory | How much CPU and RAM the task gets. On Fargate (AWS’s serverless container engine) these are required and must use valid pairs (e.g. 256 CPU units = 0.25 vCPU with 512 MiB memory). |
| portMappings | Which container ports are exposed, e.g. port 8080. |
| environment / secrets | Plain environment variables, or secrets pulled from AWS Secrets Manager / SSM Parameter Store. |
| executionRoleArn | The task execution role (see below). |
| taskRoleArn | The task role (see below). |
| logConfiguration | Where logs go, usually the awslogs driver writing to Amazon CloudWatch Logs. |
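The valid-pair rule for Fargate sizes is easy to trip over, so a pre-flight check before registering can save a failed call. The function below is a minimal sketch: `fargate_size_ok` is a hypothetical helper name, and the size table covers only the smaller Fargate sizes as documented at the time of writing (8 and 16 vCPU sizes are left out), so confirm it against the current Fargate documentation.

```shell
# fargate_size_ok CPU_UNITS MEMORY_MIB
# Hypothetical pre-flight check: succeeds (exit 0) when the pair is one of the
# smaller valid Fargate task sizes, fails (non-zero exit) otherwise.
fargate_size_ok() {
  cpu="$1"
  mem="$2"
  case "$cpu" in
    256)
      # 0.25 vCPU allows exactly three memory values.
      case "$mem" in
        512|1024|2048) return 0 ;;
        *) return 1 ;;
      esac
      ;;
    512)  min=1024; max=4096 ;;
    1024) min=2048; max=8192 ;;
    2048) min=4096; max=16384 ;;
    4096) min=8192; max=30720 ;;
    *) return 1 ;;   # CPU value not covered by this sketch
  esac
  # For these sizes, memory moves in 1024 MiB steps between min and max.
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $((mem % 1024)) -eq 0 ]
}
```

For example, `fargate_size_ok 256 512` succeeds, while `fargate_size_ok 256 4096` fails because 0.25 vCPU tops out at 2048 MiB.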
A running copy of a task definition is called a task. One task definition can spin up many tasks.
Task definitions are immutable. You never “edit” one. Every change creates a new numbered revision (e.g. web:7 becomes web:8). This gives you a clean rollback path: just point your service back at an older revision.
A minimal Fargate task definition
```json
{
  "family": "web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/webAppTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "environment": [{ "name": "APP_ENV", "value": "production" }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
```
Register it from the CLI:
```shell
aws ecs register-task-definition --cli-input-json file://web-taskdef.json
```
Output:
```json
{
  "taskDefinition": {
    "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/web:8",
    "family": "web",
    "revision": 8,
    "status": "ACTIVE"
  }
}
```
Notice the revision is 8 — AWS bumped it automatically.
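In a deploy script it is usually handier to capture the new revision than to read it out of the JSON by eye. The sketch below uses the CLI's global `--query` and `--output` options for that; the function names `register_taskdef` and `taskdef_ref_from_arn` are hypothetical helpers, not part of the AWS CLI.

```shell
# Register the task definition in file $1 and print only the new revision's ARN.
# --query takes a JMESPath expression; --output text drops the JSON quoting.
register_taskdef() {
  aws ecs register-task-definition \
    --cli-input-json "file://$1" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text
}

# Reduce a task definition ARN to the short family:revision form,
# e.g. ".../task-definition/web:8" becomes "web:8".
taskdef_ref_from_arn() {
  printf '%s\n' "${1##*/}"
}
```

A script might then run `ref="$(taskdef_ref_from_arn "$(register_taskdef web-taskdef.json)")"` and pass `$ref` to later commands.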
The two IAM roles (the gotcha that breaks deployments)
This trips up almost everyone. A task definition references two different IAM roles, and they do opposite jobs.
| Role | Used by | Grants permission to… | Symptom if wrong |
|---|---|---|---|
| Task execution role (executionRoleArn) | The ECS agent / Fargate, before your container starts | Pull the image from ECR, write logs to CloudWatch, read secrets at startup | CannotPullContainerError or no logs appear |
| Task role (taskRoleArn) | Your application code, while it runs | Whatever your app calls: read an S3 bucket, write to DynamoDB, publish to SQS | AccessDenied from inside your app |
Plain English: the execution role lets AWS set the task up for you. The task role lets your code talk to other AWS services. Mix them up and you either can’t start the container, or the container starts fine but your app gets permission errors the moment it touches AWS.
When to use which: in practice nearly every Fargate task needs an execution role, because pulling from ECR and writing to CloudWatch Logs both depend on it (use the AWS-managed AmazonECSTaskExecutionRolePolicy as a baseline). You only need a task role if your app makes AWS API calls. If your app never touches AWS, omit taskRoleArn entirely.
Security tip: scope the task role tightly. Grant only the exact actions and resources your app uses (e.g. s3:GetObject on one bucket), never AdministratorAccess. Each task should follow least privilege.
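To make the split concrete, here is a sketch of creating the two roles separately. The role names match the task definition on this page; the bucket name passed in, the inline policy name `web-s3-read`, and all four function names are assumptions for illustration. Both roles trust the same service principal and differ only in the permissions attached.

```shell
# Trust policy used by both roles: it lets ECS tasks assume the role.
ecs_tasks_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Least-privilege task-role policy: read objects from one bucket only.
# $1 = bucket name (hypothetical in this example).
s3_read_only_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::$1/*"
    }
  ]
}
EOF
}

# Execution role: trust policy plus the AWS-managed baseline policy.
create_execution_role() {
  aws iam create-role --role-name ecsTaskExecutionRole \
    --assume-role-policy-document "$(ecs_tasks_trust_policy)" &&
  aws iam attach-role-policy --role-name ecsTaskExecutionRole \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
}

# Task role: same trust policy, but only the inline policy the app needs.
# $1 = bucket the app reads from.
create_task_role() {
  aws iam create-role --role-name webAppTaskRole \
    --assume-role-policy-document "$(ecs_tasks_trust_policy)" &&
  aws iam put-role-policy --role-name webAppTaskRole \
    --policy-name web-s3-read \
    --policy-document "$(s3_read_only_policy "$1")"
}
```

Calling `create_execution_role` and then `create_task_role my-app-bucket` would produce two roles that cannot stand in for each other: the execution role cannot read the bucket, and the task role cannot pull images.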
Services: keeping tasks running
If you run a task directly, it runs once and stops when it exits — fine for a batch job, useless for a web server. An ECS service fixes this. A service watches a desired count of tasks and replaces any that crash, get killed, or fail health checks. It can also register tasks with an Application Load Balancer (ALB) so traffic is spread across them, and it handles rolling deployments when you ship a new revision.
When to use a service: long-running workloads — APIs, web apps, background workers that should always be up. When NOT to: one-off or scheduled jobs. For those, run a standalone task (aws ecs run-task) or use ECS Scheduled Tasks instead.
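For the one-off case, a standalone task can be launched roughly as follows. This is a sketch, not a prescribed workflow: `awsvpc_config` and `run_once` are hypothetical helper names, and the `db-migrate` task definition in the usage line is an assumed example.

```shell
# Build the shorthand network configuration string for awsvpc tasks.
# $1 = comma-separated subnet IDs, $2 = comma-separated security group IDs.
awsvpc_config() {
  printf 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=DISABLED}\n' "$1" "$2"
}

# Run one standalone Fargate task and print its ARN.
# ECS does not restart the task when it exits.
# $1 = cluster, $2 = task definition, $3 = subnets, $4 = security groups.
run_once() {
  aws ecs run-task \
    --cluster "$1" \
    --task-definition "$2" \
    --launch-type FARGATE \
    --count 1 \
    --network-configuration "$(awsvpc_config "$3" "$4")" \
    --query 'tasks[0].taskArn' \
    --output text
}
```

Usage might look like `run_once prod-cluster db-migrate subnet-0a1b2c3d sg-0a1b2c3d`.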
Create a service from the CLI
```shell
aws ecs create-service \
  --cluster prod-cluster \
  --service-name web \
  --task-definition web:8 \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0a1b2c3d,subnet-0e4f5g6h],securityGroups=[sg-0a1b2c3d],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/0a1b2c3d,containerName=web,containerPort=8080"
```
Output:
```json
{
  "service": {
    "serviceName": "web",
    "status": "ACTIVE",
    "desiredCount": 3,
    "runningCount": 0,
    "launchType": "FARGATE",
    "taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/web:8"
  }
}
```
runningCount climbs to 3 as ECS launches the tasks behind your ALB.
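Rather than re-running a describe call by hand, a script can block until the service settles. The sketch below relies on the CLI's built-in `services-stable` waiter, which polls the service and gives up after a fixed number of attempts; `wait_for_service` is a hypothetical wrapper name.

```shell
# Block until the service reaches a steady state, then print
# "desiredCount<TAB>runningCount". $1 = cluster, $2 = service name.
wait_for_service() {
  aws ecs wait services-stable --cluster "$1" --services "$2" &&
  aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].[desiredCount,runningCount]' \
    --output text
}
```

With the service above, `wait_for_service prod-cluster web` would be expected to return once all three tasks are running and healthy.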
Console steps to create the same service
- Open the ECS console and choose your cluster (e.g. prod-cluster).
- On the Services tab, click Create.
- Under Environment, set Launch type to Fargate.
- Under Deployment configuration, pick Family = web and Revision = 8 (or LATEST), set Service name = web and Desired tasks = 3.
- Under Networking, select your VPC, the private subnets subnet-0a1b2c3d / subnet-0e4f5g6h, and security group sg-0a1b2c3d.
- Under Load balancing, choose Application Load Balancer, select your ALB and target group web-tg, mapping container web:8080.
- Click Create. ECS launches three tasks and registers them with the ALB.
Shipping an update
To deploy a new image, register a new revision, then point the service at it:
```shell
aws ecs update-service --cluster prod-cluster --service web --task-definition web:9
```
ECS performs a rolling update by default: it starts new tasks on revision 9, waits for them to pass health checks, then drains and stops the old ones. To roll back, run the same command with the previous revision (web:8).
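The rollback step can be scripted along these lines. This is a sketch under two assumptions: the revision you want is exactly one below the current one, and that revision has not been deregistered. `previous_revision` and `rollback_one` are hypothetical helper names.

```shell
# Print the reference one revision below the given family:revision,
# e.g. "web:9" gives "web:8". Fails if there is no earlier revision.
previous_revision() {
  family="${1%:*}"
  rev="${1##*:}"
  [ "$rev" -gt 1 ] || return 1
  printf '%s:%s\n' "$family" "$((rev - 1))"
}

# Roll the service back by one revision and wait for it to settle.
# $1 = cluster, $2 = service, $3 = currently deployed family:revision.
rollback_one() {
  target="$(previous_revision "$3")" || return 1
  aws ecs update-service --cluster "$1" --service "$2" \
    --task-definition "$target" \
    --query 'service.taskDefinition' --output text &&
  aws ecs wait services-stable --cluster "$1" --services "$2"
}
```

For the deployment above, `rollback_one prod-cluster web web:9` would point the service back at web:8.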
Cost note: with Fargate you pay per vCPU-second and GB-second while tasks run. Three tasks at 0.25 vCPU / 0.5 GB running 24/7 cost roughly $25-30/month in us-east-1. Setting desired-count higher than you need is the easiest way to overspend, so right-size CPU/memory and use Service Auto Scaling to scale down off-peak.
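The estimate can be reproduced with a few lines of arithmetic. The hourly rates below are assumptions (us-east-1 Linux/x86 on-demand prices as of this writing) and a month is taken as 730 hours; check the current Fargate price list and override the rates before relying on the number. `fargate_monthly_cost` is a hypothetical helper name.

```shell
# Estimate monthly Fargate cost in USD for always-on tasks.
# $1 = task count, $2 = vCPU per task, $3 = memory (GB) per task.
# Rates are ASSUMED values; override with VCPU_HOUR_USD / GB_HOUR_USD.
fargate_monthly_cost() {
  awk -v n="$1" -v vcpu="$2" -v gb="$3" \
      -v vcpu_rate="${VCPU_HOUR_USD:-0.04048}" \
      -v gb_rate="${GB_HOUR_USD:-0.004445}" \
      'BEGIN { printf "%.2f\n", n * (vcpu * vcpu_rate + gb * gb_rate) * 730 }'
}

fargate_monthly_cost 3 0.25 0.5   # prints 27.03 with the assumed rates
```

Doubling the task count doubles the bill, which is why desired count is the first thing to look at when costs creep up.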
Best practices
- Pin services to a specific revision in production (e.g. web:8), not a moving tag, so deployments are deliberate and rollbacks are exact.
- Always attach the AWS-managed AmazonECSTaskExecutionRolePolicy to your execution role. It already covers ECR pulls and CloudWatch Logs writes, so add Secrets Manager or SSM Parameter Store permissions only if the task reads secrets at startup.
- Keep the task role least-privilege and separate from the execution role; never reuse one for both.
- Define a container health check (or rely on the ALB target group health check) so the service replaces unhealthy tasks automatically.
- Send logs to CloudWatch with the awslogs driver and a clear awslogs-stream-prefix so you can trace each task.
- Use private subnets with assignPublicIp=DISABLED and let the ALB handle public traffic, keeping tasks off the public internet.
- Enable Service Auto Scaling on a CloudWatch metric (CPU or request count) instead of hard-coding a high desired count.
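The auto scaling practice could be wired up as sketched below, using Application Auto Scaling with a target-tracking policy on average CPU. The policy name `cpu-target-tracking`, the capacity bounds and CPU target in the usage line, and the three function names are assumptions for illustration.

```shell
# Application Auto Scaling identifies an ECS service as service/<cluster>/<service>.
ecs_resource_id() {
  printf 'service/%s/%s\n' "$1" "$2"
}

# Target-tracking configuration: hold average service CPU near $1 percent.
cpu_target_config() {
  cat <<EOF
{
  "TargetValue": $1,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  }
}
EOF
}

# $1 = cluster, $2 = service, $3 = min tasks, $4 = max tasks, $5 = CPU target (%).
enable_cpu_autoscaling() {
  rid="$(ecs_resource_id "$1" "$2")"
  aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$rid" \
    --min-capacity "$3" --max-capacity "$4" &&
  aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id "$rid" \
    --policy-name cpu-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration "$(cpu_target_config "$5")"
}
```

For example, `enable_cpu_autoscaling prod-cluster web 2 10 60` would let the service float between 2 and 10 tasks while aiming for about 60% average CPU.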