ECS Task Definitions & Services

Amazon ECS (Elastic Container Service, a fully managed service for running containers) needs two things to run your app: a description of what to run, and instructions for how many copies to keep alive. The first is a task definition, the second is a service. Getting these right is the difference between a container that starts once and dies and one that stays healthy, scales, and sits behind a load balancer. This page walks through both, plus a very common ECS mistake: confusing the two IAM roles.

Task definitions: the blueprint

A task definition is a JSON document that describes one or more containers that belong together. Think of it like a recipe. It does not run anything by itself; it just declares the ingredients. Each task definition holds:

Field	What it means
`image`	The container image to pull, e.g. an Amazon ECR (Elastic Container Registry) URI like `123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest`.
`cpu` / `memory`	How much CPU and RAM the task gets. On Fargate (AWS’s serverless container engine) these are required and must use valid pairs (e.g. 256 CPU units = 0.25 vCPU with 512 MiB memory).
`portMappings`	Which container ports are exposed, e.g. port `8080`.
`environment` / `secrets`	Plain environment variables, or secrets pulled from AWS Secrets Manager / SSM Parameter Store.
`executionRoleArn`	The task execution role (see below).
`taskRoleArn`	The task role (see below).
`logConfiguration`	Where logs go, usually the `awslogs` driver writing to Amazon CloudWatch Logs.
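
The valid `cpu` / `memory` pairs mentioned in the table can be checked locally before registering a task definition. The sketch below encodes the Fargate task size combinations as documented at the time of writing; the table contents and the helper name `is_valid_fargate_size` are assumptions for illustration and should be verified against current AWS documentation.

```python
# Valid Fargate task sizes: CPU units -> allowed memory values in MiB.
# Assumption: copied from the AWS Fargate task size table at the time of writing.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Return True if the cpu/memory strings form a supported Fargate pair.

    Only the integer-string form used in the example task definition
    (e.g. "256", "512") is handled here.
    """
    try:
        cpu_units, memory_mib = int(cpu), int(memory)
    except ValueError:
        return False
    return memory_mib in FARGATE_SIZES.get(cpu_units, [])


print(is_valid_fargate_size("256", "512"))   # True
print(is_valid_fargate_size("256", "4096"))  # False
```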

A running copy of a task definition is called a task. One task definition can spin up many tasks.

Task definitions are immutable. You never “edit” one. Every change registers a new numbered revision (e.g. changing web:7 produces web:8, and web:7 remains available). This gives you a clean rollback path: just point your service back at an older revision.

A minimal Fargate task definition

{
  "family": "web",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/webAppTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web:latest",
      "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
      "environment": [{ "name": "APP_ENV", "value": "production" }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

aws ecs register-task-definition --cli-input-json file://web-taskdef.json

Output (abbreviated):

{
    "taskDefinition": {
        "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/web:8",
        "family": "web",
        "revision": 8,
        "status": "ACTIVE"
    }
}

Notice the revision is 8 — AWS bumped it automatically.
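
If a deploy script needs the family and revision, they can be read from the returned ARN. This is a minimal sketch using only the abbreviated output shown above; the helper name `family_and_revision` is hypothetical, and the rollback target assumes the previous revision still exists and was the last known-good one.

```python
import json

# Abbreviated CLI output, copied from the example above.
cli_output = """
{
    "taskDefinition": {
        "taskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/web:8",
        "family": "web",
        "revision": 8,
        "status": "ACTIVE"
    }
}
"""


def family_and_revision(task_definition_arn: str) -> tuple:
    """Split '...:task-definition/web:8' into ('web', 8)."""
    resource = task_definition_arn.split("/", 1)[1]  # 'web:8'
    family, revision = resource.rsplit(":", 1)
    return family, int(revision)


arn = json.loads(cli_output)["taskDefinition"]["taskDefinitionArn"]
family, revision = family_and_revision(arn)
print(f"{family}:{revision}")                       # web:8
print(f"rollback target: {family}:{revision - 1}")  # rollback target: web:7
```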

The two IAM roles (the gotcha that breaks deployments)

This is a frequent source of failed deployments. A task definition references two different IAM roles, and they do different jobs at different points in the task’s life.

Role	Used by	Grants permission to…	Symptom if wrong
Task execution role (`executionRoleArn`)	The ECS agent / Fargate, before your container starts	Pull the image from ECR, write logs to CloudWatch, read secrets at startup	`CannotPullContainerError` or no logs appear
Task role (`taskRoleArn`)	Your application code, while it runs	Whatever your app calls — read an S3 bucket, write to DynamoDB, publish to SQS	`AccessDenied` from inside your app

Plain English: the execution role lets AWS set the task up for you. The task role lets your code talk to other AWS services. Mix them up and you either can’t start the container, or the container starts fine but your app gets permission errors the moment it touches AWS.
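
The symptom column above can be turned into a rough first-pass triage. The sketch below is only a heuristic built from the two symptoms in the table; the function name `likely_misconfigured_role` is hypothetical, and real error messages may need more cases.

```python
def likely_misconfigured_role(error_text: str) -> str:
    """Guess which IAM role to inspect first, based on the symptom table."""
    text = error_text.lower()
    # Failures before the container starts point at the execution role.
    # This check comes first because pull errors can also mention "denied".
    if "cannotpullcontainererror" in text:
        return "executionRoleArn"
    # Permission errors raised by application code point at the task role.
    if "accessdenied" in text:
        return "taskRoleArn"
    return "unknown"


print(likely_misconfigured_role("CannotPullContainerError: pull access denied"))
# executionRoleArn
print(likely_misconfigured_role("AccessDenied when calling the GetObject operation"))
# taskRoleArn
```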

When to use which: on Fargate, treat the execution role as required, because pulling from ECR, writing logs with the awslogs driver, and injecting secrets all depend on it (use the AWS-managed AmazonECSTaskExecutionRolePolicy as a baseline). You only need a task role if your app makes AWS API calls. If your app never touches AWS, omit taskRoleArn entirely.
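
Both roles must be assumable by ECS tasks, which means each needs a trust policy for the `ecs-tasks.amazonaws.com` service principal. The sketch below shows that trust policy and a small helper that leaves out `taskRoleArn` when the app makes no AWS calls; the helper name `role_fields` is hypothetical.

```python
import json
from typing import Optional

# Trust policy that lets ECS tasks assume a role. The same trust relationship
# applies to both the execution role and the task role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def role_fields(execution_role_arn: str, task_role_arn: Optional[str] = None) -> dict:
    """Build the role-related task definition fields.

    taskRoleArn is omitted entirely when the app makes no AWS API calls.
    """
    fields = {"executionRoleArn": execution_role_arn}
    if task_role_arn:
        fields["taskRoleArn"] = task_role_arn
    return fields


print(json.dumps(TRUST_POLICY, indent=2))
print(role_fields("arn:aws:iam::123456789012:role/ecsTaskExecutionRole"))
```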

Security tip: scope the task role tightly — grant only the exact actions and resources your app uses (e.g. s3:GetObject on one bucket), never AdministratorAccess. Each task should follow least privilege.
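
As an illustration of a tightly scoped task role, the sketch below builds a policy that allows only `s3:GetObject` on the objects of a single bucket. The bucket name `example-web-assets` and the helper name `read_only_bucket_policy` are hypothetical.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy allowing s3:GetObject on objects in one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


# "example-web-assets" is a hypothetical bucket name used for illustration.
policy = read_only_bucket_policy("example-web-assets")
print(json.dumps(policy, indent=2))
```

The resulting JSON could be attached to the task role as an inline or customer-managed policy; anything the app does not call stays denied by default.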

Services: keeping tasks running

If you run a task directly, it runs once and stops when it exits. That is fine for a batch job but useless for a web server. An ECS service fixes this. A service maintains a desired count of tasks and replaces any that crash, get killed, or fail health checks. It can also register tasks with an Application Load Balancer (ALB) so traffic is spread across them, and it handles rolling deployments when you ship a new revision.
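
The "desired count" behavior can be pictured as a reconciliation loop. The toy model below is not how the ECS scheduler is implemented; it only illustrates the idea of topping the task list back up after a failure. Task names and the function `reconcile` are made up for the example.

```python
import itertools

# Hypothetical task ID generator, standing in for ECS-assigned task IDs.
_task_ids = itertools.count(1)


def reconcile(running: list, desired_count: int) -> list:
    """Return the task list after topping it up to desired_count.

    Toy model only: the real scheduler also handles scale-in, placement,
    and health checks.
    """
    tasks = list(running)
    while len(tasks) < desired_count:
        tasks.append(f"task-{next(_task_ids)}")  # launch a replacement
    return tasks


tasks = reconcile([], desired_count=3)     # initial launch: task-1..task-3
tasks.remove("task-2")                     # simulate a crash
tasks = reconcile(tasks, desired_count=3)  # the service starts task-4
print(tasks)  # ['task-1', 'task-3', 'task-4']
```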

When to use a service: long-running workloads such as APIs, web apps, and background workers that should always be up. When NOT to: one-off or scheduled jobs. For those, run a standalone task (aws ecs run-task) or use ECS Scheduled Tasks (driven by Amazon EventBridge) instead.

Create a service from the CLI

aws ecs create-service \
  --cluster prod-cluster \
  --service-name web \
  --task-definition web:8 \
  --desired-count 3 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0a1b2c3d,subnet-0e4f5g6h],securityGroups=[sg-0a1b2c3d],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/0a1b2c3d,containerName=web,containerPort=8080"

Output (abbreviated):

{
    "service": {
        "serviceName": "web",
        "status": "ACTIVE",
        "desiredCount": 3,
        "runningCount": 0,
        "launchType": "FARGATE",
        "taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/web:8"
    }
}

runningCount climbs to 3 as ECS launches the tasks behind your ALB.
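
A script can decide whether the service has caught up by comparing the two counts. This is a minimal sketch over the abbreviated output shown above; the helper name `is_at_desired_count` is hypothetical.

```python
import json

# Abbreviated create-service output, taken from the example above.
service = json.loads("""
{
    "serviceName": "web",
    "status": "ACTIVE",
    "desiredCount": 3,
    "runningCount": 0
}
""")


def is_at_desired_count(svc: dict) -> bool:
    """True once the service is ACTIVE and every desired task is running."""
    return svc["status"] == "ACTIVE" and svc["runningCount"] == svc["desiredCount"]


print(is_at_desired_count(service))                         # False
print(is_at_desired_count({**service, "runningCount": 3}))  # True
```

For real deployments, the AWS CLI also provides `aws ecs wait services-stable`, which blocks until the service reaches a steady state.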

Console steps to create the same service

Open the ECS console and choose your cluster (e.g. prod-cluster).
On the Services tab, click Create.
Under Environment, set Launch type to Fargate.
Under Deployment configuration, pick Family = web and Revision = 8 (or LATEST), set Service name = web and Desired tasks = 3.
Under Networking, select your VPC, the private subnets subnet-0a1b2c3d / subnet-0e4f5g6h, and security group sg-0a1b2c3d.
Under Load balancing, choose Application Load Balancer, select your ALB and target group web-tg, mapping container web:8080.
Click Create. ECS launches three tasks and registers them with the ALB.

Shipping an update

To deploy a new image, register a new revision, then point the service at it:

aws ecs update-service --cluster prod-cluster --service web --task-definition web:9

ECS performs a rolling update by default: it starts new tasks on revision 9, waits for them to pass health checks, then drains and stops the old ones. To roll back, run the same command with the previous revision (web:8).
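
The rolling update can be sketched as a simple simulation. The model below assumes the two deployment settings `minimumHealthyPercent` and `maximumPercent`, which this page does not otherwise cover, with assumed defaults of 100 and 200; it also treats new tasks as healthy immediately, which real deployments do not. It is an illustration of the start-then-stop ordering, not the actual ECS scheduler logic.

```python
import math


def rolling_update(desired, min_healthy_pct=100, max_pct=200):
    """Yield (old_tasks, new_tasks) after each step of a simplified rollout."""
    floor_healthy = math.ceil(desired * min_healthy_pct / 100)  # never drop below
    ceiling_total = math.floor(desired * max_pct / 100)         # never exceed
    old, new = desired, 0
    yield old, new
    while old > 0 or new < desired:
        # Start as many new-revision tasks as the ceiling allows.
        start = max(0, min(desired - new, ceiling_total - (old + new)))
        if start:
            new += start
            yield old, new
        # Stop old tasks while staying at or above the healthy floor.
        stop = max(0, min(old, (old + new) - floor_healthy))
        if stop:
            old -= stop
            yield old, new
        if not start and not stop:
            raise ValueError("limits leave no room to make progress")


print(list(rolling_update(3)))  # [(3, 0), (3, 3), (0, 3)]
```

With three desired tasks and the assumed defaults, the simulation briefly runs six tasks (three old, three new) before the old ones are stopped, which matches the start, wait, then drain order described above.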

Cost note: with Fargate you pay per vCPU-second and GB-second while tasks run. Three tasks at 0.25 vCPU / 0.5 GB running 24/7 cost roughly $25-30/month in us-east-1. Setting desired-count higher than you need is the easiest way to overspend, so right-size CPU/memory and use Service Auto Scaling to scale down off-peak.
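
The estimate above can be reproduced with a few lines of arithmetic. The hourly rates below are assumed us-east-1 on-demand Fargate prices for Linux/x86 at the time of writing and should be checked against the current pricing page; 730 is used as the average number of hours in a month.

```python
# Assumed us-east-1 on-demand Fargate rates (Linux/x86); check current pricing.
VCPU_PER_HOUR_USD = 0.04048
GB_PER_HOUR_USD = 0.004445
HOURS_PER_MONTH = 730  # average month


def monthly_cost(tasks: int, vcpu: float, memory_gb: float) -> float:
    """Estimated monthly compute cost for tasks running 24/7."""
    hourly = vcpu * VCPU_PER_HOUR_USD + memory_gb * GB_PER_HOUR_USD
    return tasks * hourly * HOURS_PER_MONTH


print(f"${monthly_cost(3, 0.25, 0.5):.2f}")  # $27.03
```

The figure covers Fargate compute only; load balancer, data transfer, and CloudWatch Logs charges are billed separately.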

Best practices

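Pin services to a specific task definition revision in production (e.g. web:8), and reference images by an immutable tag or digest rather than a moving tag like latest, so deployments are deliberate and rollbacks are exact.
Attach the AWS-managed AmazonECSTaskExecutionRolePolicy to your execution role. It already covers pulling images from ECR and writing to CloudWatch Logs; add Secrets Manager or SSM Parameter Store read permissions only if the task definition references secrets.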
Keep the task role least-privilege and separate from the execution role — never reuse one for both.
Define a container health check (or rely on the ALB target group health check) so the service replaces unhealthy tasks automatically.
Send logs to CloudWatch with the awslogs driver and a clear awslogs-stream-prefix so you can trace each task.
Use private subnets with assignPublicIp=DISABLED and let the ALB handle public traffic, keeping tasks off the public internet.
Enable Service Auto Scaling on a CloudWatch metric (CPU or request count) instead of hard-coding a high desired count.
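
For the auto scaling recommendation, a target-tracking setup can be sketched as two request payloads for Application Auto Scaling: one registering the service as a scalable target, one attaching a CPU-based policy. The cluster and service names come from the examples above; the 2 to 10 task range, the 60% CPU target, the cooldowns, and the policy name are illustrative assumptions.

```python
import json

# Assumptions: "prod-cluster" and "web" come from the examples above; the
# capacity range, target value, cooldowns, and policy name are illustrative.
scalable_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/prod-cluster/web",
    "ScalableDimension": "ecs:service:DesiredCount",
    "MinCapacity": 2,
    "MaxCapacity": 10,
}

scaling_policy = {
    "PolicyName": "web-cpu-target-tracking",
    "PolicyType": "TargetTrackingScaling",
    "ServiceNamespace": scalable_target["ServiceNamespace"],
    "ResourceId": scalable_target["ResourceId"],
    "ScalableDimension": scalable_target["ScalableDimension"],
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 60.0,  # keep average CPU near 60%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,   # seconds
        "ScaleInCooldown": 300,   # seconds
    },
}

print(json.dumps(scalable_target, indent=2))
print(json.dumps(scaling_policy, indent=2))
```

These dictionaries follow the request shapes of the Application Auto Scaling `RegisterScalableTarget` and `PutScalingPolicy` operations, so they could be passed to the corresponding SDK or CLI calls once the values are adjusted for a real workload.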