Amazon SQS (Message Queues)
Amazon SQS (Simple Queue Service) is a fully managed message queue: a holding area where one part of your system drops a message and another part picks it up later. The piece that sends a message is the producer; the piece that reads it is the consumer. Because the queue sits in the middle, the producer and consumer never have to be online at the same time, run at the same speed, or even know about each other. This decoupling is one of the most useful patterns in distributed systems, and SQS gives it to you with no servers to run and, for Standard queues, nearly unlimited throughput.
Why decouple with a queue
Imagine a web app that resizes uploaded images. If the web server resizes images itself, a traffic spike means slow responses and crashed servers. Instead, the server drops a “resize this image” message on a queue and replies instantly. A separate fleet of workers pulls messages off the queue and does the slow work at whatever pace it can manage. If 10,000 uploads arrive at once, they simply pile up in the queue and drain over time. Nothing is lost, and nothing falls over.
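The buffering effect can be sketched without AWS at all. The following is a minimal in-memory illustration that uses Python's standard `queue.Queue` as a stand-in for SQS; the names `handle_upload` and `resize_worker` are hypothetical and are not part of any AWS API.

```python
import queue
import threading

# In-memory stand-in for an SQS queue (illustration only, not the AWS API).
jobs = queue.Queue()
resized = []

def handle_upload(image_id):
    """Producer: enqueue the slow work and return immediately."""
    jobs.put({"imageId": image_id, "action": "resize"})
    return "accepted"

def resize_worker():
    """Consumer: drain the queue at its own pace."""
    while True:
        job = jobs.get()
        if job is None:                  # sentinel: no more work
            break
        resized.append(job["imageId"])   # the slow resize would happen here

# A burst of uploads arrives before any worker is running.
responses = [handle_upload(f"img-{n}") for n in range(1000)]
backlog = jobs.qsize()

worker = threading.Thread(target=resize_worker)
worker.start()
jobs.put(None)
worker.join()

print(backlog, len(resized))  # prints: 1000 1000
```

Every upload is acknowledged at once, the backlog grows to 1,000, and the worker drains it afterwards.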
When to use SQS: any time you want to smooth out traffic spikes (called buffering), run slow work in the background, retry failed work safely, or break a big app into independent services that can be deployed and scaled separately.
When NOT to use it: when you need an immediate synchronous answer (use a direct API call), or when many different consumers each need their own copy of every message (that is fan-out — use Amazon SNS or EventBridge instead).
Standard vs FIFO queues
SQS comes in two flavors. You pick one when you create the queue and cannot change it later.
| | Standard | FIFO (First-In-First-Out) |
|---|---|---|
| Ordering | Best-effort — messages can arrive out of order | Strict order, preserved within a message group |
| Delivery | At-least-once: a message can be delivered more than once | Exactly-once processing: duplicates sent within the 5-minute deduplication interval are dropped |
| Throughput | Nearly unlimited messages per second | Up to 3,000 messages/sec with batching (300/sec without) |
| Name suffix | any name | must end in .fifo |
| Use when | Throughput and scale matter most | Order and no-duplicates matter most |
Use Standard for the image-resize example, log processing, or anything where occasional duplicate or out-of-order delivery is harmless. Use FIFO for things like a bank’s transaction queue or a sequence of “create order, then cancel order” events where order and uniqueness are critical.
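For a FIFO queue, each send carries a message group ID (ordering is preserved within a group) and, unless content-based deduplication is enabled on the queue, a deduplication ID. The sketch below builds those parameters for the "create order, then cancel order" case. The helper name `fifo_send_params`, the `order-<id>` group naming, and hashing the body to get a deduplication ID are illustrative assumptions, not a prescribed scheme.

```python
import hashlib
import json

def fifo_send_params(queue_url, order_id, event):
    """Build keyword arguments for a FIFO send (hypothetical helper)."""
    body = json.dumps({"orderId": order_id, "event": event}, sort_keys=True)
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages that share a group ID are delivered in the order they were sent.
        "MessageGroupId": f"order-{order_id}",
        # A repeated ID inside the deduplication interval is treated as a duplicate.
        "MessageDeduplicationId": hashlib.sha256(body.encode()).hexdigest(),
    }

# The queue URL is a placeholder built from the example account ID used in this guide.
url = "https://sqs.us-east-1.amazonaws.com/111122223333/orders.fifo"
create = fifo_send_params(url, "42", "create")
cancel = fifo_send_params(url, "42", "cancel")
print(create["MessageGroupId"], cancel["MessageGroupId"])  # prints: order-42 order-42
```

Assuming a boto3 SQS client, the resulting dictionary could be passed as `send_message(**create)`. Both events share one group, so the cancel cannot overtake the create.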
Gotcha: Standard queues will occasionally deliver a duplicate message and will sometimes reorder messages. This is not a bug — it is how the at-least-once model achieves its scale. Always design your consumer to be idempotent (processing the same message twice produces the same result, e.g. by recording a processed-message ID and skipping repeats).
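A minimal sketch of the "record a processed-message ID and skip repeats" idea follows. It keeps the IDs in an in-memory set purely for illustration; a real consumer would likely need a durable, shared store, and may prefer a business key from the message body over the SQS message ID. The name `handle_once` is hypothetical.

```python
processed_ids = set()   # illustration only; production code would use a durable store
thumbnails = []

def handle_once(message):
    """Process a message unless its ID has been seen before."""
    message_id = message["MessageId"]
    if message_id in processed_ids:
        return False                     # duplicate delivery: skip the work
    thumbnails.append(message["Body"])   # stand-in for the real work
    processed_ids.add(message_id)
    return True

msg = {"MessageId": "5fea7756-0ea4-451a-a703-a558b933e274", "Body": "img-0a1b2c3d"}
print(handle_once(msg), handle_once(msg), len(thumbnails))  # prints: True False 1
```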
Visibility timeout
When a consumer reads a message, SQS does not delete it. Instead it hides the message from other consumers for a set period called the visibility timeout (default 30 seconds). The consumer is expected to finish its work and then explicitly delete the message. If it deletes the message in time, the message is gone for good. If the timeout expires first, SQS assumes the consumer died and makes the message visible again so another consumer can try.
Critical gotcha: Your visibility timeout MUST be longer than your worst-case processing time. If a job takes 90 seconds but the timeout is 30 seconds, the message reappears mid-job and a second worker starts processing it — the same work runs twice. Set the timeout to roughly six times your expected processing time, and combine it with idempotent consumers so a double-delivery is harmless.
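The double-processing scenario can be reproduced with a toy model. The class below is not the SQS API; it is a small sketch with an explicit clock (the `now` argument, in seconds) that hides a message after each receive and shows it again once the timeout has passed.

```python
class TinyQueue:
    """Toy model of visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}   # message ID -> time at which it becomes visible

    def send(self, message_id):
        self.messages[message_id] = 0

    def receive(self, now):
        """Return one visible message and hide it, or None if nothing is visible."""
        for message_id, visible_at in self.messages.items():
            if visible_at <= now:
                self.messages[message_id] = now + self.visibility_timeout
                return message_id
        return None

    def delete(self, message_id):
        self.messages.pop(message_id, None)

q = TinyQueue(visibility_timeout=30)
q.send("job-1")
first = q.receive(now=0)     # worker A starts a 90-second job
hidden = q.receive(now=10)   # still hidden from worker B
second = q.receive(now=30)   # timeout expired: worker B gets the same job
print(first, hidden, second)  # prints: job-1 None job-1
```

With `visibility_timeout=120`, worker A would have had time to finish and call `delete` before the message reappeared.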
Dead-letter queues (DLQs)
A dead-letter queue is a second queue that catches messages your consumers cannot process. You configure the main queue with a maxReceiveCount — say 5. If a message is received 5 times without being deleted (because it keeps causing errors), SQS automatically moves it to the DLQ. This stops a single bad (“poison”) message from looping forever and blocking the queue, and gives you a place to inspect and debug failures.
When to use a DLQ: always, on any production queue. It is your safety net for malformed input, bugs, and downstream outages.
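The redrive behaviour can be illustrated with a small simulation. This is a sketch of the idea only, not how SQS is implemented: `drain` and `parse_int` are hypothetical names, and the receive count is tracked in a plain tuple.

```python
from collections import deque

def drain(messages, handler, max_receive_count=5):
    """Toy redrive loop: returns (processed, dead_letters)."""
    main = deque((m, 0) for m in messages)   # (message, receive count so far)
    processed, dead_letters = [], []
    while main:
        message, receives = main.popleft()
        if receives >= max_receive_count:
            dead_letters.append(message)     # received too often: move to the DLQ
            continue
        try:
            handler(message)
            processed.append(message)        # success: the message is "deleted"
        except Exception:
            main.append((message, receives + 1))   # failure: it becomes visible again
    return processed, dead_letters

def parse_int(body):
    return int(body)

print(drain(["1", "oops", "3"], parse_int))  # prints: (['1', '3'], ['oops'])
```

The poison message is tried five times, then set aside, while the healthy messages around it are processed normally.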
Creating a queue
Console steps:
- Open the SQS console at console.aws.amazon.com/sqs.
- Choose Create queue.
- Select the type: Standard or FIFO.
- Enter a name (FIFO names must end in .fifo, e.g. orders.fifo).
- Under Configuration, set Visibility timeout (e.g. 120 seconds) and Message retention (default 4 days, max 14).
- Under Dead-letter queue, enable it, pick an existing queue, and set Maximum receives to 5.
- Choose Create queue.
AWS CLI v2 equivalent:
# Create the dead-letter queue first
aws sqs create-queue --queue-name image-jobs-dlq
# Create the main queue with visibility timeout and a DLQ redrive policy
aws sqs create-queue \
--queue-name image-jobs \
--attributes '{
"VisibilityTimeout": "120",
"MessageRetentionPeriod": "1209600",
"RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:111122223333:image-jobs-dlq\",\"maxReceiveCount\":\"5\"}"
}'
Output:
{
"QueueUrl": "https://sqs.us-east-1.amazonaws.com/111122223333/image-jobs"
}
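The RedrivePolicy value is a JSON document stored as a string inside the attributes JSON, which is why the CLI example needs escaped quotes. One way to avoid hand-escaping is to generate the attributes programmatically. The helper below is a hypothetical sketch that reproduces the values used above.

```python
import json

def queue_attributes(dlq_arn, visibility_timeout=120, retention_days=14, max_receives=5):
    """Build the attributes map for the main queue (hypothetical helper)."""
    return {
        "VisibilityTimeout": str(visibility_timeout),
        # Retention is expressed in seconds: 14 days = 1,209,600 seconds.
        "MessageRetentionPeriod": str(retention_days * 24 * 60 * 60),
        # RedrivePolicy is itself JSON, serialized into a string value.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receives)}
        ),
    }

attrs = queue_attributes("arn:aws:sqs:us-east-1:111122223333:image-jobs-dlq")
print(json.dumps(attrs, indent=2))
```

The printed JSON could be saved to a file and passed to the CLI as `--attributes file://attrs.json`, or, assuming a boto3 SQS client, supplied as the `Attributes` argument of `create_queue`.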
Sending, receiving, and deleting
# Send a message
aws sqs send-message \
--queue-url https://sqs.us-east-1.amazonaws.com/111122223333/image-jobs \
--message-body '{"imageId":"img-0a1b2c3d","action":"resize"}'
# Receive up to 10 messages, waiting up to 20s (long polling — cheaper, fewer empty reads)
aws sqs receive-message \
--queue-url https://sqs.us-east-1.amazonaws.com/111122223333/image-jobs \
--max-number-of-messages 10 \
--wait-time-seconds 20
Output:
{
"Messages": [
{
"MessageId": "5fea7756-0ea4-451a-a703-a558b933e274",
"ReceiptHandle": "AQEBzWwaftRI0KuVm4tP+/7q1...",
"Body": "{\"imageId\":\"img-0a1b2c3d\",\"action\":\"resize\"}"
}
]
}
After your worker finishes the job, delete the message using its ReceiptHandle (a token issued with each receive, not the message ID; if the same message is received again, it comes with a new handle):
aws sqs delete-message \
--queue-url https://sqs.us-east-1.amazonaws.com/111122223333/image-jobs \
--receipt-handle "AQEBzWwaftRI0KuVm4tP+/7q1..."
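In a worker process, the same receive, process, delete cycle might look like the sketch below. It assumes `sqs` is a boto3 SQS client (for example `boto3.client("sqs")`), whose `receive_message` and `delete_message` methods mirror the CLI commands above; `poll_once` and `handler` are hypothetical names. The client is passed in as an argument, so the function itself has no AWS dependency.

```python
import json

def poll_once(sqs, queue_url, handler):
    """Receive one batch with long polling; delete each message the handler finishes."""
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    done = 0
    # The "Messages" key is absent when nothing arrived within the wait time.
    for message in response.get("Messages", []):
        try:
            handler(json.loads(message["Body"]))
        except Exception:
            continue   # not deleted: it reappears after the visibility timeout
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        done += 1
    return done
```

A worker would call `poll_once` in a loop. Deleting only after the handler returns is what makes a crash mid-job safe: the message is simply redelivered.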
A common modern pattern is to skip writing a polling worker entirely: point an AWS Lambda function at the queue as an event source, and Lambda polls, invokes your function in batches, and deletes successfully processed messages automatically.
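A Lambda consumer for the image queue could be sketched as follows. The `resize` function is a hypothetical stand-in for the real work. The return value uses Lambda's partial batch response format, which only takes effect if ReportBatchItemFailures is enabled on the event source mapping; without it, raising an exception causes the whole batch to be retried.

```python
import json

def resize(job):
    """Hypothetical stand-in for the real image work."""
    if "imageId" not in job:
        raise ValueError("missing imageId")

def lambda_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            resize(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages listed in `batchItemFailures` return to the queue and count toward maxReceiveCount, so a poison message still ends up in the DLQ.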
Cost note: SQS charges per request (each API call counts), with the first 1 million requests/month free. Standard requests are about $0.40 per million; FIFO about $0.50 per million. Always use long polling (--wait-time-seconds 20): short polling fires many empty receive calls that you pay for and that add nothing.
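To make the polling cost concrete, here is a rough back-of-the-envelope calculation using the approximate Standard price and free tier quoted above. The scenario is an assumption chosen for illustration: 10 workers polling an empty queue for a 30-day month, once per second with short polling versus once every 20 seconds with long polling.

```python
FREE_REQUESTS = 1_000_000
PRICE_PER_MILLION = 0.40               # approximate Standard price quoted above
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

def monthly_cost(requests):
    """Dollar cost of a month's requests after the free tier."""
    billable = max(0, requests - FREE_REQUESTS)
    return billable / 1_000_000 * PRICE_PER_MILLION

def idle_polls(workers, seconds_between_polls):
    """Empty receive calls per month when the queue has no messages."""
    return workers * SECONDS_PER_MONTH // seconds_between_polls

short_polling = idle_polls(workers=10, seconds_between_polls=1)    # 25,920,000 requests
long_polling = idle_polls(workers=10, seconds_between_polls=20)    # 1,296,000 requests
print(f"short: ${monthly_cost(short_polling):.2f}, long: ${monthly_cost(long_polling):.2f}")
# prints: short: $9.97, long: $0.12
```

A short-polling loop with no sleep between calls would issue far more than one request per second, so the real gap is usually wider than this estimate.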
Best practices
- Make every consumer idempotent so a duplicate or replayed message causes no harm.
- Set the visibility timeout to about 6x your expected processing time; for long jobs, extend it on the fly with change-message-visibility.
- Always attach a dead-letter queue with a sensible maxReceiveCount (5 is a good default).
- Use long polling everywhere to cut cost and reduce empty responses.
- Choose FIFO only when you truly need order or exactly-once — Standard scales far higher and costs less.
- Process messages in batches (up to 10) to reduce request count and cost.
- Enable server-side encryption (SSE) for messages containing sensitive data.
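As one illustration of the batching advice, the helper below splits a list of message bodies into groups of at most 10, shaped like the entries of a batch send. The name `batch_entries` and the use of the list position as the entry Id are assumptions made for this sketch; the only requirement relied on is that Ids are unique within a batch.

```python
def batch_entries(bodies, batch_size=10):
    """Yield lists of batch-send entries, at most batch_size per list."""
    for start in range(0, len(bodies), batch_size):
        chunk = bodies[start:start + batch_size]
        yield [
            {"Id": str(start + offset), "MessageBody": body}
            for offset, body in enumerate(chunk)
        ]

batches = list(batch_entries([f"job-{n}" for n in range(25)]))
print([len(b) for b in batches])  # prints: [10, 10, 5]
```

Assuming a boto3 SQS client, each list could be passed as `Entries` to `send_message_batch`, turning 25 requests into 3. The response's `Failed` list should still be checked, because a batch call can succeed for some entries and fail for others.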