CloudWatch Logs

When your code runs in the cloud, you cannot just open a terminal on the server and read its output. CloudWatch Logs is the AWS (Amazon Web Services) service that collects, stores, and lets you search the log lines your applications and infrastructure produce. Many AWS services can send their logs there, and some (such as Lambda) do so by default, so learning it well is one of the highest-leverage skills for debugging in the cloud. This page explains how logs are organized, how to ship logs from common sources, how to query them efficiently, and two cost traps that catch nearly everyone.

How logs are organized: groups and streams

CloudWatch Logs has two levels of structure.

A log group is a named container, usually one per application or service. For example, every AWS Lambda function gets a log group named /aws/lambda/<function-name>. Retention, access permissions, and metric filters are all set at the log group level.
A log stream is a sequence of log events that all came from the same source, such as one running container, one EC2 (Elastic Compute Cloud, a virtual server) instance, or one Lambda execution environment. Streams live inside a group.

You usually read and query at the group level, and CloudWatch merges the streams for you. You rarely create streams by hand; the agent or service does it.

Cost gotcha: New log groups default to Never expire. If you never set a retention period, logs pile up forever and you keep paying storage costs (about $0.03 per GB per month) on data you will never read. Set a retention policy on every log group.
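To make the retention trap concrete, here is a rough back-of-the-envelope sketch in Python. It uses the approximate $0.03 per GB per month figure above; the 5 GB/day ingest volume and the function name are hypothetical, and the estimate ignores compression and regional price differences.

```python
PRICE_PER_GB_MONTH = 0.03  # approximate storage price quoted above; varies by region


def monthly_storage_cost(gb_per_day, days_retained, price=PRICE_PER_GB_MONTH):
    """Approximate monthly storage bill once `days_retained` days of logs are stored."""
    return gb_per_day * days_retained * price


# Hypothetical service that writes 5 GB of logs per day.
with_retention = monthly_storage_cost(5, 30)     # 30-day retention: volume plateaus at 150 GB
never_expire = monthly_storage_cost(5, 2 * 365)  # "Never expire" after two years: 3,650 GB and still growing

print(f"30-day retention: ${with_retention:.2f}/month")
print(f"Never expire, after two years: ${never_expire:.2f}/month")
```

With retention the bill levels off; without it, the monthly charge keeps climbing for as long as the service runs.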

Sending logs to CloudWatch

From AWS Lambda

Lambda is the easiest case: anything your function writes to stdout/stderr (a console.log, print, or logger call) is captured automatically and sent to /aws/lambda/<function-name>. You only need to make sure the function’s IAM (Identity and Access Management) execution role has the logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents permissions. The default AWSLambdaBasicExecutionRole managed policy already grants these.

When to use this: always — there is nothing to install. Just be sure to set retention on the auto-created group (see below), because Lambda creates it with “Never expire”.

From EC2 instances and on-prem servers

EC2 does not ship application logs on its own. You install the CloudWatch agent, point it at the files you care about (for example /var/log/nginx/access.log), and it tails them into a log group you name.

In the EC2 console, attach an IAM role with the CloudWatchAgentServerPolicy to the instance.
Install the agent (it ships in Amazon Linux 2023 repos): sudo dnf install -y amazon-cloudwatch-agent.
Create a config file describing which files map to which log group, then start the agent.

The full agent walkthrough lives on the CloudWatch agent page.
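As a rough illustration of the config file step, the sketch below builds a minimal agent configuration in Python and prints it as JSON. The log group name /app/nginx/access and the 30-day retention are hypothetical choices; the keys shown (logs, logs_collected, files, collect_list) follow the agent's documented file-collection layout, but treat this as a starting point rather than a complete configuration.

```python
import json

# Minimal sketch: one tailed file mapped to one log group.
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/nginx/access.log",
                        "log_group_name": "/app/nginx/access",  # hypothetical group name
                        "log_stream_name": "{instance_id}",     # placeholder the agent fills in
                        "retention_in_days": 30,                # assumed retention choice
                    }
                ]
            }
        }
    }
}


def render_agent_config(config):
    """Serialize the config dict to the JSON text the agent reads."""
    return json.dumps(config, indent=2)


print(render_agent_config(agent_config))
```

Save the output to a file on the instance and load it with the agent's control script (typically amazon-cloudwatch-agent-ctl with the fetch-config action); check the agent documentation for the exact invocation on your distribution.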

From containers (ECS / EKS / Fargate)

For ECS (Elastic Container Service) and Fargate, set the awslogs log driver in your task definition and the container’s stdout/stderr flows straight into CloudWatch:

"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/ecs/orders-api",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "orders"
  }
}

For EKS (Elastic Kubernetes Service), the common pattern is Fluent Bit running as a DaemonSet, configured to forward container logs to CloudWatch.

Sending a one-off log line with the CLI

Useful for scripts or testing. You must create the group and stream first, then push events.

aws logs create-log-group --log-group-name /app/batch-jobs
aws logs create-log-stream \
  --log-group-name /app/batch-jobs \
  --log-stream-name nightly-2026-06-15
aws logs put-log-events \
  --log-group-name /app/batch-jobs \
  --log-stream-name nightly-2026-06-15 \
  --log-events timestamp=$(($(date +%s)*1000)),message="Job started"

Output:

{
    "nextSequenceToken": "49039859812345678901234567890123456789012345678901234567"
}

Setting retention (do this every time)

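Pick a retention period that matches how long you actually need the logs. Common choices: 7 or 14 days for chatty debug logs, 30 to 90 days for application logs, longer for anything you need for audits. CloudWatch accepts only a fixed set of values (for example 1, 3, 5, 7, 14, 30, 60, 90, 180, or 365 days), so pick the nearest allowed one.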

Console steps:

Open the CloudWatch console and choose Logs > Log groups.
Select the log group, then Actions > Edit retention setting.
Choose a period (for example 30 days) and save.

CLI equivalent:

aws logs put-retention-policy \
  --log-group-name /aws/lambda/orders-api \
  --retention-in-days 30

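Tip: Make retention a default, not a chore. For example, an EventBridge rule that matches the CreateLogGroup API call (as recorded by CloudTrail) can trigger a small Lambda function that applies a retention policy to each new group, and an AWS Config rule can flag groups that still have none.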
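A simpler complement to an event-driven rule is a periodic sweep that fixes any group left on "Never expire". The sketch below is one possible shape, not a definitive implementation: it assumes a boto3-style CloudWatch Logs client is passed in, and the function name and the 30-day default are hypothetical.

```python
def apply_default_retention(logs_client, retention_days=30):
    """Set retention on every log group that has none; return the names changed.

    `logs_client` is assumed to behave like boto3.client("logs").
    """
    changed = []
    kwargs = {}
    while True:
        page = logs_client.describe_log_groups(**kwargs)
        for group in page.get("logGroups", []):
            # Groups set to "Never expire" carry no retentionInDays key.
            if "retentionInDays" not in group:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"],
                    retentionInDays=retention_days,
                )
                changed.append(group["logGroupName"])
        token = page.get("nextToken")
        if not token:
            return changed
        # Follow pagination until every group has been checked.
        kwargs = {"nextToken": token}
```

With boto3 installed and credentials configured, calling apply_default_retention(boto3.client("logs")) from a scheduled job would run the sweep. Groups that already have a retention period are left untouched.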

Querying with Logs Insights

CloudWatch Logs Insights is an interactive query feature, with its own query language, for searching across log groups. It is far faster than scrolling streams by hand and supports filtering, aggregation, and parsing of structured (JSON) logs.

A query to find the 20 slowest requests that took over 1 second (assuming your logs carry a numeric duration field in milliseconds):

fields @timestamp, @message, duration
| filter duration > 1000
| sort duration desc
| limit 20

A query to count errors grouped by type from JSON logs:

filter level = "ERROR"
| stats count(*) as errors by errorType
| sort errors desc

Console steps:

CloudWatch console > Logs > Logs Insights.
Select one or more log groups, set the time range in the top-right.
Type the query and choose Run query.

Cost gotcha: Logs Insights bills by the amount of data scanned (about $0.005 per GB), not by the result size. A broad query over a huge time range scans everything in that window. Always narrow the time range and select only the specific log groups you need before running. Add a filter early in the query to cut down what later commands process.
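The effect of the time range is easy to estimate. The following sketch uses the approximate $0.005 per GB figure above and assumes, purely for illustration, a log group that ingests 200 GB per day spread evenly over time; the function name is hypothetical.

```python
PRICE_PER_GB_SCANNED = 0.005  # approximate price quoted above; varies by region


def scan_cost(gb_ingested_per_day, hours_in_range, price=PRICE_PER_GB_SCANNED):
    """Estimate the cost of one query that scans `hours_in_range` hours of logs."""
    gb_scanned = gb_ingested_per_day * hours_in_range / 24
    return gb_scanned * price


# Hypothetical log group ingesting 200 GB per day.
last_30_days = scan_cost(200, 30 * 24)  # scans roughly 6,000 GB
last_hour = scan_cost(200, 1)           # scans roughly 8.3 GB

print(f"30-day window: ${last_30_days:.2f} per query")
print(f"1-hour window: ${last_hour:.2f} per query")
```

One wide query is not ruinous on its own, but a dashboard or a debugging session that reruns it dozens of times adds up quickly.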

Metric filters: turn log patterns into metrics and alarms

A metric filter watches a log group for a pattern and increments a CloudWatch metric each time it matches. You can then build an alarm on that metric — for example, alert when “ERROR” appears more than 10 times in 5 minutes. This bridges unstructured logs into the numeric world of CloudWatch metrics and alarms.

aws logs put-metric-filter \
  --log-group-name /aws/lambda/orders-api \
  --filter-name error-count \
  --filter-pattern "ERROR" \
  --metric-transformations \
      metricName=ErrorCount,metricNamespace=OrdersApi,metricValue=1,defaultValue=0
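To complete the picture, here is a hedged sketch of the alarm side: the parameters for an alarm that fires when ErrorCount sums to more than 10 within a 5-minute period. The alarm name is hypothetical, and the dict is meant to be passed to a boto3 CloudWatch client's put_metric_alarm call; the metric name and namespace match the filter above.

```python
def error_alarm_params(threshold=10, period_seconds=300):
    """Build keyword arguments for an alarm on the ErrorCount metric."""
    return {
        "AlarmName": "orders-api-error-count",  # hypothetical alarm name
        "Namespace": "OrdersApi",               # must match the metric filter
        "MetricName": "ErrorCount",             # must match the metric filter
        "Statistic": "Sum",                     # total matches in each period
        "Period": period_seconds,               # 300 seconds = 5 minutes
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumed choice: treat periods with no log traffic as healthy.
        "TreatMissingData": "notBreaching",
    }


params = error_alarm_params()
print(params)
```

With boto3 installed and credentials configured, boto3.client("cloudwatch").put_metric_alarm(**params) would create the alarm. Adding an AlarmActions entry with an SNS topic ARN is the usual way to get notified.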

When to use a metric filter vs. Logs Insights:

Need	Use
Real-time alerting on a known pattern	Metric filter + alarm
Continuous dashboards / time series	Metric filter
Ad-hoc investigation and root-cause analysis	Logs Insights
Aggregating across many fields once	Logs Insights

Metric filters only apply to logs ingested after the filter is created — they do not backfill historical data.

Best practices

Set a retention policy on every log group the moment it is created; never leave the default “Never expire”.
Use structured JSON logging in your apps so Logs Insights can filter on individual fields instead of regex over raw text.
Narrow the time range and log group before running Logs Insights queries to control data-scanned costs.
Create metric filters and alarms for known failure signatures (errors, timeouts, throttles) instead of relying on people watching logs.
Grant log access through scoped IAM policies; do not give broad logs:* permissions to applications.
For high-volume, long-term storage, export logs to S3 (Simple Storage Service) where storage is cheaper than CloudWatch’s retained-log pricing.
Consistently name log groups by service (/aws/lambda/..., /ecs/..., /app/...) so they are easy to find and target in queries.
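As an illustration of the structured-logging practice, the sketch below emits one JSON object per log line using only Python's standard library. The field names level and errorType are chosen to line up with the errors-by-type query in the Logs Insights section; the logger name and the formatter class are hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        # Optional field supplied through `extra=` at the call site.
        error_type = getattr(record, "errorType", None)
        if error_type is not None:
            payload["errorType"] = error_type
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)  # stdout is what Lambda and awslogs capture
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders-api")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines from a preconfigured root logger

logger.error("payment declined", extra={"errorType": "PaymentDeclined"})
```

Because each line is valid JSON, Logs Insights can discover level and errorType as fields, so queries can filter and group on them directly instead of parsing raw text.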