Text Processing with sed & awk
When you administer Linux servers you spend a huge amount of time reading and editing text: config files, log files, and the output of other commands. Two classic tools make this fast and repeatable. sed (short for “stream editor”, a tool that edits text as it flows through) is best for find-and-replace and line deletion. awk (named after its three authors, Aho, Weinberger, and Kernighan) is best for pulling out columns and building small reports. Both come pre-installed on Ubuntu 22.04 and 24.04 LTS, so there is nothing to install.
sed vs awk — when to use which
Both tools read text line by line, but they are good at different jobs. Reach for the right one and your scripts stay short and clear.
| Task | Use | Why |
|---|---|---|
| Replace text in a file or stream | sed | One short s/old/new/ command |
| Delete or print specific lines | sed | Line addressing is built in |
| Edit a config file in place | sed -i | Writes changes back to the file |
| Print column 3 from a log | awk | Splits each line into fields automatically |
| Sum or count values in a report | awk | Has variables, math, and END blocks |
| Filter rows by a condition | awk | awk '$3 > 100' reads like a sentence |
A simple rule: if you are changing text, start with sed; if you are extracting or calculating from columns, start with awk.
Find and replace with sed
The core of sed is the substitute command, written s/old/new/. The s means substitute. The text between the first and second slash is the pattern to find. It is a regular expression, so characters such as ., *, ^ and $ have special meanings. The text between the second and third slash is the replacement. By default it changes only the first match on each line. Add the g flag (for “global”) to change every match on the line.
echo "cat dog cat" | sed 's/cat/bird/'
Output:
bird dog cat
Now with the global flag so both cat words change:
echo "cat dog cat" | sed 's/cat/bird/g'
Output:
bird dog bird
You can also run sed on a whole file. This reads app.conf and prints the result to the screen without touching the file on disk:
sed 's/localhost/127.0.0.1/g' app.conf
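You can confirm that this form really leaves the file alone with the small rehearsal below. It assumes GNU sed (the default on Ubuntu). It works on a made-up app.conf inside a temporary directory, so nothing real is changed.

```shell
# Work in a scratch directory that is removed when the shell exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical app.conf with two mentions of localhost.
printf 'db_host=localhost\ncache_host=localhost\n' > "$dir/app.conf"

# Without -i, sed only prints the edited text.
preview=$(sed 's/localhost/127.0.0.1/g' "$dir/app.conf")
printf '%s\n' "$preview"

# The file on disk still contains the original localhost lines.
cat "$dir/app.conf"
```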
The slash / is just the most common delimiter, not a magic one. When your text contains slashes (like file paths), use a different separator so you do not have to escape every slash: sed 's#/var/www#/srv/www#g' nginx.conf. Whatever character follows the s becomes the delimiter.
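Here is a side-by-side sketch of the two delimiters. It runs against a hypothetical one-line nginx.conf in a scratch directory and assumes GNU sed. Both commands should produce the same result; only the readability differs.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical one-line nginx.conf fragment.
printf 'root /var/www/html;\n' > "$dir/nginx.conf"

# With / as the delimiter, every slash inside the paths must be escaped.
escaped=$(sed 's/\/var\/www/\/srv\/www/g' "$dir/nginx.conf")

# With # as the delimiter, the paths can be written exactly as they appear.
readable=$(sed 's#/var/www#/srv/www#g' "$dir/nginx.conf")

printf '%s\n%s\n' "$escaped" "$readable"
```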
Editing a config file in place with sed -i
Printing to the screen is safe for testing, but eventually you want to actually save the change. The -i flag means “in place” — it writes the edited text back into the original file. When to use this: automating a config change across many servers, or in a provisioning script where you cannot open an editor by hand. When NOT to: on a file you have not backed up, because -i overwrites it immediately.
The safe habit is to make a backup at the same time. Put a suffix directly after -i, with no space in between (for example -i.bak). sed then saves a copy of the original file with that suffix before it writes the change.
Say /etc/ssh/sshd_config contains the line #PasswordAuthentication yes and you want to turn password logins off. Run:
sudo sed -i.bak 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
Here -i.bak saves the original as sshd_config.bak before editing. The pattern ^#*PasswordAuthentication.* matches the line whether or not it starts with # (the ^ anchors to the start of the line, #* allows zero or more #, and .* matches the rest). After editing, apply the change:
sudo systemctl restart ssh
Always keep a backup when using sudo sed -i on files under /etc. A bad pattern can silently break a service. The .bak copy lets you restore with sudo mv /etc/ssh/sshd_config.bak /etc/ssh/sshd_config.
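Before touching the real /etc/ssh/sshd_config, you can rehearse the whole edit, check and restore cycle on a scratch copy. This sketch assumes GNU sed and uses invented file contents. It drops sudo because the copy lives in a temporary directory you own.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Stand-in for /etc/ssh/sshd_config with made-up contents.
cat > "$dir/sshd_config" <<'EOF'
Port 22
#PasswordAuthentication yes
PubkeyAuthentication yes
EOF

# Same command as above, minus sudo, pointed at the scratch copy.
sed -i.bak 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' "$dir/sshd_config"

# Review the change; diff exits with status 1 when files differ, which is expected.
diff "$dir/sshd_config.bak" "$dir/sshd_config" || true

# Record the edited line, then roll back the same way you would on a server.
after=$(grep '^PasswordAuthentication' "$dir/sshd_config")
printf 'after edit: %s\n' "$after"
mv "$dir/sshd_config.bak" "$dir/sshd_config"
```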
Deleting lines with sed
sed can also remove lines. The d command deletes lines that match a pattern or a line number.
Delete every blank line from a file:
sed '/^$/d' messy.conf
The pattern /^$/ matches a line with nothing between its start (^) and its end ($), in other words an empty line. A line that holds only spaces or tabs is not empty in this sense, so it survives. To delete all comment lines (lines starting with #):
sed '/^#/d' app.conf
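In practice the two deletions are often combined to show only the active settings in a config file. The sketch below assumes GNU sed and a made-up app.conf. It passes two -e expressions, which sed applies in order to every line. It also widens the blank-line pattern to ^[[:space:]]*$, so lines holding only spaces or tabs are dropped too.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical config: comments, an empty line and a whitespace-only line.
printf '# app settings\nport=8080\n\n   \n# debug=true\nlog=info\n' > "$dir/app.conf"

# First expression: drop empty or whitespace-only lines.
# Second expression: drop lines that start with #.
active=$(sed -e '/^[[:space:]]*$/d' -e '/^#/d' "$dir/app.conf")
printf '%s\n' "$active"
```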
You can also target line numbers. This deletes the first line (useful for stripping a header):
sed '1d' data.csv
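Here is the header-stripping case on a hypothetical data.csv. The file is created in a scratch directory so the example is self-contained.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical data.csv: one header row followed by two data rows.
printf 'name,email\nana,ana@example.com\nbo,bo@example.com\n' > "$dir/data.csv"

# 1d deletes only line 1, so the header disappears and the data rows remain.
rows=$(sed '1d' "$dir/data.csv")
printf '%s\n' "$rows"
```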
Extracting columns with awk
awk automatically splits each line into fields separated by whitespace. You refer to them as $1, $2, $3, and so on. $0 means the whole line. This makes pulling a single column trivial.
Imagine an Nginx access log line in /var/log/nginx/access.log:
203.0.113.5 - - [15/Jun/2026:10:22:01 +0000] "GET /home HTTP/1.1" 200 1024
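Before pulling out columns, it helps to see exactly how awk numbers the fields of that line. This sketch loops up to NF, awk's built-in count of fields on the current line, and prints each field number beside its value. awk knows nothing about quotes or brackets. The request "GET /home HTTP/1.1" therefore becomes three separate fields, which is why the status code lands in field 9.

```shell
# The sample log line from above.
line='203.0.113.5 - - [15/Jun/2026:10:22:01 +0000] "GET /home HTTP/1.1" 200 1024'

# Print "field-number value" for every whitespace-separated field.
numbered=$(printf '%s\n' "$line" | awk '{for (i = 1; i <= NF; i++) print i, $i}')
printf '%s\n' "$numbered"
```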
The first field ($1) is the visitor’s IP address. To print every IP that hit your server:
awk '{print $1}' /var/log/nginx/access.log
Output:
203.0.113.5
198.51.100.7
203.0.113.5
To find the most frequent visitors, pipe that into sort and uniq:
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head
Output:
42 203.0.113.5
19 198.51.100.7
8 192.0.2.44
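You can try the pipeline on a tiny made-up log. The order of the commands matters. uniq -c only merges duplicate lines that sit next to each other, so the first sort groups identical IPs before they are counted. sort -rn then ranks the counts from highest to lowest.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Made-up access log: three requests from one IP, one from another.
cat > "$dir/access.log" <<'EOF'
203.0.113.5 - - [15/Jun/2026:10:22:01 +0000] "GET /home HTTP/1.1" 200 1024
198.51.100.7 - - [15/Jun/2026:10:22:03 +0000] "GET /about HTTP/1.1" 200 512
203.0.113.5 - - [15/Jun/2026:10:22:05 +0000] "GET /missing HTTP/1.1" 404 153
203.0.113.5 - - [15/Jun/2026:10:22:09 +0000] "GET /home HTTP/1.1" 200 1024
EOF

# Extract IPs, group them, count each group, rank by count.
ranking=$(awk '{print $1}' "$dir/access.log" | sort | uniq -c | sort -rn | head)
printf '%s\n' "$ranking"
```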
Choosing a different field separator
Not all files use spaces. CSV files (comma-separated values) use commas. The -F flag sets the separator. To print the email column (field 2) from a comma-separated file:
awk -F',' '{print $2}' users.csv
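Here is a self-contained sketch with an invented users.csv that has a header row. It adds the condition NR > 1 to skip that header; NR is awk's running line number. It also assumes a simple file with no quoted fields. -F',' splits on every comma, including any inside quotes, so CSV with quoted values needs a real CSV parser.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical users.csv: name, email, role.
printf 'name,email,role\nana,ana@example.com,admin\nbo,bo@example.com,dev\n' > "$dir/users.csv"

# Split on commas, skip line 1 (the header), print column 2.
emails=$(awk -F',' 'NR > 1 {print $2}' "$dir/users.csv")
printf '%s\n' "$emails"
```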
For files separated by colons, like /etc/passwd, use -F':'. This prints each username (the first field):
awk -F':' '{print $1}' /etc/passwd
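This sketch reads two invented lines in /etc/passwd format instead of the real file, so the output stays predictable. It prints field 1 (the username) and field 7 (the login shell). The space inside "Deploy User" no longer splits anything, because the separator is now a colon.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Made-up lines in passwd format: name:password:UID:GID:comment:home:shell
cat > "$dir/passwd" <<'EOF'
root:x:0:0:root:/root:/bin/bash
deploy:x:1001:1001:Deploy User:/home/deploy:/bin/sh
EOF

# Print username and login shell, separated by awk's default output space.
users=$(awk -F':' '{print $1, $7}' "$dir/passwd")
printf '%s\n' "$users"
```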
Filtering and simple reports with awk
awk shines when you add a condition. Put a test in front of the { } block, and awk runs the block only on lines where the test is true. This prints the requested URL (field 7) from every log line whose HTTP status code (field 9 in the log above) is 404:
awk '$9 == 404 {print $7}' /var/log/nginx/access.log
That gives you the list of missing URLs people requested. You can also do math across all lines using a special END block, which runs once after the last line. This sums the bytes-sent column (field 10) to report total traffic:
awk '{total += $10} END {print "Total bytes:", total}' /var/log/nginx/access.log
Output:
Total bytes: 89231044
Here total += $10 adds each line’s value to a running variable, and END prints the final sum. When to use this: quick one-off reports straight from a log, before reaching for heavier monitoring tools.
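You can try both the 404 filter and the END report on a small made-up log in a scratch directory. The second awk program also shows that one program can hold several pattern-action pairs. The first block runs on every line, the second runs only on 404 lines, and END runs once at the end. Writing misses + 0 makes awk print 0 rather than an empty string when no line matched.

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Made-up access log: one successful request and two 404s.
cat > "$dir/access.log" <<'EOF'
203.0.113.5 - - [15/Jun/2026:10:22:01 +0000] "GET /home HTTP/1.1" 200 1024
198.51.100.7 - - [15/Jun/2026:10:22:03 +0000] "GET /old-page HTTP/1.1" 404 153
203.0.113.5 - - [15/Jun/2026:10:22:05 +0000] "GET /favicon.ico HTTP/1.1" 404 153
EOF

# Condition in front of the block: print the URL of each 404 only.
missing=$(awk '$9 == 404 {print $7}' "$dir/access.log")
printf '%s\n' "$missing"

# Sum bytes on every line, count 404 lines, report once after the last line.
report=$(awk '
  { total += $10 }
  $9 == 404 { misses++ }
  END {
    print "Total bytes:", total
    print "404s:", misses + 0
  }
' "$dir/access.log")
printf '%s\n' "$report"
```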
Best Practices
- Test sed without -i first; once the screen output looks right, add -i.bak to save safely.
- Always keep a backup (-i.bak) when editing files under /etc, and verify the service still starts afterward.
- Use a non-slash delimiter (s#a#b#) when your text contains file paths to avoid messy escaping.
- Anchor patterns with ^ and $ so you match exactly the line you mean, not a substring elsewhere.
- Pick awk for columns and math and sed for substitution and deletion; combining them in a pipe is often cleaner than forcing one tool to do everything.
- Quote your sed and awk programs in single quotes so the shell does not expand $1, $2, and other symbols before the tool sees them.
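The quoting rule is easiest to believe after seeing it fail. This sketch first runs set -- to clear the shell's positional parameters, so $1 is empty, as it usually is at an interactive prompt. It then runs the same awk program in single quotes and in double quotes.

```shell
# Clear positional parameters so the shell's own $1 is empty.
set --

# Single quotes: awk receives the program {print $1} untouched.
good=$(echo "alpha beta" | awk '{print $1}')

# Double quotes: the shell substitutes its (empty) $1 first,
# so awk actually receives {print } and prints the whole line.
bad=$(echo "alpha beta" | awk "{print $1}")

printf 'single quotes: %s\ndouble quotes: %s\n' "$good" "$bad"
```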