Monitoring System Resources
When a server feels slow, the first thing you do is look at what it is actually doing right now. Is the CPU (the processor, the chip that runs your programs) pegged at 100%? Is it out of memory (RAM, the fast short-term storage where running programs live)? Is the disk overloaded with read and write requests? This page teaches you the handful of built-in tools that answer those questions in seconds. Think of it as your first-response triage kit for a misbehaving Ubuntu server.
The four questions of triage
Before reaching for any tool, you are trying to answer four things, in order:
- Is the CPU the bottleneck? Are processes fighting over the processor?
- Is memory the bottleneck? Is the server swapping (pushing memory to disk because RAM is full)?
- Is the disk the bottleneck? Are reads and writes piling up?
- Which process is responsible?
Each tool below answers one or more of these.
top — the always-available live view
top shows a live, refreshing list of running processes sorted by CPU usage. It ships with Ubuntu and virtually every other Linux distribution, so it is the one tool you can count on, even on a bare server.
top
Output:
top - 14:22:07 up 9 days, 3:11, 2 users, load average: 1.42, 0.98, 0.71
Tasks: 213 total, 1 running, 212 sleeping, 0 stopped, 0 zombie
%Cpu(s): 18.3 us, 4.1 sy, 0.0 ni, 76.9 id, 0.5 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 7951.4 total, 412.6 free, 3120.8 used, 4418.0 buff/cache
MiB Swap: 2048.0 total, 1980.0 free, 68.0 used. 4502.1 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1834 postgres 20 0 421560 98220 86440 S 9.6 1.2 2:14.08 postgres
2201 www-data 20 0 712340 142880 18220 S 6.3 1.8 0:51.33 nginx
How to read the header line by line:
- load average: three numbers for the last 1, 5, and 15 minutes (covered in detail below).
- %Cpu(s): the key fields are us (user, time spent on your apps), sy (system, time spent in the kernel), id (idle, doing nothing), and wa (I/O wait, the CPU waiting on disk or network). High wa means the disk, not the CPU, is your problem.
- MiB Mem: total, free, used, and buff/cache. The avail Mem figure is what matters: it is how much memory apps can still claim. Linux deliberately uses spare RAM for cache, so a low free number is normal and healthy.
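For scripts, the same header can be captured without the interactive display by running top in batch mode (top -b -n 1). The sketch below pulls the wa figure out of a %Cpu(s) line. The function name cpu_wa is made up for illustration, and the parsing assumes the English, dot-decimal header layout shown in the sample output above.

```shell
# cpu_wa: print the I/O wait percentage from a top "%Cpu(s)" header line.
# Hypothetical helper; assumes the field layout shown in the sample output.
cpu_wa() {
  printf '%s\n' "$1" | awk '
    {
      gsub(/[,:]/, " ")            # turn "0.5 wa," and "%Cpu(s):" into clean fields
      for (i = 2; i <= NF; i++)
        if ($i == "wa") print $(i - 1)
    }'
}
```

On a live system, cpu_wa "$(top -b -n 1 | grep '^%Cpu')" prints the current value. Fed the sample header line above, it prints 0.5.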
Useful keys while top is running:
| Key | What it does |
|---|---|
| M | Sort by memory usage |
| P | Sort by CPU usage (the default) |
| k | Kill a process (it asks for the PID) |
| 1 | Show each CPU core separately |
| q | Quit |
When to use it: any time, on any box. When not to: when you want a friendlier, more readable display and htop is available.
htop — the friendly upgrade
htop is top with colour bars, mouse support, and easy scrolling. It is not installed by default, so add it:
sudo apt update
sudo apt install htop
htop
It shows a coloured bar per CPU core, a memory bar, and a swap bar at the top, then a scrollable process list. You can click a column to sort, press F6 to change the sort field, and F9 to kill a process by selecting it instead of typing a PID.
Tip: The green portion of the CPU bar is user time, red is kernel/system time, and blue is low-priority. If the bar is mostly red, something is hammering the kernel — often heavy disk or network I/O.
When to use it: interactive investigation when you have a terminal and can install a package. For scripts or minimal containers, stick with top.
Understanding load average
Load average is the single most misread number in Linux. It is not a percentage. It is the average number of processes that are either running or waiting to run (or waiting on disk). You read it relative to your CPU core count.
Find your core count first:
nproc
Output:
4
Now interpret the three load numbers:
- Load = number of cores → fully busy, no queue. On a 4-core box, a load of 4.0 means perfectly saturated.
- Load < cores → spare capacity.
- Load > cores → processes are queueing; the server is overloaded.
So a load average of 8.0 is a disaster on a 2-core server but barely warm on a 16-core one. Compare the three numbers to see the trend: if the 1-minute figure is far above the 15-minute figure, load is climbing right now. If the 1-minute is lower, the spike is passing.
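The comparison against core count and the trend check can be sketched as two small shell functions. The names load_status and load_trend are illustrative, and the trend check is a plain greater-than or less-than comparison, which is cruder than the "far above" judgement you would make by eye.

```shell
# load_status LOAD CORES: classify a load figure against the core count.
load_status() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    if (load + 0 > cores + 0)       print "overloaded"
    else if (load + 0 == cores + 0) print "saturated"
    else                            print "spare capacity"
  }'
}

# load_trend ONE_MIN FIFTEEN_MIN: compare the 1- and 15-minute averages.
load_trend() {
  awk -v one="$1" -v fifteen="$2" 'BEGIN {
    if (one + 0 > fifteen + 0)      print "climbing"
    else if (one + 0 < fifteen + 0) print "easing"
    else                            print "steady"
  }'
}
```

On a live server, load_status "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)" classifies the current 1-minute load. With the figures from the text, load_status 8.0 2 prints overloaded and load_status 8.0 16 prints spare capacity.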
vmstat — CPU, memory, and swap over time
vmstat (virtual memory statistics) prints a one-line snapshot of system health. Pass a number to refresh every N seconds; the first line is an average since boot, so ignore it and read the second line onward.
vmstat 2
Output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 69632 422040 102400 4500120 0 0 11 34 210 398 18 4 77 1 0
2 1 69632 418900 102400 4500140 0 12 1840 420 1120 2240 22 6 60 12 0
The columns that matter for triage:
- r: runnable processes (running or waiting for CPU). Consistently above your core count means CPU starvation.
- b: processes blocked waiting on I/O. A non-zero b points at the disk.
- si / so: swap in / swap out. Anything other than 0 here means you are out of RAM and swapping to disk, which murders performance. This is the clearest “buy more memory” signal.
- wa: CPU I/O wait, same as in top.
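As a rough sketch of the swap check, the function below reads vmstat output on standard input and reports any sample with swap traffic, skipping the first data row because it is the since-boot average. The name swap_alarm is hypothetical, and the code assumes the default vmstat column order shown above, where si and so are the 7th and 8th fields.

```shell
# swap_alarm: read vmstat output on stdin and report samples with swap traffic.
# Hypothetical helper; assumes the default column order (si = 7th, so = 8th).
swap_alarm() {
  awk '
    $1 ~ /^[0-9]+$/ {              # data rows only; header rows start with text
      n++
      if (n > 1 && ($7 > 0 || $8 > 0))   # n == 1 is the since-boot average
        print "swap activity: si=" $7 " so=" $8
    }'
}
```

A typical call is vmstat 2 5 | swap_alarm, which takes five samples two seconds apart. Fed the sample output above, it prints one line for the second row: swap activity: si=0 so=12.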
iostat — is the disk the problem?
iostat (input/output statistics) shows per-disk activity. It lives in the sysstat package:
sudo apt install sysstat
iostat -x 2
The -x flag adds extended columns. Watch these:
| Column | Meaning | Red flag |
|---|---|---|
| %util | How busy the disk is | Near 100% = disk saturated |
| await (shown as r_await and w_await in recent sysstat versions) | Average wait per I/O, in ms | High and rising = disk too slow |
| r/s, w/s | Reads and writes per second | Context for the above |
If %util sits near 100 and await is high while CPU id (idle) is also high, the CPU is fine and the disk is your bottleneck.
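The %util check can be scripted in a few lines. Because the column order of iostat -x differs between sysstat versions, this sketch finds %util by name in the header row instead of assuming a position. The function name busy_disks and the default threshold of 90 are illustrative choices, not values from sysstat.

```shell
# busy_disks [LIMIT]: read "iostat -x" output on stdin and print each device
# whose %util is at or above LIMIT (default 90). Hypothetical helper.
busy_disks() {
  awk -v limit="${1:-90}" '
    $1 ~ /^Device/ {               # header row: remember where %util sits
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 >= limit { print $1, $col }
  '
}
```

A typical call is iostat -dx 2 3 | busy_disks 90. Keep in mind that the first report iostat prints covers the period since boot, so later reports say more about what is happening now.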
free — a quick memory check
For a fast, one-shot memory read without the live refresh, use free with -h (human-readable units):
free -h
Output:
total used free shared buff/cache available
Mem: 7.8Gi 3.0Gi 402Mi 88Mi 4.3Gi 4.4Gi
Swap: 2.0Gi 68Mi 1.9Gi
Read the available column, not free. As with top, Linux uses idle RAM for buff/cache and hands it back to apps on demand, so a small free value is expected.
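To turn that reading into a single number for a script or alert, one option is to express available memory as a percentage of total. The sketch below parses free -m output from standard input. The name avail_pct is made up, and it assumes the column layout shown above, with available as the last of the six figures on the Mem: row.

```shell
# avail_pct: read "free -m" output on stdin and print available memory
# as a whole-number percentage of total. Hypothetical helper.
avail_pct() {
  awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'   # $2 = total, $7 = available
}
```

Run it as free -m | avail_pct. The -m flag is used instead of -h so that both figures are plain MiB numbers that awk can divide.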
Best Practices
- Start triage with load average and top/htop, then drill into the specific resource that looks wrong.
- Always read load average against nproc: the raw number means nothing on its own.
- Treat any non-zero si/so in vmstat as an out-of-memory alarm, not a minor detail.
- Read available memory, never free: cached RAM is reclaimable and counts as available.
- High I/O wait (wa) with idle CPU means the disk is the bottleneck; confirm it with iostat -x.
- Install htop and sysstat on every server during setup so the tools are ready before an incident.
- Use vmstat 2 or iostat 2 to watch trends over a few seconds; a single snapshot can mislead you.
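To check that the tools are in place before an incident, a setup script could run something like the sketch below. The function name missing_tools is hypothetical. It relies only on the shell built-in command -v and prints the name of each command it cannot find on PATH.

```shell
# missing_tools NAME...: print each command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Calling missing_tools top htop vmstat iostat free on a fresh server lists whatever still needs installing. If iostat appears in the output, the sysstat package is missing.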