Project: Set Up Prometheus + Grafana

If you run servers, you eventually need to answer a simple question: is everything healthy right now, and was it healthy an hour ago? This project is your observability capstone. You will install a full monitoring stack on one Ubuntu server, collect live metrics about CPU, memory, and disk, build a dashboard you can watch, and set up an alert that notifies you before a disk fills up. By the end you will have a small but real monitoring setup, built from the same tools used in production, that you understand line by line.

Observability means being able to ask questions about your system from the outside, using the data it emits. We will use three open-source tools that fit together like Lego bricks.

The three pieces and what each does

Prometheus — a time-series database (a database that stores numbers stamped with the exact time they were recorded). It “scrapes” (pulls) metrics from your servers every few seconds and stores them. It is the brain of the stack.
Node Exporter — a tiny program that runs on a server and exposes that server’s hardware and OS metrics (CPU, RAM, disk, network) over HTTP so Prometheus can scrape them. “Node” here just means “one machine”.
Grafana — the dashboard tool. It reads from Prometheus and draws graphs, gauges, and tables that humans can actually look at.

Tool	Role	When to use it
Prometheus	Stores metrics, evaluates alert rules	Always — it is the core
Node Exporter	Exposes machine metrics	On every server you want to watch
Grafana	Visualises and explores metrics	When humans need to see graphs
Alertmanager	Routes alerts to email/Slack	When you want to be notified, not just see

This guide installs everything on a single server for learning. In real production you keep Prometheus and Grafana on a separate monitoring box, and run only Node Exporter on the servers being watched, so a crash on one app server never takes your monitoring down with it.
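For the production layout described above, the main change on the Prometheus side is the target list of the node job that Step 2 configures. A minimal sketch, assuming two app servers with the hypothetical hostnames app1.internal and app2.internal, each running Node Exporter on port 9100. It writes the fragment to a scratch file for review and does not touch /etc/prometheus:

```shell
# Sketch only: app1.internal and app2.internal are hypothetical hostnames.
# The fragment goes to a scratch file so nothing under /etc is modified.
out="${PROM_SKETCH_FILE:-$(mktemp)}"

cat > "$out" <<'EOF'
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "app1.internal:9100"
          - "app2.internal:9100"
EOF

cat "$out"
```

Each entry under targets becomes its own instance label in Prometheus, so per-server graphs and alerts need no extra configuration.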

Step 1 — Install Node Exporter

Node Exporter ships as a single binary. We download it, create a dedicated locked-down user for it, and run it as a systemd service (systemd is Ubuntu’s service manager that starts programs at boot and restarts them if they die).

sudo useradd --no-create-home --shell /bin/false node_exporter
cd /tmp
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xvf node_exporter-1.8.2.linux-amd64.tar.gz
sudo cp node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Create the service file:

# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Start it and confirm it is listening on its default port 9100:

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
curl -s localhost:9100/metrics | head -n 5

Output:

# HELP go_gc_duration_seconds A summary of the wall-time of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2858e-05
go_gc_duration_seconds{quantile="0.25"} 2.0214e-05
go_gc_duration_seconds{quantile="0.5"} 2.8181e-05
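The first lines of that output describe the exporter's own Go runtime. The host metrics have names starting with node_. As a rough sketch of how the text format can be read, the helper below (metric_value is an illustrative name, and the sample values are invented) prints the value of one metric that carries no labels:

```shell
# Print the value of one label-free metric from Prometheus exposition text.
# Lines starting with "#" are HELP/TYPE comments and never match a metric name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Live usage (needs a running Node Exporter):
#   curl -s localhost:9100/metrics | metric_value node_load1

# Offline demonstration on hand-written sample text with invented values.
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_memory_MemTotal_bytes 4.1e+09'

printf '%s\n' "$sample" | metric_value node_load1   # prints 0.42
```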

Step 2 — Install Prometheus

sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir -p /etc/prometheus /var/lib/prometheus
cd /tmp
curl -LO https://github.com/prometheus/prometheus/releases/download/v2.53.1/prometheus-2.53.1.linux-amd64.tar.gz
tar xvf prometheus-2.53.1.linux-amd64.tar.gz
cd prometheus-2.53.1.linux-amd64
sudo cp prometheus promtool /usr/local/bin/
sudo cp -r consoles console_libraries /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus /usr/local/bin/prometheus

Write the main config. This tells Prometheus to scrape itself and your Node Exporter:

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s        # pull metrics every 15 seconds
  evaluation_interval: 15s    # check alert rules every 15 seconds

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

Create the systemd service:

# /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus/

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus
sudo systemctl status prometheus --no-pager

Output:

● prometheus.service - Prometheus
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled)
     Active: active (running) since Mon 2026-06-15 10:04:12 UTC; 3s ago

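Open http://YOUR_SERVER_IP:9090 in a browser and go to Status → Targets; both prometheus and node should show UP. If a firewall blocks port 9090, tunnel it over SSH instead (ssh -L 9090:localhost:9090 YOUR_USER@YOUR_SERVER_IP) and open http://localhost:9090.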
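The same check can be done from the terminal. Prometheus serves target state as JSON at /api/v1/targets, and the sketch below counts the health values in that response. The helper name summarize_health and the trimmed sample response are illustrative, not real output:

```shell
# Count target health states in the JSON returned by /api/v1/targets.
summarize_health() {
  grep -oE '"health"[[:space:]]*:[[:space:]]*"[a-z]+"' \
    | sed -E 's/.*"([a-z]+)"$/\1/' \
    | sort | uniq -c
}

# Live usage (needs a running Prometheus):
#   curl -s localhost:9090/api/v1/targets | summarize_health

# Offline demonstration on a trimmed, hand-written sample response.
sample_json='{"status":"success","data":{"activeTargets":[{"labels":{"job":"node"},"health":"up"},{"labels":{"job":"prometheus"},"health":"up"}]}}'

printf '%s\n' "$sample_json" | summarize_health
```

With both jobs healthy, the summary shows a count of 2 next to up. Any other value on its own line points at a target worth inspecting in the web UI.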

Step 3 — Install Grafana

Grafana has an official apt repository, so installation is clean and later updates arrive through the normal apt upgrade process.

sudo apt-get install -y apt-transport-https software-properties-common
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://apt.grafana.com/gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
sudo systemctl enable --now grafana-server

Grafana listens on port 3000. Allow it through the firewall (ufw is Ubuntu’s simple firewall):

sudo ufw allow 3000/tcp

Visit http://YOUR_SERVER_IP:3000 and log in with admin / admin. Grafana prompts you to set a new password on first login; set one rather than skipping the prompt.

Never leave the default admin/admin password in place, and do not expose ports 9090 (Prometheus) or 9100 (Node Exporter) to the public internet. Neither service has authentication enabled by default. Bind them to localhost or lock them down with ufw so only your monitoring server can reach them.
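One way to do the ufw lockdown is sketched here. It assumes a monitoring server at the placeholder address 10.0.0.5 (hypothetical) and only prints the commands, so you can review them before running anything. The function name lockdown_rules is illustrative:

```shell
# Print (do not run) ufw rules that admit a single monitoring host.
# MONITOR_IP is a hypothetical address; replace it with your monitoring server.
MONITOR_IP="${MONITOR_IP:-10.0.0.5}"

lockdown_rules() {
  # ufw applies the first matching rule, so the allow must come before the deny.
  echo "sudo ufw allow from $1 to any port 9100 proto tcp"
  echo "sudo ufw deny 9100/tcp"
  echo "sudo ufw deny 9090/tcp"
}

lockdown_rules "$MONITOR_IP"
```

On the single-server setup in this guide, Prometheus scrapes over localhost, which ufw's default rules should leave open. The alternative is to bind each service to the loopback interface by adding --web.listen-address=127.0.0.1:9100 (Node Exporter) or --web.listen-address=127.0.0.1:9090 (Prometheus) to its ExecStart line.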

Step 4 — Connect Grafana to Prometheus and build a dashboard

In Grafana: Connections → Data sources → Add data source → Prometheus. Set the URL to http://localhost:9090 and click Save & test. You should see “Successfully queried the Prometheus API”.

Now the easy win: Dashboards → New → Import and enter dashboard ID 1860 (“Node Exporter Full”), select your Prometheus data source, and import. You instantly get dozens of CPU, memory, disk, and network panels.

To understand what is underneath, here are the raw PromQL queries (Prometheus’s query language) that power CPU, memory, and disk panels:

# CPU usage percentage (100 minus idle time)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory used percentage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# Root filesystem used percentage
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)
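The memory and disk queries share one piece of arithmetic: the used percentage is (1 - available / total) * 100, which is algebraically the same as 100 - (available / total * 100). A small sketch of that calculation outside Prometheus follows; used_pct is an illustrative helper, and the df cross-check assumes GNU coreutils:

```shell
# used_pct TOTAL AVAILABLE: percentage in use, i.e. (1 - available / total) * 100.
used_pct() {
  awk -v total="$1" -v avail="$2" \
    'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

used_pct 100 15   # 15 of 100 units free, prints 85.0

# Optional cross-check against the local root filesystem (needs GNU df).
if df -B1 --output=size,avail / >/dev/null 2>&1; then
  df -B1 --output=size,avail / | tail -n 1 | {
    read -r size avail
    if [ "$size" -gt 0 ]; then
      echo "root filesystem used: $(used_pct "$size" "$avail")%"
    fi
  }
fi
```

The figure printed for the root filesystem should land close to what the disk query returns in Prometheus, which is a quick sanity check that the dashboard is reading the mount you expect.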

Step 5 — Configure a “disk almost full” alert

Alerting is the difference between a pretty graph and a system that protects you. Create an alert rule that fires when the root disk is over 85% full.

# /etc/prometheus/alert_rules.yml
groups:
  - name: disk-alerts
    rules:
      - alert: DiskAlmostFull
        expr: 100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk almost full on {{ $labels.instance }}"
          description: "Root filesystem is over 85% full for 5 minutes."

The for: 5m means the condition must stay true for 5 minutes before firing, which avoids noisy alerts from brief spikes. Validate and reload:

promtool check rules /etc/prometheus/alert_rules.yml
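# The unit file from Step 2 defines no ExecReload=, so "systemctl reload" would fail.
# SIGHUP makes Prometheus re-read its config and rule files.
sudo systemctl kill --signal=HUP prometheus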

Output:

Checking /etc/prometheus/alert_rules.yml
  SUCCESS: 1 rules found
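To see what for: 5m does to a noisy signal, here is a toy simulation of the pending and firing states. It is a sketch of the behaviour, not Prometheus's actual code, and it assumes one evaluation every 60 seconds to keep the output short. The function name simulate_for and the sample readings are invented:

```shell
# simulate_for THRESHOLD HOLD_SECONDS STEP_SECONDS
# stdin: one disk-usage percentage per line, one line per rule evaluation.
simulate_for() {
  awk -v threshold="$1" -v hold="$2" -v step="$3" '
    {
      t = (NR - 1) * step
      if ($1 > threshold) {
        # Remember when the condition first became true.
        if (!active) { active = 1; since = t }
        state = (t - since >= hold) ? "firing" : "pending"
      } else {
        # Any evaluation below the threshold resets the timer.
        active = 0
        state = "inactive"
      }
      printf "t=%ds usage=%s%% %s\n", t, $1, state
    }'
}

# A brief spike (two readings), a dip, then a sustained breach.
printf '%s\n' 80 90 90 80 90 90 90 90 90 90 | simulate_for 85 300 60
```

The short spike only ever reaches pending. The alert fires on the last line, once usage has stayed above 85% for a full 300 seconds.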

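To actually deliver the alert to a notification channel (email, Slack), the simplest approach for a setup this small is Grafana Alerting, which is built into Grafana. Go to Alerting → Contact points → Add contact point, choose Email or Slack, and enter the recipient address or Slack webhook URL (email also requires SMTP to be configured in Grafana's server settings). Then go to Alerting → Alert rules → New, reuse the same disk query, and point it at your contact point. Grafana evaluates this rule itself by querying Prometheus, so you get notifications without running a separate Alertmanager. The DiskAlmostFull rule in alert_rules.yml still appears on the Prometheus Alerts page, but Prometheus on its own will not notify anyone without an Alertmanager.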

Best Practices

Always run exporters and Prometheus under dedicated, shell-less system users — never as root.
Keep Prometheus, Grafana, and exporters on internal ports closed to the public; reach the Grafana UI through a reverse proxy with TLS instead.
Set data retention explicitly, for example --storage.tsdb.retention.time=30d (the default is 15d), so disk usage stays predictable.
Use for: on every alert rule to suppress flapping, and always test rules with promtool check rules before reloading.
Pin exporter and Prometheus versions in your install scripts so rebuilds are reproducible.
Store your prometheus.yml, alert rules, and Grafana dashboard JSON in Git so the whole stack is recoverable.
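To pick a retention value, the Prometheus storage documentation gives a rough sizing rule: disk needed = retention in seconds × samples ingested per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it. The figure of 1000 active series is an assumption for one Node Exporter, not a measured value, and estimate_mb is an illustrative name:

```shell
# estimate_mb RETENTION_DAYS ACTIVE_SERIES SCRAPE_INTERVAL_SECONDS BYTES_PER_SAMPLE
estimate_mb() {
  awk -v days="$1" -v series="$2" -v interval="$3" -v bps="$4" 'BEGIN {
    samples_per_sec = series / interval          # each series yields one sample per scrape
    bytes = days * 86400 * samples_per_sec * bps
    printf "%.1f MB\n", bytes / 1e6
  }'
}

# 30 days, an assumed 1000 series, the 15s scrape interval from this guide,
# and the pessimistic 2 bytes per sample.
estimate_mb 30 1000 15 2   # prints 345.6 MB
```

Treat the result as an order-of-magnitude estimate. The real series count is visible in the Prometheus UI once the stack has been running for a while.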
