What is Prometheus?

Prometheus is an open-source monitoring system that collects numeric measurements from your servers and applications over time and stores them, so you can ask questions like “how much memory is this server using right now?” or “how many errors did my app have in the last hour?”. It was built at SoundCloud starting in 2012, became the second project to graduate from the Cloud Native Computing Foundation (CNCF) after Kubernetes, and is now one of the most widely used monitoring tools in cloud-native infrastructure. If you run servers on Ubuntu and want to know what they are doing, Prometheus is where many engineers start.

What problem does Prometheus solve?

When you run a server, lots of things change every second: CPU usage, free memory, disk space, network traffic, and request counts. Without a tool watching these, you only find out about problems when something breaks and a user complains. Prometheus solves this by continuously recording these numbers (called metrics) so you can graph them, set up alerts, and spot trouble before it becomes an outage.

A metric is just a named number measured over time, for example node_memory_MemFree_bytes (free memory in bytes). Prometheus stores every measurement together with the time it was taken. Each pairing of a value and a timestamp is called a sample, and the sequence of samples recorded for one metric (with one particular set of labels) is called a time series. Prometheus is a time-series database (TSDB) built specifically to handle millions of these series efficiently.
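
To make the data model concrete, here is a minimal Python sketch of a time series as a metric name, a set of labels, and a list of timestamped samples. The class names, the instance label value, and the readings are hypothetical, chosen only for illustration; this is not how Prometheus stores data internally.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    """One measurement: a value and the moment it was taken."""
    timestamp: float  # Unix time in seconds
    value: float


@dataclass
class TimeSeries:
    """All samples recorded for one metric name plus one set of labels."""
    name: str
    labels: dict[str, str]
    samples: list[Sample] = field(default_factory=list)

    def append(self, timestamp: float, value: float) -> None:
        self.samples.append(Sample(timestamp, value))

    def latest(self) -> Sample:
        return self.samples[-1]


# Hypothetical free-memory readings from one server, taken 15 seconds apart.
series = TimeSeries("node_memory_MemFree_bytes", {"instance": "web-1"})
series.append(1_700_000_000.0, 1_048_576_000.0)
series.append(1_700_000_015.0, 1_040_187_392.0)

print(len(series.samples), series.latest().value)
```

The same metric name with a different label set (for example another instance) would be a separate time series.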

When to use Prometheus (and when not to)

Use case	Good fit for Prometheus?
Numeric metrics from servers, containers, and apps	Yes, this is its core job
Alerting on thresholds (high CPU, app down)	Yes, via Alertmanager
Kubernetes and cloud-native monitoring	Yes, it is the standard
Storing log lines or text messages	No, use Loki or the ELK stack
Storing request traces across services	No, use Tempo or Jaeger
Long-term storage for years of data	Partly, pair it with Thanos or Cortex

Prometheus is for metrics (numbers), not logs or traces. A complete observability setup usually combines all three. See the metrics, logs, and traces page linked below.

The pull-based model

Most older monitoring tools are push-based: each server runs an agent that sends (“pushes”) its data to a central collector. Prometheus does the opposite. It uses a pull model, meaning the Prometheus server reaches out and fetches the data itself on a schedule. This act of fetching is called scraping.

Every target you want to monitor exposes its metrics on an HTTP endpoint, almost always /metrics. On a schedule (every minute by default, although the example configuration shipped with Prometheus sets 15 seconds), Prometheus makes an HTTP request to that endpoint, reads the current values, and stores them. The endpoint returns plain text like this:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 8423.51
node_memory_MemFree_bytes 1.048576e+09

The pull model has real advantages. Prometheus always knows the full list of targets it should be scraping, so if one stops responding, the next scrape fails, Prometheus records the target as down (its built-in up metric drops to 0), and you can alert on it. You can also open any target’s /metrics URL in a browser to debug it by hand, with no extra tooling.
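
The scrape cycle can be sketched in a few lines of Python. This is a simplified illustration, not Prometheus's actual implementation: the function names are made up, the parser assumes each line is just a series name followed by a value (no optional timestamp), and the "up" entry only imitates the way Prometheus marks a target as reachable or not.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse simple exposition lines of the form 'name{labels} value'."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, value = line.rpartition(" ")
        values[series] = float(value)
    return values


def scrape(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Fetch one target; record up=1 on success, up=0 on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            result = parse_metrics(response.read().decode("utf-8"))
        result["up"] = 1.0
    except (OSError, ValueError):  # network errors, bad URLs, unparsable values
        result = {"up": 0.0}
    return result


sample = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 8423.51
node_memory_MemFree_bytes 1.048576e+09
"""
parsed = parse_metrics(sample)
print(parsed["node_memory_MemFree_bytes"])
```

Calling scrape() against a real target, for example a Node Exporter on its usual port 9100 (http://localhost:9100/metrics), would return the parsed values plus up set to 1.0, or only up set to 0.0 if the target cannot be reached.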

Exporters: getting metrics out of anything

Prometheus only knows how to read that /metrics text format. So how do you monitor things that do not speak it, like the Linux kernel, PostgreSQL, or Nginx? You run a small helper program called an exporter (a tiny service that reads stats from one specific system and republishes them in Prometheus format).

The most common one is Node Exporter, which exposes Linux server metrics such as CPU, memory, disk, and network. There are official and community exporters for almost everything: postgres_exporter, nginx-prometheus-exporter, blackbox_exporter for uptime checks, and many more. Your own apps can expose metrics directly using a Prometheus client library (available for Go, Python, Java, Node.js, and others), so they do not need an exporter at all.
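
To show how little an exporter needs to do, here is a minimal sketch that uses only the Python standard library. It reads one value (free disk space on the root filesystem) and publishes it on /metrics in the text format. The metric name demo_disk_free_bytes, the serve() helper, and port 8000 are assumptions made for this example. A real application would normally use the official Prometheus client library rather than formatting the text by hand.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics() -> str:
    """Build the response body in the Prometheus text format."""
    free_bytes = shutil.disk_usage("/").free
    return (
        "# HELP demo_disk_free_bytes Free space on the root filesystem.\n"
        "# TYPE demo_disk_free_bytes gauge\n"
        f"demo_disk_free_bytes {free_bytes}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve /metrics until interrupted. The port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


# Print what a scrape would receive; call serve() to actually expose it.
print(render_metrics(), end="")
```

The value is read fresh on every request, which matches the pull model: the exporter keeps no history, and Prometheus builds the time series by scraping repeatedly.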

PromQL: asking questions about your data

Storing metrics is only useful if you can query them. Prometheus has its own query language called PromQL (Prometheus Query Language). You type expressions into the Prometheus web interface or into Grafana, and PromQL returns the matching time series.

A few examples to show the idea:

# Free memory right now, in bytes
node_memory_MemFree_bytes

# Per-second rate of HTTP requests over the last 5 minutes
rate(http_requests_total[5m])

# CPU usage per server as a percentage (100 minus the idle share, averaged across cores)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

The rate() function is one you will use constantly: it turns an ever-growing counter (like total requests) into a useful “per second” value. Labels in curly braces (like {mode="idle"}) let you filter and group data, which is what makes Prometheus so flexible.
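
The arithmetic behind these queries can be sketched in Python. This is a simplified model under stated assumptions: the real rate() also extrapolates to the edges of the time window, which this version skips, and the function names and sample values are hypothetical.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp, value) pairs.

    A drop in value is treated as a counter reset (for example after a
    restart), so counting resumes from zero instead of going negative.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:  # counter reset: everything since the restart counts
            increase += current
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def cpu_usage_percent(idle_rates: list[float]) -> float:
    """100 minus the average idle share across one server's CPU cores."""
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100


# Hypothetical request counter scraped every 15 seconds, with one restart.
requests = [(0.0, 100.0), (15.0, 130.0), (30.0, 160.0), (45.0, 30.0), (60.0, 60.0)]
print(simple_rate(requests))             # 2.0 requests per second
print(cpu_usage_percent([0.875, 0.625]))  # 25.0 percent busy
```

For the CPU query, the rate of the idle counter is the fraction of each second a core spent idle, so averaging it across cores and subtracting from 100 gives the busy percentage for that server.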

How the pieces fit together

A typical Prometheus setup on Ubuntu has these parts:

Component	What it does
Prometheus server	Scrapes targets, stores time series, runs PromQL
Exporters (e.g. Node Exporter)	Expose metrics for systems that cannot do it themselves
Alertmanager	Receives alerts from Prometheus, groups and deduplicates them, and sends notifications via email, Slack, or PagerDuty
Grafana	A separate tool for building dashboards on top of Prometheus data

Prometheus runs as a single binary with no external database to install, which is a big reason it is so popular. On Ubuntu 22.04/24.04 LTS you install it, point it at your targets in a YAML config file, and run it as a systemd service. The next page walks through that install step by step.

Why Prometheus became the standard

Several things made Prometheus win:

It is free and open-source, with no licensing cost as you grow.
It has a simple, well-documented data model based on labels.
The pull model makes failed targets obvious and easy to debug.
Kubernetes components expose their metrics in the Prometheus format, so Kubernetes adoption pulled Prometheus along with it.
A huge ecosystem of exporters means you can monitor almost anything quickly.

Best Practices

Always pair Prometheus with Grafana for graphs; the built-in web UI is for quick queries, not polished dashboards.
Use labels consistently (for example env="prod", instance="web-1") so you can filter and group cleanly later.
Run Prometheus, exporters, and Alertmanager as systemd services so they restart automatically on reboot or crash.
Keep scrape intervals reasonable (15s is a common, sensible choice); scraping too fast wastes CPU and disk for little benefit.
Restrict access to Prometheus and exporter ports with ufw, since /metrics endpoints can leak system details to anyone who can reach them.
For long retention, add Thanos or remote storage rather than keeping years of data on one local disk.