DevOps vs SRE vs Platform Engineering

When you start learning DevOps, you quickly run into three terms that sound like they compete: DevOps, SRE (Site Reliability Engineering), and Platform Engineering. They are not rivals. They are three different answers to the same question — “how do we ship software fast without breaking things?” This page explains what each one really means, how they overlap, and which job title actually does what, so you can read job adverts and team docs without getting confused.

The one-paragraph summary

DevOps is a culture and a set of practices for getting developers (Dev) and operations (Ops) to work as one team. SRE is Google’s specific, opinionated way of doing DevOps, built around hard numbers like error budgets. Platform Engineering is the discipline of building an internal product (a “platform”) that makes DevOps easy for every developer in the company. Think of it like this: DevOps is the goal, SRE is one proven method, and Platform Engineering is the tooling team that helps everyone get there.

DevOps, SRE, and Platform Engineering compared

Aspect	DevOps	SRE (Site Reliability Engineering)	Platform Engineering
What it is	A culture + practices	A concrete implementation of DevOps	A product team building internal tools
Origin	Industry movement (~2009)	Google (book published 2016)	Grew popular ~2020–2022
Main focus	Collaboration, fast and safe delivery	Reliability measured with numbers	Developer experience (DX) and self-service
Key idea	“You build it, you run it”	Error budgets and SLOs	Internal Developer Platform (IDP)
Success measured by	Deployment frequency, lead time, change failure rate	SLO compliance, error budget burn	Adoption, time-to-first-deploy
Who it serves	The whole org	Users of the service	Other engineers (its “customers”)
Typical job title	DevOps Engineer	Site Reliability Engineer	Platform Engineer
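
The delivery metrics listed for DevOps in the table (deployment frequency, lead time, and failure rate) are simple to compute once you record when each change was committed and deployed. The sketch below is one possible way to do it, not a standard implementation: the Deployment record, its field names, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_failure: bool     # whether it led to an incident or rollback


def delivery_metrics(deployments: list[Deployment], window_days: int) -> dict[str, float]:
    """Summarise deployment frequency, lead time, and failure rate for a window."""
    if not deployments:
        raise ValueError("no deployments in the window")
    # Lead time per deployment, expressed in hours.
    lead_times_hours = [
        (d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deployments
    ]
    failures = sum(d.caused_failure for d in deployments)
    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times_hours),
        "failure_rate": failures / len(deployments),
    }


# Made-up sample: four deployments in one week, one of which caused a failure.
start = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(start, start + timedelta(hours=2), False),
    Deployment(start, start + timedelta(hours=6), False),
    Deployment(start, start + timedelta(hours=4), True),
    Deployment(start, start + timedelta(hours=24), False),
]
metrics = delivery_metrics(deployments, window_days=7)
print(metrics)
```

With this sample, the median lead time is 5 hours and the failure rate is 25%. The median is used instead of the mean so that one slow deployment does not dominate the picture.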

What is DevOps?

DevOps is a way of working where the people who write software and the people who run it in production share the same goals, tools, and on-call duties. Before DevOps, developers threw code “over the wall” to a separate operations team, and the two groups blamed each other when things broke.

DevOps fixes this with practices like automated testing, CI/CD (Continuous Integration / Continuous Delivery — automatically building, testing, and shipping code), Infrastructure as Code (managing servers with config files instead of clicking buttons), and shared monitoring.

When to use this framing: Always. DevOps is the umbrella idea. Almost every modern software team is “doing DevOps” to some degree.

What is SRE?

SRE stands for Site Reliability Engineering. It is the name Google gave to its approach to running production systems, popularised by the Google SRE book, which is free to read online. Ben Treynor Sloss, who founded Google’s SRE team, summed it up in one line: “SRE is what happens when you ask a software engineer to design an operations team.”

The headline idea is that you should not aim for 100% reliability — that is impossibly expensive and slows you down. Instead you set a target and measure against it.

SLI, SLO, and error budgets

SLI (Service Level Indicator): a number you actually measure, like “the percentage of requests that succeeded.” For example, 99.95% of requests returned without an error.
SLO (Service Level Objective): the target for that number, like “99.9% of requests must succeed over 30 days.” This is your internal goal.
Error budget: the amount of failure you are allowed. If your SLO is 99.9%, you are allowed 0.1% failure. That 0.1% is your budget to spend on risky deploys, experiments, and the occasional outage.
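
The arithmetic behind these three terms is small enough to sketch in a few lines. The example below is a minimal illustration, assuming a request-based SLI; the function names, the SloReport type, and the traffic numbers are invented for this page and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float           # measured success ratio, e.g. 0.9995
    budget_total: float  # failed requests the SLO allows in the window
    budget_used: float   # fraction of the budget consumed (1.0 = all of it)
    slo_met: bool        # True while the SLI is at or above the SLO


def evaluate_slo(total_requests: int, failed_requests: int, slo: float) -> SloReport:
    """Compare a measured SLI against an SLO and report error budget usage."""
    if total_requests <= 0:
        raise ValueError("need at least one request to compute an SLI")
    sli = (total_requests - failed_requests) / total_requests
    budget_total = (1 - slo) * total_requests
    # An SLO of 100% leaves no budget at all, so treat it as fully spent.
    budget_used = failed_requests / budget_total if budget_total else float("inf")
    return SloReport(sli, budget_total, budget_used, sli >= slo)


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Time-based view of the same budget: minutes of full outage allowed."""
    return (1 - slo) * window_days * 24 * 60


# Hypothetical month: one million requests, 500 of them failed, SLO of 99.9%.
report = evaluate_slo(total_requests=1_000_000, failed_requests=500, slo=0.999)
print(f"SLI {report.sli:.4%}, budget used {report.budget_used:.0%}, met: {report.slo_met}")
print(f"Allowed downtime: {allowed_downtime_minutes(0.999):.1f} minutes per 30 days")
```

In this made-up month the SLI is 99.95%, which matches the example above, and about half of the error budget has been spent. Seen as time, a 99.9% SLO over 30 days allows roughly 43.2 minutes of total outage.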

The clever part: when the error budget is used up, the team stops shipping new features and focuses on stability instead. This turns an argument (“ship faster” vs “be more stable”) into a simple math rule that both developers and operations agree on in advance.
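
That rule can be written down as code, which is part of why it settles arguments so well. The sketch below is one hedged way to express it, assuming a request-based budget; ReleaseDecision and release_decision are hypothetical names, and a real policy would usually include more nuance, such as exceptions for security fixes.

```python
from enum import Enum


class ReleaseDecision(Enum):
    SHIP_FEATURES = "ship features"
    STABILITY_ONLY = "stability work only"


def release_decision(slo: float, total_requests: int, failed_requests: int) -> ReleaseDecision:
    """Apply the rule: once the error budget is spent, pause feature releases."""
    allowed_failures = (1 - slo) * total_requests
    if failed_requests < allowed_failures:
        return ReleaseDecision.SHIP_FEATURES
    return ReleaseDecision.STABILITY_ONLY


# Hypothetical window with a 99.9% SLO and one million requests,
# so the budget is about 1,000 failed requests.
print(release_decision(0.999, 1_000_000, failed_requests=400).value)
print(release_decision(0.999, 1_000_000, failed_requests=1_200).value)
```

The first call is within budget, so features keep shipping; the second has overspent, so the team switches to stability work. Nobody has to win a debate, because the threshold was agreed before the incident happened.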

Gotcha: An SLO is not the same as an SLA (Service Level Agreement). An SLA is a contract with your customer that has financial penalties (e.g. refunds) if you miss it. Your internal SLO should always be stricter than your public SLA, so you notice and fix problems before they ever break the contract.
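
To see how much breathing room that gap buys you, it helps to turn it into minutes. The sketch below assumes a time-based view of availability; the function name and the 99.9% and 99.5% figures are illustrative choices, not values from any real contract.

```python
def safety_margin_minutes(slo: float, sla: float, window_days: int = 30) -> float:
    """Extra downtime between missing the internal SLO and breaching the SLA."""
    if slo <= sla:
        raise ValueError("internal SLO should be stricter (higher) than the public SLA")
    window_minutes = window_days * 24 * 60
    return (slo - sla) * window_minutes


# Hypothetical targets: internal SLO of 99.9%, public SLA of 99.5%.
margin = safety_margin_minutes(slo=0.999, sla=0.995)
print(f"Margin before the SLA is breached: {margin:.1f} minutes per 30 days")
```

With these example targets, the team has about 172.8 minutes of extra downtime per 30 days between the moment the SLO is missed and the moment the contract is broken.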

When to use SRE practices: When you run services where downtime genuinely costs money or trust, and you have enough traffic to measure reliability meaningfully. When NOT to: A tiny side project with five users does not need error budgets — that is over-engineering.

What is Platform Engineering?

Platform Engineering is the practice of building an Internal Developer Platform (IDP) — a self-service product that other engineers in the company use to deploy and run their apps without needing to be infrastructure experts.

The motivation: pure “you build it, you run it” DevOps can overload developers. Suddenly every developer needs to know Kubernetes, networking, secrets, monitoring, and more. Platform teams package all that complexity behind a clean interface — a CLI, a web portal (often built with tools like Backstage), or a set of templates — so a developer can ship a new service in minutes.

The platform team treats its internal tools as a product, with the other engineers as customers. Success is measured by adoption and how quickly a new developer can deploy their first service.
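
Both of those success measures are easy to track with data most companies already have. The sketch below shows one possible way to calculate them; the function names, the way onboarding is recorded, and the sample numbers are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import median


def adoption_rate(teams_on_platform: int, total_teams: int) -> float:
    """Share of engineering teams that deploy through the platform."""
    if total_teams <= 0:
        raise ValueError("total_teams must be positive")
    return teams_on_platform / total_teams


def median_hours_to_first_deploy(onboarding: list[tuple[datetime, datetime]]) -> float:
    """Median hours from a developer joining to their first production deploy."""
    if not onboarding:
        raise ValueError("no onboarding records")
    hours = [
        (first_deploy - joined) / timedelta(hours=1)
        for joined, first_deploy in onboarding
    ]
    return median(hours)


# Made-up data: 12 of 40 teams use the platform, and three recent joiners
# took 3, 8, and 48 hours to ship their first service.
joined = datetime(2024, 3, 4, 9, 0)
records = [
    (joined, joined + timedelta(hours=3)),
    (joined, joined + timedelta(hours=8)),
    (joined, joined + timedelta(hours=48)),
]
print(f"Adoption: {adoption_rate(12, 40):.0%}")
print(f"Median time to first deploy: {median_hours_to_first_deploy(records)} hours")
```

A low adoption rate combined with a fast time-to-first-deploy usually means the platform works but teams do not know about it or do not trust it yet, which is a product problem and not a tooling one.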

When to use this: Once an organisation has many teams repeating the same infrastructure work, a dedicated platform pays off. When NOT to: A single small team does not need a platform team — they would just be building tools for themselves.

How they fit together

These three are layers, not competitors:

DevOps sets the cultural goal: fast, safe, collaborative delivery.
SRE provides a measurable method (SLOs and error budgets) to balance speed and stability.
Platform Engineering builds the tooling that makes both of the above easy for everyone.

A healthy company can have all three at once. The platform team might build the deployment pipelines, the SRE team might define the reliability targets, and every team practises the DevOps culture of owning its code in production.

A quick way to tell job roles apart

If the role mostly…	It is probably…
Builds CI/CD pipelines and automates ops for a product team	DevOps Engineer
Defines SLOs, manages error budgets, runs incident response	Site Reliability Engineer
Builds internal tools/portals used by other engineers	Platform Engineer

In practice, real job titles blur these lines — a “DevOps Engineer” advert often describes platform work, and SRE roles often include heavy automation. Read the responsibilities, not just the title.

Best Practices

Treat DevOps as the culture, not a job title or a single tool — it is how teams collaborate, not a product you can buy.
Start measuring reliability with SLOs before you adopt full SRE; even one good SLO per critical service is a huge step up from guessing.
Keep your internal SLO stricter than your public SLA so you have a safety margin and time to react.
Only build an Internal Developer Platform when repeated infrastructure pain across teams justifies it — do not build a platform for a single team.
Give platform teams real product discipline: gather feedback from their engineer “customers” and measure adoption, not just features shipped.
Use the error budget as a shared, objective rule to settle the eternal “speed vs stability” debate before it becomes an argument.
Do not chase 100% uptime — it is the wrong target, costing far more than it is worth and blocking healthy change.