DevOps Interview Questions
DevOps interviews rarely test one tool. They probe whether you understand the whole flow — from a developer’s commit, through automated testing and building, out to a running server, and back again through monitoring. This page collects the questions that come up again and again across culture, CI/CD, Infrastructure as Code, containers, monitoring, and real-world scenarios. Each answer is written the way you would actually want to say it out loud: plain, honest, and specific. Read them, then practise saying them in your own words — interviewers can spot a memorised script instantly.
Culture and fundamentals
What is DevOps, in one sentence?
DevOps is a culture and set of practices that shortens the time between writing code and that code safely running in production, by getting the people who build software and the people who run it to work as one team with shared goals and heavy automation. It is not a job title or a single tool. The keyword interviewers listen for is shared ownership: developers care about how their code runs in production, and operations people care about how it gets built.
What problem was DevOps invented to solve?
The old “throw it over the wall” model. Developers wrote code and handed it to a separate operations team to deploy. Developers wanted fast change; operations wanted stability; so they fought. DevOps removes that wall. Both sides share one goal — frequent, reliable releases — and automation replaces the slow, error-prone manual hand-off.
What is “shift left”?
“Shift left” means moving checks earlier in the process (further to the left on a timeline that runs left-to-right). Instead of finding a security hole or a failing test just before release, you run those checks the moment code is written. Catching problems early is cheaper and faster than catching them late.
What are the “three ways” or core principles you optimise for?
A clean answer: fast flow (code moves quickly from commit to production), fast feedback (you learn about failures immediately via tests and monitoring), and continuous learning (blameless reviews of incidents so the system improves, not so a person gets punished). The word blameless matters — punishing people just teaches them to hide mistakes.
CI/CD
What is the difference between Continuous Delivery and Continuous Deployment?
Both build a release-ready artifact (the packaged, deployable output, such as a Docker image, a .jar, or a zip). The single difference is who pushes the button. In Continuous Delivery, a human approves the final push to production. In Continuous Deployment, the moment all tests pass, a machine deploys automatically with no human in the loop. Continuous Deployment only makes sense when you deeply trust your test suite and can roll back instantly.
What does a typical CI pipeline do, step by step?
checkout code → install dependencies → build → run unit + integration tests
→ run linters and security scans → publish artifact → (deploy)
The goal is that every single commit is automatically built and tested, so a broken change is caught within minutes instead of days.
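The flow above can be sketched as a tiny stage runner. This is only an illustration, not how any particular CI system is implemented: the stage names mirror the steps listed, and each lambda is a hypothetical stand-in for a real command.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]

def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that reports failure."""
    executed: List[str] = []
    for name, step in stages:
        executed.append(name)
        if not step():
            # Later stages (scan, publish, deploy) never run on a broken commit.
            return False, executed
    return True, executed

# Hypothetical stages; each lambda stands in for a real build or test command.
stages = [
    ("checkout", lambda: True),
    ("install", lambda: True),
    ("build", lambda: True),
    ("test", lambda: False),  # simulate a failing test suite
    ("scan", lambda: True),
    ("publish", lambda: True),
]

ok, executed = run_pipeline(stages)
print(ok, executed)
# False ['checkout', 'install', 'build', 'test']
```

The point of the sketch is the early return: a failing test stops the run before anything is published.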
How do you keep a pipeline fast?
| Technique | What it does |
|---|---|
| Caching dependencies | Skip re-downloading the same packages every run |
| Running tests in parallel | Split a 20-minute suite across several runners |
| Only building changed parts | Skip work for untouched code (monorepo tooling) |
| Failing fast | Run quick linters before slow integration tests |
Rule of thumb: if a CI run takes more than ~10 minutes, developers stop running it often, which defeats the whole point. Treat pipeline speed as a feature.
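Dependency caching usually hinges on a cache key that changes only when the dependency list changes. A minimal sketch, assuming the key is derived from the contents of a lockfile; the function name, the `deps` prefix, and the example package pins are all hypothetical:

```python
import hashlib

def cache_key(lockfile_bytes: bytes, prefix: str = "deps") -> str:
    """Derive a cache key from the lockfile contents (assumed scheme)."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"{prefix}-{digest}"

unchanged = cache_key(b"requests==2.32.0\n")
same_again = cache_key(b"requests==2.32.0\n")
after_upgrade = cache_key(b"requests==2.32.3\n")

print(unchanged == same_again)     # True: same lockfile, cache can be reused
print(unchanged == after_upgrade)  # False: lockfile changed, cache is rebuilt
```

Identical lockfiles always map to the same key, so an unchanged dependency set never triggers a fresh download.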
Infrastructure as Code
What is Infrastructure as Code (IaC), and why use it?
Infrastructure as Code means describing your servers, networks, and cloud resources in text files (code) instead of clicking around in a web console by hand. You commit those files to Git. Benefits: changes are reviewable, repeatable, and version-controlled, and you can rebuild your whole environment from scratch identically. Clicking in a console is fast once but hard to reproduce, review, or audit.
What is the difference between declarative and imperative IaC?
Declarative means you describe the desired end state (“I want 3 web servers”) and the tool figures out how to get there — Terraform works this way. Imperative means you write the exact steps in order (“create server, then attach disk, then…”). Declarative is usually preferred because the tool can compare current state to desired state and only change what is needed.
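The "compare current state to desired state" step can be sketched in a few lines. This is a simplified illustration of the idea, not Terraform's actual planning logic; the resource names and the `size` attribute are made up for the example.

```python
from typing import Dict, List, Tuple

def plan(current: Dict[str, dict], desired: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Return the minimal list of (action, resource_name) to reach `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions

# Hypothetical resources for illustration only.
current = {"web-1": {"size": "small"}, "web-2": {"size": "small"}, "old-db": {"size": "large"}}
desired = {"web-1": {"size": "small"}, "web-2": {"size": "medium"}, "web-3": {"size": "small"}}

print(plan(current, desired))
# [('update', 'web-2'), ('create', 'web-3'), ('delete', 'old-db')]
```

Note that `web-1` does not appear in the plan at all: it already matches, so nothing is touched. An imperative script would have to encode that check by hand.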
What is “idempotency” and why does it matter?
Idempotent means running the same operation many times leaves the system in the same state as running it once. A good IaC or configuration-management run (Ansible, Terraform) should be idempotent: re-running it on an already-correct server changes nothing. This lets you safely re-apply configuration without fear of duplicating or breaking things.
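A small sketch of an idempotent operation, assuming the "system" is just a list of config lines held in memory. The function name and the example settings are illustrative, not taken from Ansible or Terraform.

```python
from typing import List, Tuple

def ensure_line(lines: List[str], wanted: str) -> Tuple[List[str], bool]:
    """Return (new_lines, changed); append `wanted` only if it is missing."""
    if wanted in lines:
        return list(lines), False
    return lines + [wanted], True

config = ["PermitRootLogin no"]
config, changed_first = ensure_line(config, "PasswordAuthentication no")
config, changed_second = ensure_line(config, "PasswordAuthentication no")

print(changed_first, changed_second, config)
# True False ['PermitRootLogin no', 'PasswordAuthentication no']
```

A naive "always append" version would add the same line on every run. Checking first is what makes the second run a no-op.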
Containers and orchestration
What is the difference between a container and a virtual machine?
| | Container | Virtual machine |
|---|---|---|
| What it virtualises | The operating system / process space | The whole hardware |
| Includes its own OS kernel? | No — shares the host kernel | Yes — full guest OS |
| Start time | Milliseconds to seconds | Tens of seconds to minutes |
| Size | Megabytes | Gigabytes |
| Isolation strength | Strong, but less than a VM | Strongest |
A container packages your app plus its dependencies into one lightweight unit that runs the same on any host with a compatible kernel and container runtime. It is lighter than a VM because it shares the host’s kernel instead of booting its own.
What problem does Kubernetes solve?
Running one container is easy. Running hundreds across many servers — restarting crashed ones, spreading load, rolling out new versions without downtime, and scaling up under traffic — is hard. Kubernetes is an orchestrator: it automates that. You tell it the desired state (“run 5 copies of this image”) and it keeps reality matching, rescheduling containers when a server dies.
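The "keep reality matching the desired state" idea can be sketched as a reconcile function. This is a toy model under loud assumptions: placements are a plain list of node names (one entry per running copy), and new copies go to the least-loaded healthy node. The real Kubernetes controllers and scheduler are far more involved.

```python
from collections import Counter
from typing import List

def reconcile(placements: List[str], healthy_nodes: List[str], replicas: int) -> List[str]:
    """Return a new placement list with exactly `replicas` copies on healthy nodes."""
    if not healthy_nodes:
        return []
    # Drop copies that were running on nodes that are no longer healthy.
    kept = [node for node in placements if node in healthy_nodes]
    # Scale down if there are more copies than desired.
    kept = kept[:replicas]
    # Scale up: put each missing copy on the least-loaded healthy node.
    load = Counter(kept)
    while len(kept) < replicas:
        target = min(healthy_nodes, key=lambda n: load[n])
        kept.append(target)
        load[target] += 1
    return kept

# Hypothetical cluster: five copies across three nodes, then node-b dies.
before = ["node-a", "node-a", "node-b", "node-b", "node-c"]
after = reconcile(before, healthy_nodes=["node-a", "node-c"], replicas=5)

print(sorted(after))
# ['node-a', 'node-a', 'node-a', 'node-c', 'node-c']
```

Calling `reconcile` again on its own output changes nothing, which is the property that lets an orchestrator run this kind of loop continuously.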
What is the difference between a Docker image and a container?
An image is the static, read-only template (the recipe). A container is a running instance of that image (the cake you baked from it). One image can start many containers.
Monitoring and reliability
What is the difference between monitoring and observability?
Monitoring answers questions you already knew to ask — “is CPU over 90%?” — using pre-defined dashboards and alerts. Observability is the ability to ask new questions about your system from its outputs (logs, metrics, traces) without shipping new code, so you can debug problems you did not predict. The “three pillars” are metrics (numbers over time), logs (text events), and traces (the path of one request across services).
What are SLI, SLO, and SLA?
- SLI (Service Level Indicator) — a measured number, e.g. “99.95% of requests succeeded this month.”
- SLO (Service Level Objective) — your internal target for that number, e.g. “99.9% success.”
- SLA (Service Level Agreement) — a contract with customers, with penalties if you miss it.
Order of strictness: your SLO should be tighter than your SLA, so you have a buffer before breaking a promise.
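The relationship between an SLI and an SLO is simple arithmetic. A minimal sketch using the 99.95% and 99.9% figures from the list above; the request counts and function names are made up for illustration.

```python
def sli(successes: int, total: int) -> float:
    """Measured success ratio for the period."""
    return successes / total

def error_budget_remaining(successes: int, total: int, slo: float) -> float:
    """Fraction of the allowed failures still unused (negative means the SLO was missed)."""
    allowed_failures = (1 - slo) * total
    actual_failures = total - successes
    return 1 - actual_failures / allowed_failures

# Hypothetical month: one million requests, 500 of them failed.
total, successes, slo = 1_000_000, 999_500, 0.999

measured = sli(successes, total)
remaining = error_budget_remaining(successes, total, slo)

print(f"SLI {measured:.2%}, meets SLO: {measured >= slo}, budget left: {remaining:.1%}")
# SLI 99.95%, meets SLO: True, budget left: 50.0%
```

A 99.9% SLO over one million requests allows about 1,000 failures. With 500 used, half the error budget is left before the internal target is breached, and the SLA sits behind that as a further buffer.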
Scenario questions
A deploy went out and the site is now down. Walk me through what you do.
A strong answer prioritises recovery before diagnosis:
- Roll back first. Restore service for users — revert to the last known-good version. Do not debug a live outage.
- Communicate. Tell stakeholders the site is down and being worked on.
- Check the obvious signals — error-rate graphs, recent log entries, the deployment diff.
journalctl -u myapp.service --since "10 minutes ago" -p err
sudo systemctl status myapp.service
Output:
× myapp.service - My Web App
Active: failed (Result: exit-code) since Mon 2026-06-15 14:02:11 UTC
Main PID: 4821 (code=exited, status=1/FAILURE)
Jun 15 14:02:11 web01 myapp[4821]: FATAL: could not connect to database "app"
- Find the root cause, fix it, and add a test or alert so it cannot happen silently again.
- Run a blameless post-mortem — fix the system, not the person.
How would you give a developer temporary access to read logs on a production Ubuntu server without making them root?
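Add them to the adm group, which on Ubuntu can read many of the log files under /var/log without full root power. This follows the principle of least privilege: grant the minimum access needed, nothing more. The new membership takes effect at the developer's next login. Because the access is meant to be temporary, remove them from the group once the task is done (for example with sudo gpasswd -d followed by the username and adm).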
sudo usermod -aG adm alice
Gotcha: never hand out sudo or the root password for a read-only task. Over-broad access is one of the most common causes of accidental production outages and security incidents.
Best practices for the interview itself
- Answer the “why”, not just the “what” — anyone can name a tool; show you understand the problem it solves.
- Use real numbers and real commands from your own experience; specifics signal genuine hands-on work.
- When you do not know something, say so, then reason out loud about how you would find out.
- Always mention rollback and monitoring when discussing deployments — it shows production maturity.
- Bring up “blameless” and “least privilege” naturally; interviewers treat them as culture signals.
- Tie everything back to the core DevOps goal: shipping changes fast and safely.