Scaling: Vertical vs Horizontal
When your app gets more users, a single server eventually runs out of room. Scaling is how you add capacity so the app keeps responding quickly. There are two main ways to do it: make one server bigger (vertical scaling) or run more servers side by side (horizontal scaling). Knowing which one to reach for, and when, is one of the core skills in Site Reliability Engineering (SRE) — the practice of keeping systems fast and available.
What “scaling” actually means
Scaling means changing how much work your system can handle. Capacity comes from resources like CPU (the processor that runs your code), RAM (memory, where running data lives), disk (storage), and network bandwidth (how much data flows in and out). When demand grows past what your current resources can serve, requests slow down or fail. Scaling adds more of those resources.
There are two directions you can grow.
Vertical scaling (scale up)
Vertical scaling means giving one server more power — more CPU cores, more RAM, faster disk. You keep a single machine but make it bigger. On a cloud provider this is usually just resizing the instance (changing it from, say, a 2-CPU box to an 8-CPU box).
You can check what a server currently has on Ubuntu:
nproc
free -h
df -h /
Output:
4
               total        used        free      shared  buff/cache   available
Mem:            15Gi       3.2Gi       8.1Gi       210Mi       4.1Gi        11Gi
Swap:          2.0Gi          0B       2.0Gi
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        78G   21G   58G  27% /
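If you would rather collect the same numbers from a script (for a dashboard or a pre-resize check), something like the following Python sketch could work. It is only an illustration: `parse_meminfo` and `snapshot` are hypothetical helper names, memory is read from `/proc/meminfo` and so is reported on Linux only, and `os.cpu_count()` can differ from `nproc` when the process is restricted to a subset of CPUs.

```python
import os
import shutil


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def snapshot(path: str = "/") -> dict[str, float]:
    """Collect CPU count, disk, and (on Linux) memory figures."""
    info = {"cpus": os.cpu_count() or 1}
    disk = shutil.disk_usage(path)
    info["disk_total_gib"] = round(disk.total / 2**30, 1)
    info["disk_free_gib"] = round(disk.free / 2**30, 1)
    # /proc/meminfo reports sizes in kB; it does not exist outside Linux.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
        info["mem_total_gib"] = round(mem.get("MemTotal", 0) / 2**20, 1)
        info["mem_available_gib"] = round(mem.get("MemAvailable", 0) / 2**20, 1)
    return info


if __name__ == "__main__":
    print(snapshot())
```

Running it before and after a resize gives a quick confirmation that the new CPU and RAM are actually visible to the operating system.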
After you resize the box, services often need tuning before they actually use the new CPU and RAM. For example, after adding CPUs you can make sure Nginx (a popular web server / reverse proxy) runs one worker per core:
# /etc/nginx/nginx.conf
worker_processes auto; # auto = one worker per CPU core
Then reload it:
sudo nginx -t
sudo systemctl reload nginx
Output:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
When to use vertical scaling: early on, for simplicity. It needs no code changes and no extra coordination. It is also the right choice for workloads that are hard to split across machines, like a single relational database (a database that stores data in tables with strict relationships, e.g. PostgreSQL).
When NOT to use it: there is always a ceiling. You can only buy so big a machine, the biggest sizes cost a lot per unit of power, and resizing usually means restarting the server. It is also a single point of failure: if that one box dies, everything is down.
Horizontal scaling (scale out)
Horizontal scaling means running many copies of your app on several servers and spreading traffic across them. Instead of one giant box, you have, say, five normal boxes. A load balancer (a server that receives all incoming requests and forwards each one to one of your app servers) sits in front and distributes the work.
This is how large systems scale almost without limit — when you need more capacity, you add another server to the pool.
# Spin up another app server, then add it to the Nginx upstream pool
sudo nano /etc/nginx/sites-available/myapp
# /etc/nginx/sites-available/myapp
upstream app_pool {
    server 10.0.0.11:3000;
    server 10.0.0.12:3000;
    server 10.0.0.13:3000;  # newly added third server
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://app_pool;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
sudo nginx -t && sudo systemctl reload nginx
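Unless you configure another method, Nginx hands requests to the servers in an upstream pool in round-robin order. The Python sketch below imitates that rotation to show why adding a third server cuts each server's share of the traffic. It is a simplified model, not how Nginx is implemented (real Nginx also accounts for weights and failed servers), and `distribute` is a hypothetical helper name.

```python
from collections import Counter
from itertools import cycle

# Same addresses as the upstream pool in the Nginx config.
APP_POOL = ["10.0.0.11:3000", "10.0.0.12:3000", "10.0.0.13:3000"]


def distribute(pool: list[str], request_count: int) -> Counter:
    """Assign requests to servers in strict rotation and count them."""
    rotation = cycle(pool)
    return Counter(next(rotation) for _ in range(request_count))


if __name__ == "__main__":
    for server, handled in distribute(APP_POOL, 9000).items():
        print(f"{server} handled {handled} requests")
```

With two servers, 9000 requests work out to 4500 each; with the third server added, each one handles 3000.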
When to use horizontal scaling: for stateless web/API tiers that must handle large or spiky traffic, and whenever you need high availability (the system stays up even if a machine fails). Because traffic is shared, losing one server only loses a fraction of capacity, not the whole service.
When NOT to use it: when the workload is hard to split (some databases), or when the extra complexity — load balancing, deployment to many nodes, shared configuration — is not worth it for a small app.
Why statelessness is the key
State is data that “sticks” to one server, like a user’s login session stored in that server’s local memory. If state lives on a single box, horizontal scaling breaks: the load balancer might send your next request to a different server that has never heard of you.
The fix is to make your app stateless — each server keeps no per-user data of its own. Instead, shared state goes into an external store that every server can reach, such as Redis (a fast in-memory data store) for sessions, or a shared database for data.
sudo apt update
sudo apt install -y redis-server
sudo systemctl enable --now redis-server
redis-cli ping
Output:
PONG
With sessions in Redis, any server can serve any request, so you can add or remove servers freely.
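To make the difference concrete, here is a toy Python model of two app servers. It is a sketch under loud assumptions: `AppServer` is a hypothetical class, and a plain dictionary stands in for Redis, so no real Redis client is involved. The point is only where the session data lives.

```python
class AppServer:
    """A toy app server; `shared_store` stands in for Redis (hypothetical)."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # With no shared store, sessions live in this server's own memory.
        self.sessions = shared_store if shared_store is not None else {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)  # None means "not logged in"


# Stateful: each server has a private session dict.
a, b = AppServer("a"), AppServer("b")
a.login("s1", "alice")
print(b.whoami("s1"))  # None: server b has never seen this session

# Stateless: both servers read and write the same external store.
store = {}
c, d = AppServer("c", store), AppServer("d", store)
c.login("s1", "alice")
print(d.whoami("s1"))  # alice
```

In the first half the user logs in on one server and is unknown on the other. In the second half the login is visible everywhere, which is what lets a load balancer send each request to any server.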
Avoid “sticky sessions” (pinning a user to one server) as a long-term fix. They paper over local state, spread load unevenly, and still lose a user's session when that one server restarts. Make the app stateless instead.
Autoscaling
Autoscaling means the system adds or removes servers automatically based on demand — for example, “add a server whenever average CPU goes above 70 percent, and remove one when it drops below 30 percent.” This is horizontal scaling driven by metrics rather than by hand. It is the standard way to handle traffic that rises and falls during the day, because you only pay for capacity while you need it.
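The 70/30 rule can be written as a small decision function. The Python sketch below is an illustration of the idea, not any cloud platform's real algorithm: `desired_servers` is a hypothetical name, it moves one server at a time, and the default bounds of 2 and 10 servers are assumptions chosen for the example.

```python
def desired_servers(
    current: int,
    avg_cpu: float,
    *,
    scale_out_at: float = 70.0,  # add a server above this average CPU %
    scale_in_at: float = 30.0,   # remove a server below this average CPU %
    minimum: int = 2,            # assumed floor, for availability
    maximum: int = 10,           # assumed ceiling, to cap cost
) -> int:
    """Return how many servers the pool should have after this check."""
    if avg_cpu > scale_out_at:
        target = current + 1
    elif avg_cpu < scale_in_at:
        target = current - 1
    else:
        target = current  # inside the comfortable band: do nothing
    # Clamp so neither a spike nor a bug can push past the bounds.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    print(desired_servers(3, 85.0))   # 4
    print(desired_servers(3, 50.0))   # 3
    print(desired_servers(3, 20.0))   # 2
    print(desired_servers(10, 95.0))  # 10
```

The gap between the two thresholds matters: if both were set to 50 percent, the pool would add and remove servers constantly as load hovered around that value.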
Autoscaling lives in your cloud platform (AWS Auto Scaling Groups, Kubernetes Horizontal Pod Autoscaler, and similar). It depends on two things you must get right first: a stateless app, and a load balancer that routes traffic only to servers that pass a health check (a small request, like GET /healthz, that the platform sends to confirm a server is alive).
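Platforms run health checks for you, but the logic is simple enough to sketch in Python. This is a rough illustration only: `probe` and `healthy_servers` are hypothetical names, the 2-second timeout is an assumption, and the sketch treats anything other than an HTTP 200 from `/healthz` as unhealthy.

```python
import urllib.error
import urllib.request


def probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def healthy_servers(pool: list[str], check=probe) -> list[str]:
    """Keep only the servers whose health check passes."""
    return [server for server in pool if check(server)]
```

Calling `healthy_servers(["http://10.0.0.11:3000", "http://10.0.0.12:3000"])` would probe each address and return only the ones that answered. Passing a different `check` function makes the filtering easy to try out without any running servers.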
Vertical vs horizontal — when to use which
| Aspect | Vertical (scale up) | Horizontal (scale out) |
|---|---|---|
| What you change | Make one server bigger | Add more servers |
| Code changes needed | None | App must be stateless |
| Upper limit | Capped by biggest machine | Practically unlimited |
| Fault tolerance | Single point of failure | One node can die safely |
| Cost shape | Pricey at the top end | Many cheaper boxes |
| Needs a load balancer | No | Yes |
| Autoscaling friendly | Limited (resizing usually means a restart) | Yes |
| Best for | Databases, early/simple apps | Web/API tiers, spiky traffic |
In practice most real systems do both: scale the database up to a point, and scale the stateless app tier out behind a load balancer.
Best practices
- Start vertical for simplicity, then move to horizontal before you hit a single machine’s ceiling.
- Make every app server stateless: push sessions to Redis and data to a shared database.
- Always put a load balancer in front of a horizontally scaled tier, with real health checks.
- Base autoscaling on a metric that reflects real load (CPU, request latency, or queue depth), not guesswork.
- Set sensible minimum and maximum server counts so a traffic spike — or a bug — can’t scale you to a huge bill.
- Load-test before you need to: know roughly how many requests per second one server handles.
- Treat the database as the hard part — scaling it usually needs read replicas or sharding, not just more app servers.
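Once a load test tells you roughly how many requests per second one server handles, sizing the pool is simple arithmetic. The Python sketch below shows one way to do it. Every number in it is hypothetical, `servers_needed` is an illustrative name, and the 70 percent utilization target and the one spare server are assumptions rather than rules.

```python
import math


def servers_needed(
    peak_rps: float,
    per_server_rps: float,
    target_utilization: float = 0.7,  # assumed: run servers at 70% of their limit
    spare: int = 1,                   # assumed: one extra so a failure is survivable
) -> int:
    """Estimate how many app servers are needed to cover peak traffic."""
    if per_server_rps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_server_rps must be > 0 and utilization in (0, 1]")
    usable_rps = per_server_rps * target_utilization
    return math.ceil(peak_rps / usable_rps) + spare


if __name__ == "__main__":
    # Hypothetical load-test result: 300 req/s per server, 2000 req/s at peak.
    print(servers_needed(2000, 300))  # 11
```

Here each server is planned for 210 of its 300 requests per second, so 2000 requests per second needs 10 servers, plus one spare. The result is also a reasonable starting point for an autoscaling maximum.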