Load Balancing with Nginx

When one copy of your app can’t handle all the traffic, you run several copies and split requests between them. That splitting is called load balancing (sharing incoming requests across many backend servers so no single one gets overwhelmed). Nginx is one of the most popular tools for this job: it sits in front of your app instances, takes every request, and hands each one to a healthy backend. This page shows you how to set that up on Ubuntu, the different ways Nginx can pick a backend, and how it skips servers that are down.

Why load balancing matters

A single app process has limits: a fixed amount of CPU and memory. Once it’s busy, new requests wait in line and your site feels slow. There are two ways to grow:

Vertical scaling: make one server bigger (more CPU/RAM). Simple, but there’s a ceiling, and a reboot takes your whole site offline.
Horizontal scaling: run more servers and spread traffic across them. No single app server is a point of failure, and you can add capacity on demand.

Load balancing is what makes horizontal scaling work. Nginx becomes the single front door (a reverse proxy — a server that sits in front of your app and forwards requests to it), while your real app runs as several identical instances behind it.

When to use this: any time you have more than one app instance, or you expect traffic that one box can’t handle, or you want zero-downtime deploys (take one instance out, update it, put it back). When NOT to: a small low-traffic site running a single app process — a plain reverse proxy is enough, and load balancing just adds moving parts.

The upstream block

In Nginx, you list your backend servers inside an upstream block and give that group a name. Then you point proxy_pass at the name instead of a single address.

Assume you run three copies of a Node.js app on ports 3001, 3002, and 3003 on the same machine (in production these are often separate servers — just use their IP addresses instead).

Create a site config:

sudo nano /etc/nginx/sites-available/myapp

# Define the pool of backend app instances
upstream myapp_backend {
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://myapp_backend;

        # Forward useful info to the backend
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Enable the site, test the config, and reload Nginx:

sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

Output:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

That’s it. With no other settings, Nginx now spreads requests across the three instances using round robin (the default — see below).
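To check that the balancer is answering after the reload, one option is to send it a handful of requests and count the status codes. The sketch below assumes curl is installed and that myapp.example.com resolves to this server; count_statuses is a hypothetical helper name, not part of Nginx or curl.

```shell
# Send N requests to a URL and count the HTTP status codes that come back.
# count_statuses is a hypothetical helper name used only for this example.
count_statuses() {
    url=$1
    n=${2:-9}
    i=0
    while [ "$i" -lt "$n" ]; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
        curl -s -o /dev/null -w '%{http_code}\n' "$url"
        i=$((i + 1))
    done | sort | uniq -c
}

# Example (run it from a machine that can reach the balancer):
#   count_statuses http://myapp.example.com/ 9
```

With all three instances running you would expect a single line showing your app’s usual status code (typically 200). A 502 usually means Nginx could not reach a working backend.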

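Always run sudo nginx -t before reloading. A typo in a config file on disk doesn’t affect the running Nginx, but a reload with a broken config is rejected: Nginx keeps serving the old configuration, and you are left wondering why your changes didn’t take effect. A full restart with a broken config is worse, because Nginx stops and then fails to start again.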
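One way to make the test-then-reload habit automatic is to wrap both commands in a small shell function, for example in your ~/.bashrc. This is only a sketch: safe_reload is a hypothetical name, and the two commands inside it are the same ones used above.

```shell
# Reload Nginx only if the configuration test passes.
# safe_reload is a hypothetical helper name, not an Nginx command.
safe_reload() {
    if sudo nginx -t; then
        sudo systemctl reload nginx
    else
        echo "Config test failed; Nginx is still running the old configuration." >&2
        return 1
    fi
}

# Example:
#   safe_reload
```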

Load-balancing methods

Nginx offers several strategies for choosing which backend gets the next request. You pick a method by adding a directive at the top of the upstream block; weights are the exception, since they are set on the individual server lines.

Method	Directive	How it picks a backend	When to use
Round robin	(default, none)	Each backend in turn, evenly	Stateless apps where any instance can serve any request
Least connections	`least_conn;`	The backend with the fewest active connections	Requests vary a lot in duration (some slow, some fast)
IP hash	`ip_hash;`	Always the same backend for a given client IP	When a user must stick to one instance (in-memory sessions)
Weighted	`server ... weight=N;`	Bigger servers get proportionally more traffic	A mix of strong and weak machines

Least connections

upstream myapp_backend {
    least_conn;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

IP hash (sticky sessions)

If your app keeps login state in memory rather than in a shared store (like Redis), a user could log in on instance 1 and then get sent to instance 2, which doesn’t know them. ip_hash pins each client IP to one backend so that doesn’t happen.

upstream myapp_backend {
    ip_hash;
    server 127.0.0.1:3001;
    server 127.0.0.1:3002;
    server 127.0.0.1:3003;
}

Sticky sessions are a band-aid. The cleaner fix is to make your app stateless — store sessions in a shared database or Redis — so any instance can serve any user and you can scale freely.
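As a rough illustration of why ip_hash keeps sending a client to the same backend, the sketch below maps an address to one of the three example backends. It is a simplified model, not Nginx’s code: Nginx keys its hash on the first three octets of an IPv4 address (the whole address for IPv6), and cksum stands in here for Nginx’s internal hash function. pick_backend is a hypothetical name.

```shell
# Simplified model of ip_hash: map a client IPv4 address to one of three backends.
# cksum is only a stand-in for Nginx's internal hash function.
pick_backend() {
    ip=$1
    prefix=${ip%.*}    # drop the last octet: 203.0.113.7 -> 203.0.113
    hash=$(printf '%s' "$prefix" | cksum | cut -d ' ' -f 1)
    case $((hash % 3)) in
        0) echo 127.0.0.1:3001 ;;
        1) echo 127.0.0.1:3002 ;;
        2) echo 127.0.0.1:3003 ;;
    esac
}

pick_backend 203.0.113.7
pick_backend 203.0.113.200   # same /24 network, so the same backend as the line above
pick_backend 198.51.100.23   # a different network may land on a different backend
```

One consequence: all clients in the same /24 network, or behind the same NAT gateway, land on one backend, so traffic may not spread evenly.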

Weights

upstream myapp_backend {
    server 127.0.0.1:3001 weight=3;  # receives 3 of every 5 requests
    server 127.0.0.1:3002 weight=1;
    server 127.0.0.1:3003 weight=1;
}
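The effect of weights is simple arithmetic: each server’s share is its weight divided by the sum of all weights. A small sketch of that calculation follows; weight_shares is a hypothetical helper name, and the shell’s integer division rounds percentages down.

```shell
# Print the share of requests each weight receives, assuming all servers are healthy.
# weight_shares is a hypothetical helper name for this arithmetic.
weight_shares() {
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    for w in "$@"; do
        echo "weight=$w -> $((100 * w / total))% of requests"
    done
}

weight_shares 3 1 1
# weight=3 -> 60% of requests
# weight=1 -> 20% of requests
# weight=1 -> 20% of requests
```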

Health checks and failover

The open-source version of Nginx does passive health checks: it watches real traffic and, if a backend fails to respond, it marks that server as down for a while and stops sending it requests. You tune this per server:

upstream myapp_backend {
    server 127.0.0.1:3001 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3002 max_fails=3 fail_timeout=30s;
    server 127.0.0.1:3003 backup;   # only used if the others are down
}

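max_fails=3: if 3 attempts to reach the server fail within the fail_timeout window (30 seconds here)…
fail_timeout=30s: …the server is marked unavailable for 30 seconds, after which Nginx tries it again. If you omit these parameters, the defaults are max_fails=1 and fail_timeout=10s.
backup: this server receives traffic only when every non-backup server is down.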

So if instance 3001 crashes, Nginx notices, routes around it, and your users keep browsing. When it comes back, Nginx starts using it again automatically. (True active health checks, where Nginx probes a /health URL on a schedule, are a paid Nginx Plus feature; the open-source passive checks above are enough for most sites.)
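The max_fails and fail_timeout rule can be sketched as a small model. This is a deliberate simplification for illustration, not Nginx’s actual bookkeeping: it treats a backend as unavailable when at least max_fails failures fall within fail_timeout seconds of the latest failure, and that latest failure is itself no older than fail_timeout. backend_state and its arguments are hypothetical.

```shell
# Simplified model of passive health checks (not Nginx's real implementation).
# Arguments: max_fails, fail_timeout (seconds), current time, then the times
# (in seconds, oldest first) at which requests to this backend failed.
backend_state() {
    max_fails=$1
    fail_timeout=$2
    now=$3
    shift 3
    [ "$#" -gt 0 ] || { echo available; return; }
    # The most recent failure is the last argument.
    for last in "$@"; do :; done
    # Count failures that fall within fail_timeout of the most recent one.
    recent=0
    for t in "$@"; do
        if [ $((last - t)) -le "$fail_timeout" ]; then
            recent=$((recent + 1))
        fi
    done
    if [ "$recent" -ge "$max_fails" ] && [ $((now - last)) -le "$fail_timeout" ]; then
        echo unavailable
    else
        echo available
    fi
}

backend_state 3 30 12 0 5 10   # 3 failures, checked 2s after the last: unavailable
backend_state 3 30 45 0 5 10   # 35s after the last failure: available
backend_state 3 30 12 0 5      # only 2 failures: available
```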

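You can confirm failover by stopping one instance and checking that requests keep succeeding in the access log (failed attempts to reach the stopped backend are recorded in /var/log/nginx/error.log):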

sudo tail -f /var/log/nginx/access.log
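The default access log does not show which backend served a request. If you define a custom log_format that includes Nginx’s $upstream_addr variable, a short pipeline can tally requests per backend. The sketch below assumes each log line contains a field written as upstream=ADDRESS; that field name and the count_upstreams helper are assumptions for this example, not defaults.

```shell
# Count how many logged requests each backend served.
# Assumes each access-log line contains a field like "upstream=127.0.0.1:3001",
# produced by a custom log_format that uses the $upstream_addr variable.
count_upstreams() {
    grep -o 'upstream=[^ ]*' | sort | uniq -c
}

# Example: tally the last 100 requests
#   sudo tail -n 100 /var/log/nginx/access.log | count_upstreams
```

After you stop one instance, its count should stop growing. A request that Nginx retried on a second backend logs both addresses separated by a comma and a space, so in this tally it appears as its own entry with a trailing comma.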

Best practices

Keep your app stateless (sessions in Redis or a database) so you can scale horizontally without ip_hash tricks.
Start with the default round robin; switch to least_conn only if request durations vary a lot.
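Tune max_fails and fail_timeout per server rather than relying on the defaults (1 failure, 10 seconds), so a dead backend leaves rotation quickly but a single slow response doesn’t drop a healthy one.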
Forward X-Forwarded-For and X-Forwarded-Proto headers so your app sees the real client IP and protocol.
Run at least two instances, with enough headroom that the remaining ones can absorb the load if one fails, and consider two Nginx nodes (with a floating IP) so the balancer itself isn’t a single point of failure.
Test every change with sudo nginx -t before sudo systemctl reload nginx.
Open only the public ports (sudo ufw allow 80/tcp and sudo ufw allow 443/tcp) and keep backend ports bound to 127.0.0.1 or a private network.