What is Load Balancing?

Load balancing means spreading incoming network traffic across several servers instead of sending it all to one. A piece of software called a load balancer sits in front of your servers, receives every request, and hands each one to a healthy server that can answer it. This is the foundation of running web applications that stay online and grow with demand, so it is one of the first cloud building blocks worth understanding well.

What problem does it solve?

Imagine a single web server running your app. If that server gets too busy, requests slow down or fail. If it crashes, your whole site goes down. You have two hard limits: one machine can only handle so much traffic, and one machine is a single point of failure (one thing that, if it breaks, takes everything down with it).

A load balancer fixes both problems at once:

Availability. It constantly checks the health of each server. If one stops responding, the load balancer stops sending traffic to it and routes everyone to the servers that are still working. Most users never notice, although requests already in flight to the failed server can still fail.
Scale. When traffic grows, you add more servers behind the load balancer. The traffic is split across all of them, so you handle more users without rewriting your app. This is called horizontal scaling (adding more machines), as opposed to vertical scaling (making one machine bigger).
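
The two behaviours above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the server names and the hard-coded health states are hypothetical, and a real load balancer learns health from periodic checks and may use a different routing algorithm than the simple rotation shown here.

```shell
# Hypothetical server list as "name:state". A real load balancer would
# update the state from periodic health checks, not a fixed string.
SERVERS="server-a:healthy server-b:unhealthy server-c:healthy"

# Print the names of the servers currently marked healthy, one per line.
healthy_servers() {
  for entry in $SERVERS; do
    name=${entry%%:*}
    state=${entry##*:}
    if [ "$state" = "healthy" ]; then
      echo "$name"
    fi
  done
}

# Choose a server for request number $1 by rotating through healthy servers.
pick_server() {
  count=$(healthy_servers | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "no healthy servers" >&2
    return 1
  fi
  index=$(( $1 % count + 1 ))
  healthy_servers | sed -n "${index}p"
}

for i in 0 1 2 3; do
  echo "request $i -> $(pick_server "$i")"
done
```

With server-b marked unhealthy, the requests alternate between server-a and server-c. Adding another healthy entry to SERVERS spreads the same requests over more machines without any change to the callers.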

A simple mental model

                 ┌─────────────────┐
   Users ───────▶│  Load Balancer  │
                 └────────┬────────┘
            ┌─────────────┼─────────────┐
            ▼             ▼             ▼
       ┌─────────┐   ┌─────────┐   ┌─────────┐
       │ Server  │   │ Server  │   │ Server  │
       │ (AZ-a)  │   │ (AZ-b)  │   │ (AZ-c)  │
       └─────────┘   └─────────┘   └─────────┘

Each server lives in a different Availability Zone (AZ): one or more physically separate, isolated data centers within an AWS Region. Spreading servers across AZs means that even if an entire AZ has an outage, your app keeps serving traffic from the others.

ELB: the managed AWS option

You could run your own load balancer software (such as NGINX or HAProxy) on an EC2 instance, but then that instance becomes something you have to scale, patch, and keep alive yourself. AWS solves this with Elastic Load Balancing (ELB), a fully managed service that runs the load balancer for you across multiple AZs, scales automatically with traffic, and is designed so that the load balancer itself is not a single point of failure.

ELB comes in a few flavours for different jobs:

Load balancer type	Works at	Best for
Application Load Balancer (ALB)	HTTP/HTTPS (Layer 7)	Web apps, APIs, routing by URL path or hostname
Network Load Balancer (NLB)	TCP/UDP (Layer 4)	Extreme performance, static IPs, non-HTTP protocols
Gateway Load Balancer (GWLB)	Network packets (Layer 3)	Running third-party security appliances (firewalls)

For most web applications, the ALB is the right default. “Layer 7” means it understands HTTP, so it can make smart routing decisions based on the request itself (for example, send /api/* to one group of servers and /images/* to another).
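
To make the idea of path-based routing concrete, here is a minimal sketch of the decision an ALB rule set expresses. It is not how you configure an ALB; the function name and the group names (api-servers, image-servers, default-servers) are hypothetical and exist only for this illustration.

```shell
# Map a request path to a hypothetical server group name.
# Patterns are checked top to bottom and the first match wins;
# the final "*" branch plays the role of a default rule.
route_path() {
  case "$1" in
    /api/*)    echo "api-servers" ;;
    /images/*) echo "image-servers" ;;
    *)         echo "default-servers" ;;
  esac
}

route_path /api/users        # prints api-servers
route_path /images/logo.png  # prints image-servers
route_path /about            # prints default-servers
```

A Layer 4 load balancer could not make this choice, because it never looks inside the HTTP request to see the path.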

Cost note: An ALB has a small hourly charge (about $0.0225 per hour in us-east-1, roughly $16 per month) plus a usage charge based on traffic, measured in LCUs (Load Balancer Capacity Units). For a typical small app this lands around $18-25 per month, and prices vary by Region. An idle load balancer still costs the hourly fee, so do not leave forgotten ones running.
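
The arithmetic behind that estimate can be sketched as follows. The hourly rate comes from the note above; the 730 hours per month, the LCU price of $0.008 per LCU-hour (an assumed us-east-1 figure), and the average usage of 0.3 LCU are assumptions made for this example, so check the current pricing page before relying on the result.

```shell
# Estimate a monthly ALB bill in dollars.
#   $1 = average LCUs consumed per hour (hypothetical workload figure)
# Assumptions: 730 hours per month, $0.0225 per ALB-hour,
# and $0.008 per LCU-hour.
alb_monthly_cost() {
  awk -v lcu="$1" 'BEGIN { printf "%.2f\n", 730 * (0.0225 + 0.008 * lcu) }'
}

alb_monthly_cost 0.3   # prints 18.18
```

Calling the function with 0 shows the cost of an idle load balancer: the hourly fee alone, with no usage charge on top.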

The gotcha: a load balancer alone changes nothing

This is the mistake almost every beginner makes. A load balancer only adds value if there is more than one healthy target behind it, and those targets span multiple Availability Zones.

If you put a single EC2 instance behind a load balancer:

You still have a single point of failure — if that one instance dies, the load balancer has nowhere to send traffic and your app is down anyway.
You have not gained any extra capacity, because there is still only one server doing the work.
You are now paying the ELB hourly fee plus data charges with no gain in availability or capacity.

The load balancer is only half the pattern. The other half is running multiple instances across at least two AZs (and ideally letting an Auto Scaling group manage that fleet automatically). One without the other is wasted money.
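
The rule "at least two healthy targets in at least two AZs" can be written as a small check. This is a sketch over a hand-written, hypothetical inventory; the instance IDs, the line format, and the function name are invented for the example and are not output from any AWS command.

```shell
# Hypothetical inventory: one target per line as "instance-id az state".
TARGETS="i-aaa us-east-1a healthy
i-bbb us-east-1b healthy
i-ccc us-east-1b unhealthy"

# Succeed only if there are at least two healthy targets
# spread over at least two Availability Zones.
is_redundant() {
  printf '%s\n' "$1" | awk '
    $3 == "healthy" {
      healthy++
      if (!($2 in seen)) { seen[$2] = 1; zones++ }
    }
    END { exit !(healthy >= 2 && zones >= 2) }
  '
}

if is_redundant "$TARGETS"; then
  echo "redundant: healthy targets in 2+ AZs"
else
  echo "not redundant: a single point of failure remains"
fi
```

The sample inventory passes. A single instance, or two healthy instances in the same AZ, would fail the check, which matches the failure cases described above.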

Tip: When you create an ALB, AWS asks you to select at least two subnets in two different AZs. That requirement is a hint: the design only makes sense with redundancy on the backend too.

When to use load balancing

Use a load balancer when:

You run two or more instances of an app and need traffic split across them.
You need high availability — the app must survive an instance or AZ failure.
You want to scale out automatically as demand changes.
You want one stable address (a DNS name) in front of a changing set of servers.
You need to terminate HTTPS in one place or route by URL path/host.

You probably do not need one when:

You run a single small instance for a hobby project or internal tool with no uptime requirement — point DNS straight at the instance and save the cost.
Your workload is a batch job or queue worker that nothing connects to directly.

How clients reach the load balancer

One practical difference from a single server: you never give users an instance’s IP address. AWS gives the ELB a stable DNS name (something like my-app-alb-123456789.us-east-1.elb.amazonaws.com). You point your own domain at it with a Route 53 alias record, and the load balancer hides the churn behind it: instances can come and go and your domain never changes.

You can confirm a deployed load balancer’s DNS name from the AWS CLI:

aws elbv2 describe-load-balancers \
  --names my-app-alb \
  --query "LoadBalancers[0].DNSName" \
  --output text

Output:

my-app-alb-123456789.us-east-1.elb.amazonaws.com

You can then check that the load balancer is answering. The -I flag asks curl for the response headers only:

curl -I http://my-app-alb-123456789.us-east-1.elb.amazonaws.com
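
The Route 53 alias record mentioned earlier is described by a small JSON change batch. The sketch below only prints that JSON; the domain www.example.com and the hosted zone ID ZEXAMPLE123 are placeholders, and the function name is hypothetical. The real zone ID for a load balancer can be read from the CanonicalHostedZoneId field of the describe-load-balancers output.

```shell
# Print a Route 53 change batch that aliases a domain to a load balancer.
#   $1 = your domain name (placeholder in this example)
#   $2 = the load balancer's DNS name
#   $3 = the load balancer's canonical hosted zone ID (placeholder here)
alias_change_batch() {
  cat <<EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$1",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "$3",
        "DNSName": "$2",
        "EvaluateTargetHealth": true
      }
    }
  }]
}
EOF
}

alias_change_batch www.example.com \
  my-app-alb-123456789.us-east-1.elb.amazonaws.com \
  ZEXAMPLE123

# With real values, the output could be saved to a file and submitted with:
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id <your-hosted-zone-id> \
#     --change-batch file://alias.json
```

Note that two different zone IDs are involved: the one inside AliasTarget belongs to the load balancer, while the one passed to --hosted-zone-id belongs to your own domain's hosted zone.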