The Well-Architected Framework
The AWS Well-Architected Framework is a set of questions and best practices AWS publishes to help you judge whether a cloud design is sound. It is not a product you deploy. It is a structured way of thinking, organised into six “pillars”, that surfaces the tradeoffs hiding inside your architecture before they turn into a 3 a.m. outage or a surprise bill. The most important idea on this whole page: the framework is about making tradeoffs on purpose, not about scoring full marks on every pillar.
Why a framework at all
When you build on AWS, every choice quietly trades one good thing for another. A single small server is cheap but fragile. Spreading copies across three data centres is reliable but costs more. Encrypting and logging everything is secure but slower to operate. Left implicit, these tradeoffs get made by accident. The Well-Architected Framework gives you a shared checklist so a team can say out loud, “we chose lower cost here, and we accept that this service can be down for an hour.”
The single biggest misuse of this framework is treating it like an exam to ace. Optimising cost can hurt reliability; optimising reliability raises cost. A “perfect” architecture in all six pillars at once usually does not exist. Use the framework to find the tradeoffs you are already making without realising it.
The six pillars
| Pillar | Plain-English question it asks | What it pushes you toward | What it can cost you |
|---|---|---|---|
| Operational excellence | Can we run, monitor and change this safely? | Automation, runbooks, observability | Time spent building tooling |
| Security | Who can touch what, and is data protected? | Least-privilege access, encryption, audit logs | Some speed and convenience |
| Reliability | Does it recover from failure on its own? | Redundancy, backups, health checks | Higher infrastructure cost |
| Performance efficiency | Are we using the right resources, sized right? | Modern instance types, serverless, caching | Engineering effort to right-size |
| Cost optimization | Are we paying only for what we need? | Smaller resources, Savings Plans, shutdowns | Reduced headroom and redundancy |
| Sustainability | Are we minimising energy and waste? | Efficient regions, less idle capacity | Sometimes conflicts with raw performance |
A short tour of each:
Operational excellence
This pillar is about running the system day to day. It asks whether you deploy with automation (not manual clicking), whether you can see what is happening through metrics and logs, and whether you learn from failures. Think Infrastructure as Code (defining your servers in a file instead of by hand) and dashboards in Amazon CloudWatch (AWS’s monitoring service).
Security
This pillar covers identity, data protection and detection. The core practice is least privilege: every user and service gets the minimum permissions it needs, granted through AWS Identity and Access Management (IAM, the service that controls who can do what). It also covers encrypting data and keeping audit trails with AWS CloudTrail (a service that records API activity in your account; management events are logged by default, while high-volume data events must be enabled explicitly).
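To make least privilege concrete, here is a minimal sketch of a read-only IAM policy document built as a Python dictionary. The bucket name `example-reports-bucket` and the helper `read_only_bucket_policy` are hypothetical, chosen only for illustration. The point is the shape: one action on one bucket's objects, rather than a wildcard.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document that allows only reading objects from one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy = read_only_bucket_policy("example-reports-bucket")
print(json.dumps(policy, indent=2))
```

A service that only reads reports has no reason to hold `s3:PutObject` or `s3:DeleteObject`, so the policy leaves them out.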
Reliability
This pillar is about recovering from failure and meeting demand. It pushes you to run across multiple Availability Zones (isolated locations within a region, each made up of one or more data centres), to take backups, and to use health checks so traffic routes away from broken instances automatically.
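A rough way to see why multiple Availability Zones help is the standard formula for redundant components: if each copy is available a fraction `a` of the time and failures are independent, then at least one of `n` copies is up with probability `1 - (1 - a)^n`. The sketch below applies it. The 99% figure is an assumption for illustration only, not an AWS commitment, and real failures are not perfectly independent.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant instances, assuming independent failures."""
    return 1 - (1 - single) ** copies


# Assumed per-zone availability, for illustration only.
single_zone = 0.99

for copies in (1, 2, 3):
    print(f"{copies} zone(s): {combined_availability(single_zone, copies):.6f}")
```

Under these assumptions a second zone moves the estimate from 99% to roughly 99.99%. A third zone adds much less.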
Performance efficiency
This pillar asks whether you picked the right tool and sized it correctly. Are you on a current-generation instance type? Could a managed or serverless service do the job with less waste? Are you caching repeated work?
Cost optimization
This pillar asks whether you are paying only for value received. It covers turning off idle resources, choosing Savings Plans (a discount in exchange for committing to a consistent hourly spend for one or three years), and matching capacity to real demand.
Sustainability
Added in 2021, this pillar asks you to reduce the energy and hardware your workload consumes. Practices include choosing efficient regions, removing idle resources, and using managed services that pack many customers onto shared hardware.
How the pillars pull against each other
A concrete example. Suppose a small API runs on one t3.medium instance (i-0a1b2c3d4e5f) in a single subnet (subnet-0a1b2c3d).
- Cost optimization is happy: one instance is about $30/month on demand.
- Reliability is unhappy: if that Availability Zone fails, the whole API is down.
To satisfy reliability you add a second instance in another zone behind a load balancer. Now reliability improves, but your compute cost roughly doubles to around $60/month, plus about $16/month in fixed hourly charges for the load balancer, before any usage-based fees. There is no free win here: you traded money for resilience. The framework’s job is to make that trade visible and deliberate, not to tell you redundancy is always “correct”.
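The arithmetic behind those figures can be sketched as follows. The hourly rates are assumptions based on typical on-demand pricing in us-east-1 and will vary by region and over time. The load balancer figure covers only the fixed hourly charge, and `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12

# Assumed on-demand rates in USD per hour; check current pricing for your region.
T3_MEDIUM_HOURLY = 0.0416
LOAD_BALANCER_HOURLY = 0.0225  # fixed hourly charge only; usage-based charges excluded


def monthly_cost(instance_count: int, with_load_balancer: bool) -> float:
    """Estimate the monthly cost in USD for the example API."""
    compute = instance_count * T3_MEDIUM_HOURLY * HOURS_PER_MONTH
    balancer = LOAD_BALANCER_HOURLY * HOURS_PER_MONTH if with_load_balancer else 0.0
    return compute + balancer


single_zone = monthly_cost(1, with_load_balancer=False)
two_zones = monthly_cost(2, with_load_balancer=True)

print(f"single zone: ${single_zone:.2f}/month")
print(f"two zones:   ${two_zones:.2f}/month")
print(f"price of resilience: ${two_zones - single_zone:.2f}/month")
```

Putting a number on the difference is what turns "add redundancy" from a reflex into a decision a team can accept or decline for a given workload.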
The Well-Architected Tool
AWS provides a free service called the AWS Well-Architected Tool that walks you through the pillar questions and records your answers as a “workload review”. It flags risks as High Risk Issues (HRIs) or Medium Risk Issues and lets you track improvements over time.
When to use it: before a major launch, after an incident, or on a recurring schedule (for example quarterly) for important workloads. When not to bother: for a throwaway prototype or a personal side project — the overhead outweighs the benefit.
Console steps
- Open the AWS Management Console and go to the AWS Well-Architected Tool.
- Choose Define workload, then give the workload a name, a description, a review owner, an environment (Production or Pre-production), and the regions it runs in.
- Select the AWS Well-Architected Framework lens (a “lens” is a question set; there are also specialised lenses like Serverless).
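- Choose Start reviewing and answer the questions pillar by pillar. Select “None of these” only when you follow none of the listed best practices; if a question genuinely does not apply, mark it with “Question does not apply to this workload” instead.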
- Open the Improvement plan tab to see prioritised High and Medium Risk Issues with links to guidance.
CLI equivalent
aws wellarchitected create-workload \
  --workload-name "checkout-api-prod" \
  --description "Customer checkout API" \
  --environment PRODUCTION \
  --aws-regions us-east-1 \
  --lenses wellarchitected \
  --review-owner "[email protected]"
Output:
{
  "WorkloadId": "a1b2c3d4e5f60718293a4b5c6d7e8f90",
  "WorkloadArn": "arn:aws:wellarchitected:us-east-1:123456789012:workload/a1b2c3d4e5f60718293a4b5c6d7e8f90"
}
List the risks the review found:
aws wellarchitected list-lens-review-improvements \
--workload-id a1b2c3d4e5f60718293a4b5c6d7e8f90 \
--lens-alias wellarchitected
Output:
{
  "ImprovementSummaries": [
    {
      "QuestionId": "reliability-fault-isolation",
      "PillarId": "reliability",
      "Risk": "HIGH",
      "ImprovementPlanUrl": "https://docs.aws.amazon.com/wellarchitected/..."
    }
  ]
}
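Once a review has more than a handful of findings, it can help to group them before reading. The sketch below parses a response shaped like the sample output above and collects High Risk Issues by pillar. The helper `high_risk_by_pillar` is hypothetical, and the embedded JSON is the sample itself, with its URL left elided as in the source, not live data.

```python
import json

# Sample response copied from the output above; the URL is elided in the source.
response_text = """
{
  "ImprovementSummaries": [
    {
      "QuestionId": "reliability-fault-isolation",
      "PillarId": "reliability",
      "Risk": "HIGH",
      "ImprovementPlanUrl": "https://docs.aws.amazon.com/wellarchitected/..."
    }
  ]
}
"""


def high_risk_by_pillar(response: dict) -> dict:
    """Group the QuestionId of every HIGH risk item under its PillarId."""
    grouped = {}
    for item in response.get("ImprovementSummaries", []):
        if item.get("Risk") == "HIGH":
            grouped.setdefault(item["PillarId"], []).append(item["QuestionId"])
    return grouped


high_risks = high_risk_by_pillar(json.loads(response_text))
for pillar, questions in high_risks.items():
    print(f"{pillar}: {', '.join(questions)}")
```

In practice you would redirect the CLI output to a file and load that instead of the embedded string.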
Cost note: the Well-Architected Tool itself is free. The cost lands later, when you act on its recommendations (for example adding redundancy). Review the improvement plan with a budget in mind and reject items whose cost is not justified for that workload.
Best practices
- Treat the framework as a checklist for surfacing tradeoffs, not a scorecard to max out.
- Right-size the rigour: a payments system deserves a full six-pillar review; a demo does not.
- Record why you accepted each risk, so future engineers understand the deliberate choices.
- Re-review after incidents and major changes — architectures drift away from “well-architected” over time.
- When two pillars conflict, decide explicitly which one wins for this workload and write it down.
- Pair the framework with cost data; never accept a reliability recommendation without checking its monthly price.
- Use specialised lenses (Serverless, SaaS) on top of the base lens when they match your workload.