Cold Starts & How to Reduce Them
A cold start is the extra delay you see the first time a Lambda function runs after being idle, or when a sudden traffic spike needs more environments than are currently warm. AWS Lambda is serverless: you don’t manage servers, and AWS spins up a small execution environment on demand to run your code. When no environment is ready, Lambda has to create one before your code runs, and that setup time is the cold start. For most background jobs this is invisible, but for user-facing APIs an unexpected one-second pause can feel broken. This page explains what causes cold starts and the practical levers you have to cut them.
What actually happens during a cold start
When a request arrives and no warm environment is available, Lambda goes through a sequence before your handler runs:
1. Download your code: Lambda fetches your deployment package (a ZIP file or a container image) onto a fresh microVM (a tiny, isolated virtual machine run by Firecracker).
2. Start the runtime: Lambda boots the language runtime (the engine that runs your code), for example Node.js, Python, Java (the JVM, or Java Virtual Machine), or .NET.
3. Run your init code: anything outside your handler function runs once, such as importing libraries, opening database connections, and reading configuration.
4. Invoke your handler: only now does your actual request logic run.
Steps 1-3 are the cold start. Once the environment exists, Lambda keeps it warm for a while and reuses it, so subsequent requests skip straight to step 4. Those are “warm starts”, and their startup overhead is typically in the single-digit milliseconds.
Tip: Code you put outside the handler runs once per environment, not once per request. Open database clients and load SDKs there so warm invocations reuse them. But heavy init also makes the cold start longer, so keep it lean.
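A minimal sketch of the tip above, using only the standard library so it runs anywhere. The names `load_config` and `INIT_COUNT`, and the config contents, are hypothetical stand-ins for real init work such as opening a database client.

```python
import time

INIT_COUNT = 0  # hypothetical counter, only here to make init runs visible


def load_config():
    """Stand-in for expensive init work (SDK setup, DB connection, config read)."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"table": "orders", "loaded_at": time.time()}


# Module scope: runs once per execution environment, during the cold start.
CONFIG = load_config()


def handler(event, context):
    # Handler scope: runs on every invocation and reuses CONFIG as-is.
    return {"table": CONFIG["table"], "init_runs": INIT_COUNT}
```

Calling `handler` twice in the same process, which is what warm invocations amount to, reports one init run both times.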
What makes cold starts worse
| Factor | Why it hurts | What to do |
|---|---|---|
| Large deployment package | More bytes to download and unpack | Trim dependencies, use Lambda Layers |
| Heavy initialization | More work before the handler runs | Lazy-load (only load a library when first used) |
| JVM / .NET runtimes | These runtimes are slow to boot vs Node/Python | Use SnapStart |
| Container images vs ZIP | Larger images take longer to pull | Keep images small; use ZIP if you can |
| VPC attachment (historically) | Functions used to wait seconds for a network interface to be created | No longer an issue (see below) |
The old VPC penalty is gone
A Virtual Private Cloud (VPC) is your own private network inside AWS. Years ago, attaching a Lambda function to a VPC added a multi-second cold start because Lambda had to create an Elastic Network Interface (ENI, a virtual network card) for each new environment. Since the Hyperplane networking change (rolled out in 2019), ENIs are created ahead of time, when the function is created or its VPC settings change, and shared across environments. Today a VPC-attached function has essentially the same cold start as one without a VPC, so there is no longer a reason to avoid VPCs out of fear of cold starts.
Mitigation 1: slimmer packages and lazy loading
The cheapest fix costs nothing. Smaller code downloads faster, and deferring work shortens init.
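# Bad: imports boto3 and creates the client at module load (slow init)
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Return plain names: the raw response contains datetimes that Lambda cannot serialize
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]


# Better: defer the import and the client to first use, and cache the client
_s3 = None

def handler(event, context):
    global _s3
    if _s3 is None:
        import boto3  # imported only when this code path first runs
        _s3 = boto3.client("s3")  # created once, reused on warm starts
    return [b["Name"] for b in _s3.list_buckets()["Buckets"]]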
When to use this: always. It is free and helps every function. Move rarely used heavy imports inside the code path that needs them.
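The advice to move rarely used imports into the code path that needs them can be sketched as follows. This is an illustration under stated assumptions: `csv` and `io` are cheap standard-library stand-ins for a genuinely heavy dependency, and the `format` and `rows` event fields are hypothetical.

```python
import json


def handler(event, context):
    rows = event.get("rows", [])
    if event.get("format") == "csv":
        # Rare path: import here so the common JSON path never pays for it.
        # csv and io stand in for a genuinely heavy library.
        import csv
        import io

        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(rows)
        return buf.getvalue()
    # Common path: uses only what was imported at module load.
    return json.dumps(rows)
```

Python caches imported modules, so only the first request on the rare path pays the import cost; later requests in the same environment reuse the loaded module.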
Mitigation 2: provisioned concurrency
Provisioned concurrency keeps a set number of environments fully initialized and warm at all times, so requests hitting them never see a cold start.
Console steps:
- Open the Lambda console and select your function.
- Go to the Configuration tab, then Concurrency.
- Under Provisioned concurrency, choose Add configuration.
- Pick a version or alias (provisioned concurrency cannot target $LATEST directly).
- Set the number of environments to keep warm (for example 5) and save.
AWS CLI:
aws lambda put-provisioned-concurrency-config \
    --function-name checkout-api \
    --qualifier prod \
    --provisioned-concurrent-executions 5
Output:
{
    "RequestedProvisionedConcurrentExecutions": 5,
    "AvailableProvisionedConcurrentExecutions": 0,
    "AllocatedProvisionedConcurrentExecutions": 0,
    "Status": "IN_PROGRESS"
}
Cost gotcha: Provisioned concurrency removes cold starts, but you pay for those environments 24/7, like an always-on instance, even at 3 a.m. with zero traffic. In us-east-1, keeping 5 environments of 1 GB warm costs roughly $55/month (x86 pricing at the time of writing) before you add the normal per-request charges. Reserve it for latency-critical, user-facing paths, not batch jobs.
When to use this: predictable, latency-sensitive traffic (a login API, a checkout flow). When not to: spiky or low-traffic workloads where the always-on bill outweighs the benefit. Pair it with Application Auto Scaling to raise the count during business hours and drop it overnight.
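The arithmetic behind the always-on bill, and the saving from scaling down off-hours, can be sketched like this. The per-GB-second rate is an assumption based on published us-east-1 x86 pricing and may be out of date; the business-hours schedule is hypothetical. Per-request and duration charges are not included.

```python
# ASSUMPTION: illustrative us-east-1 x86 rate in USD per GB-second; check current pricing.
ASSUMED_RATE_PER_GB_SECOND = 0.0000041667
HOURS_PER_MONTH = 730  # average month length, used for estimates


def provisioned_cost(environments, memory_gb, hours, rate=ASSUMED_RATE_PER_GB_SECOND):
    """Cost in USD of keeping `environments` warm for `hours` hours."""
    gb_seconds = environments * memory_gb * hours * 3600
    return gb_seconds * rate


always_on = provisioned_cost(5, 1.0, HOURS_PER_MONTH)
# Hypothetical schedule: 12 hours a day on 22 weekdays, nothing overnight.
business_hours = provisioned_cost(5, 1.0, 12 * 22)

print(f"always on:      ${always_on:.2f}/month")
print(f"business hours: ${business_hours:.2f}/month")
```

Under the assumed rate, the always-on figure comes out close to the rough monthly cost quoted above, and the scheduled variant costs well under half of it.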
Mitigation 3: SnapStart
SnapStart is built for slow-booting runtimes like Java and .NET. Lambda runs your initialization once, takes a snapshot (a saved memory image) of the ready environment, and restores from that snapshot for future cold starts instead of booting from scratch. This can cut Java cold starts from seconds to a few hundred milliseconds. Unlike provisioned concurrency, SnapStart has no always-on hourly charge for warm environments: it is free on Java, while on Python and .NET AWS charges for caching the snapshot and for each restore.
Console steps:
- Open your function and go to Configuration, then General configuration, then Edit.
- Find SnapStart and set it to Published versions.
- Save, then publish a new version. The snapshot is created at publish time.
AWS CLI:
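aws lambda update-function-configuration \
    --function-name reporting-service \
    --snap-start ApplyOn=PublishedVersions
Output (OptimizationStatus reads Off here because the response describes $LATEST; it reads On for a version published after SnapStart is enabled):
{
    "FunctionName": "reporting-service",
    "Runtime": "java21",
    "SnapStart": {
        "ApplyOn": "PublishedVersions",
        "OptimizationStatus": "Off"
    }
}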
When to use this: Java, Python, and .NET functions with meaningful init time. Watch out for snapshot uniqueness: anything generated once during init (random seeds, cached timestamps) is frozen into the snapshot and reused by every environment restored from it. Regenerate such values inside the handler or via a runtime hook.
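The uniqueness pitfall can be illustrated with a small standard-library sketch. Everything here is hypothetical naming: `FROZEN_ID` shows the hazard, and `after_restore` is only the body a restore hook might run. Registering it with the runtime is runtime-specific and deliberately not shown.

```python
import uuid

# Module scope: under SnapStart this runs once, before the snapshot is taken,
# so every environment restored from the snapshot sees the same value.
FROZEN_ID = uuid.uuid4().hex

_instance_id = None  # per-environment value, refreshed by after_restore()


def after_restore():
    """Hypothetical hook body: regenerate per-environment values after a restore."""
    global _instance_id
    _instance_id = uuid.uuid4().hex


def handler(event, context):
    if _instance_id is None:
        after_restore()  # fallback when no hook has run yet
    return {
        "frozen_id": FROZEN_ID,          # identical across restored environments
        "instance_id": _instance_id,     # unique per environment once refreshed
        "request_id": uuid.uuid4().hex,  # generated per request, always fresh
    }
```

Values that must differ per request belong in the handler; values that must differ per environment belong in whatever runs after restore.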
Provisioned concurrency vs SnapStart vs slim package
| Option | Cold starts | Extra cost | Best for |
|---|---|---|---|
| Slim package / lazy load | Reduced | None | Everything; baseline hygiene |
| SnapStart | Greatly reduced | None on Java; snapshot caching and restore charges on Python/.NET | Java/.NET/Python with heavy init |
| Provisioned concurrency | Eliminated | Billed 24/7 per environment | Latency-critical, predictable traffic |
Best practices
- Start with the free wins: trim dependencies, move heavy imports out of the global path, and reuse clients across invocations.
- Measure before optimizing: use the Init Duration field in CloudWatch logs to see your real cold-start cost.
- Reach for SnapStart on Java and .NET before paying for provisioned concurrency.
- Use provisioned concurrency only on user-facing, latency-sensitive functions, and scale it down off-hours to control the bill.
- Don’t avoid VPCs to dodge cold starts: the old multi-second ENI penalty no longer exists.
- Prefer ZIP packages over container images when you can, and keep images minimal when you can’t.
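To act on the "measure before optimizing" advice above, here is a small sketch that pulls Init Duration out of a Lambda REPORT log line. The sample lines use made-up values, and `init_duration_ms` is a hypothetical helper name; fetching the lines from CloudWatch is left out.

```python
import re

# The "Init Duration" field appears only on REPORT lines for cold starts.
INIT_DURATION = re.compile(r"Init Duration: ([\d.]+) ms")


def init_duration_ms(report_line):
    """Return the init duration in ms, or None for a warm invocation."""
    match = INIT_DURATION.search(report_line)
    return float(match.group(1)) if match else None


# Sample REPORT lines with made-up values, for illustration only.
cold = (
    "REPORT RequestId: 11111111-2222-3333-4444-555555555555\t"
    "Duration: 12.50 ms\tBilled Duration: 13 ms\tMemory Size: 512 MB\t"
    "Max Memory Used: 80 MB\tInit Duration: 412.37 ms"
)
warm = (
    "REPORT RequestId: 66666666-7777-8888-9999-000000000000\t"
    "Duration: 3.10 ms\tBilled Duration: 4 ms\tMemory Size: 512 MB\t"
    "Max Memory Used: 80 MB"
)

print(init_duration_ms(cold), init_duration_ms(warm))
```

Collecting this value over a day of traffic shows how often cold starts happen and how long they take, which is the baseline any mitigation should be judged against.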