Cold Starts & How to Reduce Them

A cold start is the extra delay you see when a Lambda function is invoked and no initialized environment is free to handle the request, typically after an idle period or during a sudden traffic spike. AWS Lambda is serverless: you don’t manage servers, and AWS spins up a small execution environment on demand to run your code. When no environment is ready, Lambda has to create one before your code runs, and that setup time is the cold start. For most background jobs this is invisible, but for user-facing APIs an unexpected one-second pause can feel broken. This page explains what causes cold starts and the practical levers you have to cut them.

What actually happens during a cold start

When a request arrives and no warm environment is available, Lambda goes through a sequence before your handler runs:

1. Download your code: Lambda fetches your deployment package (a ZIP file or a container image) onto a fresh microVM (a small, isolated virtual machine run by Firecracker, AWS's lightweight virtualization technology).
2. Start the runtime: Lambda boots the language runtime (the engine that runs your code), for example Node.js, Python, Java (the JVM, or Java Virtual Machine), or .NET.
3. Run your init code: anything outside your handler function runs once, such as importing libraries, opening database connections, and reading configuration.
4. Invoke your handler: only now does your actual request logic run.

Steps 1-3 are the cold start. Once the environment exists, Lambda keeps it warm for a while and reuses it, so subsequent requests skip straight to step 4. Those are “warm starts”, and the platform overhead they add is typically only a few milliseconds.

Tip: Code you put outside the handler runs once per environment, not once per request. Open database clients and load SDKs there so warm invocations reuse them. But heavy init also makes the cold start longer, so keep it lean.
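The split between init code and handler code can be made visible with a small sketch. It assumes nothing beyond the standard library; the `_CONFIG` dictionary and the `orders` table name are hypothetical stand-ins for whatever expensive setup a real function would do.

```python
import time

# Module scope: runs once per execution environment (step 3 above).
_init_started = time.perf_counter()
_CONFIG = {"table": "orders"}  # hypothetical stand-in for expensive setup
_INIT_SECONDS = time.perf_counter() - _init_started
_is_cold = True  # flips to False after the first invocation


def handler(event, context):
    """Runs on every invocation (step 4) and reuses what init created."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {
        "cold_start": cold,
        "init_seconds": round(_INIT_SECONDS, 6),
        "table": _CONFIG["table"],
    }
```

Calling `handler({}, None)` twice in the same process reports `cold_start` as true the first time and false afterwards, which mirrors how a reused environment behaves. Logging that flag is one simple way to see how often real traffic hits a cold environment.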

What makes cold starts worse

Factor	Why it hurts	What to do
Large deployment package	More bytes to download and unpack	Trim unused dependencies (moving them to Lambda Layers does not shrink the total)
Heavy initialization	More work before the handler runs	Lazy-load (only load a library when first used)
JVM / .NET runtimes	These runtimes boot more slowly than Node.js or Python	Use SnapStart
Container images vs ZIP	Larger images take longer to pull	Keep images small; use ZIP if you can
VPC attachment (historically)	Older setups waited seconds for a network interface to be created	No longer an issue; see below

The old VPC penalty is gone

A Virtual Private Cloud (VPC) is your own private network inside AWS. Years ago, attaching a Lambda to a VPC added a multi-second cold start because Lambda had to create an Elastic Network Interface (ENI, a virtual network card) per environment. Since the Hyperplane networking change (rolled out in 2019), Lambda creates the ENIs ahead of time, when the function is created or its VPC settings change, and shares them across environments. Today a VPC-attached Lambda has essentially the same cold start as one without a VPC, so there is no longer a reason to avoid VPCs out of fear of cold starts.

Mitigation 1: slimmer packages and lazy loading

The cheapest fix costs nothing. Smaller code downloads faster, and deferring work shortens init.

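# Bad when S3 is rarely needed: boto3 is imported and the client is built
# at module load, so every cold start pays for it
import boto3
s3 = boto3.client("s3")

def handler(event, context):
    # Return plain names: the raw response contains datetimes,
    # which Lambda cannot serialize to JSON
    return [b["Name"] for b in s3.list_buckets()["Buckets"]]

# Better: defer the import and the client to first use, then cache the client
_s3 = None

def get_s3():
    global _s3
    if _s3 is None:
        import boto3  # imported on first use, not at module load
        _s3 = boto3.client("s3")  # created once, reused on warm starts
    return _s3

def handler(event, context):
    return [b["Name"] for b in get_s3().list_buckets()["Buckets"]]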

When to use this: always. It is free and helps every function. Keep what every request needs at module level, and move rarely used heavy imports inside the code path that needs them.
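To decide which imports are worth deferring, it helps to measure them first. The sketch below is one rough way to do that with the standard library only; the helper names `time_import` and `slowest_imports` are made up for this example. Modules that are already loaded return almost instantly from Python's module cache, so run it in a fresh process for meaningful numbers.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name in this process."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


def slowest_imports(module_names):
    """Time each import and return (name, seconds) pairs, slowest first."""
    timings = [(name, time_import(name)) for name in module_names]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)
```

For example, `slowest_imports(["json", "decimal", "boto3"])` (assuming boto3 is installed) gives a quick ranking of where init time goes. Python's built-in `-X importtime` option produces a more detailed breakdown of the same thing.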

Mitigation 2: provisioned concurrency

Provisioned concurrency keeps a set number of environments fully initialized and warm at all times, so requests hitting them never see a cold start.

Console steps:

1. Open the Lambda console and select your function.
2. Go to the Configuration tab, then Concurrency.
3. Under Provisioned concurrency, choose Add configuration.
4. Pick a version or alias (provisioned concurrency cannot target $LATEST directly).
5. Set the number of environments to keep warm (for example 5) and save.

AWS CLI:

aws lambda put-provisioned-concurrency-config \
  --function-name checkout-api \
  --qualifier prod \
  --provisioned-concurrent-executions 5

Output:

{
    "RequestedProvisionedConcurrentExecutions": 5,
    "AvailableProvisionedConcurrentExecutions": 0,
    "AllocatedProvisionedConcurrentExecutions": 0,
    "Status": "IN_PROGRESS"
}
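The IN_PROGRESS status means the environments are still being initialized, so a deployment script usually has to wait before sending traffic. A minimal polling sketch follows, assuming boto3 is available where it runs. The helper names `wait_until_ready` and `boto3_fetcher` are made up for this example; `get_provisioned_concurrency_config` is the boto3 call that returns the same fields as the output above.

```python
import time


def wait_until_ready(fetch_config, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until Status is READY; raise on FAILED or when polls run out."""
    for _ in range(max_polls):
        config = fetch_config()
        status = config["Status"]
        if status == "READY":
            return config
        if status == "FAILED":
            raise RuntimeError(config.get("StatusReason", "provisioning failed"))
        sleep(poll_seconds)
    raise TimeoutError("provisioned concurrency was not READY in time")


def boto3_fetcher(function_name, qualifier):
    """Build a fetch_config callable backed by boto3 (needs AWS credentials)."""
    import boto3  # imported lazily so the polling logic can be tested offline

    client = boto3.client("lambda")
    return lambda: client.get_provisioned_concurrency_config(
        FunctionName=function_name, Qualifier=qualifier
    )
```

With credentials configured, `wait_until_ready(boto3_fetcher("checkout-api", "prod"))` would block until the configuration from the CLI example reports READY. Passing the fetcher in as a callable keeps the loop easy to test with canned responses.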

Cost gotcha: Provisioned concurrency removes cold starts, but you pay for those environments 24/7, like an always-on instance, even at 3 a.m. with zero traffic. In us-east-1, keeping 5 environments of 1 GB warm costs roughly $60/month before you add the request and duration charges for the invocations themselves. Reserve it for latency-critical, user-facing paths, not batch jobs.
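The always-on charge is simple arithmetic: environments times memory times seconds in the month times the per-GB-second rate. The sketch below works it through. The rate is an assumption (an illustrative x86 figure for us-east-1) and the 730-hour month is an averaging convention, so check the current AWS Lambda pricing page before budgeting from it.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_MONTH = 730  # average month length; an estimating convention

# ASSUMPTION: illustrative x86 provisioned concurrency rate for us-east-1.
# Verify against the current AWS Lambda pricing page.
ASSUMED_PRICE_PER_GB_SECOND = 0.0000041667


def monthly_provisioned_cost(
    environments,
    memory_gb,
    price_per_gb_second=ASSUMED_PRICE_PER_GB_SECOND,
    hours=HOURS_PER_MONTH,
):
    """Estimate the always-on charge, excluding per-invocation charges."""
    gb_seconds = environments * memory_gb * hours * SECONDS_PER_HOUR
    return gb_seconds * price_per_gb_second
```

Under these assumptions, `monthly_provisioned_cost(5, 1.0)` comes to about 54.75 dollars, in the same ballpark as the rounded figure above. Halving the memory size or the environment count halves the bill, which is why right-sizing both matters before turning this on.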

When to use this: predictable, latency-sensitive traffic (a login API, a checkout flow). When not to: spiky or low-traffic workloads where the always-on bill outweighs the benefit. Pair it with Application Auto Scaling to raise the count during business hours and drop it overnight.
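A business-hours schedule with Application Auto Scaling can be sketched as follows, assuming boto3 and AWS credentials are available. The helper names, action names, and cron times are made up for this example, and schedules are evaluated in UTC unless a time zone is set. The request fields are the ones that `register_scalable_target` and `put_scheduled_action` expect for Lambda provisioned concurrency.

```python
DIMENSION = "lambda:function:ProvisionedConcurrency"


def scalable_target(function_name, alias, min_capacity, max_capacity):
    """Build the request that registers an alias with Application Auto Scaling."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def scheduled_action(name, function_name, alias, cron, capacity):
    """Build a scheduled action that pins capacity at the given cron time."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": DIMENSION,
        "Schedule": f"cron({cron})",
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


def apply_schedule(function_name="checkout-api", alias="prod"):
    """Send the requests to AWS (not run here; needs credentials)."""
    import boto3  # imported lazily so the builders can be tested offline

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target(function_name, alias, 1, 5))
    client.put_scheduled_action(
        **scheduled_action("business-hours-up", function_name, alias, "0 8 ? * MON-FRI *", 5)
    )
    client.put_scheduled_action(
        **scheduled_action("overnight-down", function_name, alias, "0 20 ? * MON-FRI *", 1)
    )
```

Keeping the request builders separate from the AWS calls makes it possible to review or unit-test the schedule without touching an account.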

Mitigation 3: SnapStart

SnapStart is built for slow-booting runtimes like Java and .NET. Lambda runs your initialization once when you publish a version, takes a snapshot (a saved image of the memory and disk state) of the ready environment, and restores from that snapshot on future cold starts instead of booting from scratch. This can cut Java cold starts from seconds to a few hundred milliseconds. Unlike provisioned concurrency, SnapStart has no always-on hourly charge: for Java it adds no extra cost, while other supported runtimes are billed for snapshot caching and restores.
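Turning SnapStart on is a function configuration change followed by publishing a version, since snapshots are taken for published versions only. Here is a minimal sketch, assuming boto3 and AWS credentials. The helper names and the function name `orders-java` are hypothetical; the `SnapStart` setting with `ApplyOn` set to `PublishedVersions` is the documented way to enable it.

```python
def snapstart_config(function_name):
    """Build the configuration update that enables SnapStart."""
    return {
        "FunctionName": function_name,
        "SnapStart": {"ApplyOn": "PublishedVersions"},
    }


def enable_snapstart(function_name="orders-java"):
    """Enable SnapStart and publish a version (not run here; needs credentials)."""
    import boto3  # imported lazily so the builder can be tested offline

    client = boto3.client("lambda")
    client.update_function_configuration(**snapstart_config(function_name))
    # Wait for the configuration update to finish before publishing.
    client.get_waiter("function_updated_v2").wait(FunctionName=function_name)
    # Publishing triggers init and the snapshot for the new version.
    version = client.publish_version(FunctionName=function_name)
    return version["Version"]
```

Traffic only benefits when it invokes the published version or an alias pointing at it, so point callers there once the returned version number is known.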