
Cold Starts & How to Reduce Them

A cold start is the extra delay you see when a Lambda function is invoked and no initialized environment is available, typically after the function has been idle or when a traffic spike needs more concurrent environments. AWS Lambda is serverless: you don’t manage servers, and AWS spins up a small environment on demand to run your code. When no environment is ready, Lambda has to create one before your code runs, and that setup time is the cold start. For most background jobs this is invisible, but for user-facing APIs an unexpected one-second pause can feel broken. This page explains what causes cold starts and the practical levers you have to cut them.

What actually happens during a cold start

When a request arrives and no warm environment is available, Lambda goes through a sequence before your handler runs:

  1. Download your code: Lambda fetches your deployment package (a ZIP file or a container image) onto a fresh micro-VM (a small, isolated virtual machine run by Firecracker).
  2. Start the runtime: it boots the language runtime (the engine that runs your code), for example Node.js, Python, Java (the JVM, or Java Virtual Machine), or .NET.
  3. Run your init code: anything outside your handler function runs once, such as importing libraries, opening database connections, and reading configuration.
  4. Invoke your handler: only now does your actual request logic run.

Steps 1-3 are the cold start. Once the environment exists, Lambda keeps it warm for a while and reuses it, so the next requests skip straight to step 4. Those are “warm starts”, and the platform overhead they add is typically single-digit milliseconds.

Tip: Code you put outside the handler runs once per environment, not once per request. Open database clients and load SDKs there so warm invocations reuse them. But heavy init also makes the cold start longer, so keep it lean.
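The split between init code and handler code can be sketched in a few lines. This is a minimal illustration, not a production pattern: CONFIG, INIT_COUNT, and the invocations counter are hypothetical names used only to make the "once per environment" behavior visible.

```python
# Init phase: module-level code runs once per execution environment.
CONFIG = {"table": "orders"}  # hypothetical config, stands in for real setup work
INIT_COUNT = 1                # init ran exactly once for this environment
invocations = 0               # incremented on every request


def handler(event, context):
    """Runs on every invocation and reuses whatever init already built."""
    global invocations
    invocations += 1
    return {
        "cold": invocations == 1,  # True only for the first call in this environment
        "init_count": INIT_COUNT,
        "invocations": invocations,
        "table": CONFIG["table"],
    }
```

Calling handler twice in the same process mimics two requests landing on one warm environment: the first reports cold, the second does not, and the init work is not repeated.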

What makes cold starts worse

Factor | Why it hurts | What to do
Large deployment package | More bytes to download and unpack | Trim dependencies, use Lambda Layers
Heavy initialization | More work before the handler runs | Lazy-load (only load a library when first used)
JVM / .NET runtimes | These runtimes are slow to boot vs Node/Python | Use SnapStart
Container images vs ZIP | Larger images take longer to pull | Keep images small; use ZIP if you can
VPC attachment (historically) | Old Lambdas waited seconds to create network cards | No longer an issue, see below

The old VPC penalty is gone

A Virtual Private Cloud (VPC) is your own private network inside AWS. Years ago, attaching a Lambda to a VPC added a multi-second cold start because Lambda had to create an Elastic Network Interface (ENI, a virtual network card) per environment. As of the Hyperplane networking change (rolled out 2019), ENIs are created and shared ahead of time. Today a VPC-attached Lambda has essentially the same cold start as one without a VPC. Do not avoid VPCs out of fear of cold starts anymore.

Mitigation 1: slimmer packages and lazy loading

The cheapest fix costs nothing. Smaller code downloads faster, and deferring work shortens init.

# Eager: boto3 is imported and the client is built at module load,
# so every cold start pays for it even if the request never touches S3
import boto3
s3 = boto3.client("s3")

def handler(event, context):
    return s3.list_buckets()

# Lazy: defer the import and the client to first use, then cache the client
_s3 = None

def handler(event, context):
    global _s3
    if _s3 is None:
        import boto3  # imported on first use, not at module load
        _s3 = boto3.client("s3")  # created once, reused on warm starts
    return _s3.list_buckets()

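When to use this: always. Trimming the package is free and helps every function. Lazy loading pays off most for heavy imports that only some requests need, so move those inside the code path that uses them; a dependency needed on every request is better loaded once during init.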

Mitigation 2: provisioned concurrency

Provisioned concurrency keeps a set number of environments fully initialized and warm at all times, so requests hitting them never see a cold start.

Console steps:

  1. Open the Lambda console and select your function.
  2. Go to the Configuration tab, then Concurrency.
  3. Under Provisioned concurrency, choose Add configuration.
  4. Pick a version or alias (provisioned concurrency cannot target $LATEST directly).
  5. Set the number of environments to keep warm (for example 5) and save.

AWS CLI:

aws lambda put-provisioned-concurrency-config \
  --function-name checkout-api \
  --qualifier prod \
  --provisioned-concurrent-executions 5

Output:

{
    "RequestedProvisionedConcurrentExecutions": 5,
    "AvailableProvisionedConcurrentExecutions": 0,
    "AllocatedProvisionedConcurrentExecutions": 0,
    "Status": "IN_PROGRESS"
}
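The IN_PROGRESS status above means the environments are still being initialized. One way to wait for them is to poll the configuration until it reports READY. The sketch below assumes a boto3 Lambda client is passed in; wait_until_ready and its parameters are hypothetical names, and the polling interval is arbitrary.

```python
import time


def wait_until_ready(client, function_name, qualifier,
                     poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Poll until provisioned concurrency is READY; raise on FAILED or timeout."""
    for _ in range(max_polls):
        resp = client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = resp["Status"]
        if status == "READY":
            return resp
        if status == "FAILED":
            raise RuntimeError(resp.get("StatusReason", "provisioning failed"))
        sleep(poll_seconds)  # still IN_PROGRESS, try again shortly
    raise TimeoutError("provisioned concurrency not ready in time")


# Usage (requires AWS credentials):
#   import boto3
#   wait_until_ready(boto3.client("lambda"), "checkout-api", "prod")
```

Passing the client and the sleep function in as arguments keeps the loop easy to exercise without calling AWS.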

Cost gotcha: Provisioned concurrency removes cold starts, but you pay for those environments 24/7, like an always-on instance, even at 3am with zero traffic. In us-east-1, keeping 5 environments of 1 GB warm costs roughly $55/month at the published x86 rate, before you add the normal per-request and duration charges. Reserve it for latency-critical, user-facing paths, not batch jobs.
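The always-on charge is simple arithmetic: environments times memory times seconds in the month times the per-GB-second rate. The sketch below assumes a us-east-1 x86 rate of $0.0000041667 per GB-second and a 730-hour month; both are assumptions to check against the current pricing page, and the function name is hypothetical.

```python
# Assumed us-east-1 x86 rate for provisioned concurrency; verify before relying on it.
PRICE_PER_GB_SECOND = 0.0000041667
SECONDS_PER_HOUR = 3600


def monthly_provisioned_cost(environments, memory_gb, hours=730,
                             price=PRICE_PER_GB_SECOND):
    """Always-on charge only; request and duration charges come on top."""
    gb_seconds = environments * memory_gb * hours * SECONDS_PER_HOUR
    return gb_seconds * price


cost = monthly_provisioned_cost(environments=5, memory_gb=1.0)
print(f"${cost:.2f} per month")
```

At the assumed rate, 5 environments of 1 GB come to about $55 per month, which scales linearly if you double the memory or the environment count.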

When to use this: predictable, latency-sensitive traffic (a login API, a checkout flow). When not to: spiky or low-traffic workloads where the always-on bill outweighs the benefit. Pair it with Application Auto Scaling to raise the count during business hours and drop it overnight.
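Scheduled scaling with Application Auto Scaling could look roughly like this. The helpers build the request parameters, and apply_schedule sends them with boto3. The function name, alias, action names, capacities, and cron times (UTC by default) are all illustrative assumptions.

```python
DIMENSION = "lambda:function:ProvisionedConcurrency"


def resource_id(function_name, alias):
    return f"function:{function_name}:{alias}"


def scalable_target(function_name, alias, min_capacity, max_capacity):
    """Parameters for register_scalable_target."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id(function_name, alias),
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def scheduled_action(name, function_name, alias, cron, capacity):
    """Parameters for put_scheduled_action that pin capacity at a given time."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": resource_id(function_name, alias),
        "ScalableDimension": DIMENSION,
        "Schedule": cron,
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


def apply_schedule(function_name="checkout-api", alias="prod"):
    """Send the configuration to AWS (requires credentials and boto3)."""
    import boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target(function_name, alias, 1, 5))
    client.put_scheduled_action(**scheduled_action(
        "business-hours-up", function_name, alias, "cron(0 8 ? * MON-FRI *)", 5))
    client.put_scheduled_action(**scheduled_action(
        "overnight-down", function_name, alias, "cron(0 20 ? * MON-FRI *)", 1))
```

Keeping the parameter builders separate from the AWS calls makes the schedule easy to review and test before anything is applied.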

Mitigation 3: SnapStart

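SnapStart is built for runtimes with slow initialization. It launched for Java and now also supports Python and .NET on recent runtime versions. Lambda runs your initialization once when you publish a version, takes a snapshot (a saved image of the initialized environment), and restores from that snapshot on future cold starts instead of booting from scratch. This can cut Java cold starts from seconds to a few hundred milliseconds. Unlike provisioned concurrency, SnapStart has no always-on hourly charge: it is free for Java, while Python and .NET functions are billed for snapshot caching and restores.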

Console steps:

  1. Open your function and go to Configuration, then General configuration, then Edit.
  2. Find SnapStart and set it to Published versions.
  3. Save, then publish a new version. The snapshot is created at publish time.

AWS CLI:

aws lambda update-function-configuration \
  --function-name reporting-service \
  --snap-start ApplyOn=PublishedVersions

Output:

{
    "FunctionName": "reporting-service",
    "Runtime": "java21",
    "SnapStart": {
        "ApplyOn": "PublishedVersions",
        "OptimizationStatus": "Off"
    }
}
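Because the snapshot is only taken when a version is published, enabling SnapStart is a two-step operation. A possible boto3 sketch follows; enable_snapstart_and_publish is a hypothetical helper, the client is assumed to be a boto3 Lambda client, and waiting between the two calls is a precaution, since publishing while a configuration update is still in progress can be rejected.

```python
def enable_snapstart_and_publish(client, function_name):
    """Turn on SnapStart, wait for the update to settle, then publish a version."""
    client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # Wait for the configuration update to finish before publishing.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    published = client.publish_version(FunctionName=function_name)
    return published["Version"]  # the snapshot belongs to this version


# Usage (requires AWS credentials):
#   import boto3
#   version = enable_snapstart_and_publish(boto3.client("lambda"), "reporting-service")
```

Invocations only benefit when they target the published version or an alias pointing at it, not $LATEST.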

When to use this: Java, Python, and .NET functions with meaningful init time. Watch out for snapshot uniqueness: anything generated once during init (random seeds, cached timestamps) is frozen into the snapshot and reused by every environment restored from it. Regenerate such values inside the handler or via a runtime hook.

Provisioned concurrency vs SnapStart vs slim package

Option | Cold starts | Extra cost | Best for
Slim package / lazy load | Reduced | None | Everything, baseline hygiene
SnapStart | Greatly reduced | None for Java; snapshot caching and restore charges for Python/.NET | Java/.NET/Python with heavy init
Provisioned concurrency | Eliminated | Pays 24/7 per environment | Latency-critical, predictable traffic

Best practices

  • Start with the free wins: trim dependencies, move heavy imports out of the global path, and reuse clients across invocations.
  • Measure before optimizing: use the Init Duration field on the REPORT line in CloudWatch logs to see your real cold-start cost.
  • Reach for SnapStart on Java, Python, and .NET before paying for provisioned concurrency.
  • Use provisioned concurrency only on user-facing, latency-sensitive functions, and scale it down off-hours to control the bill.
  • Don’t avoid VPCs to dodge cold starts: the old multi-second ENI penalty no longer exists.
  • Prefer ZIP packages over container images when you can, and keep images minimal when you can’t.
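For the measurement step, Init Duration can be pulled out of a REPORT log line with a small parser. The sketch below assumes the tab-separated "Key: value" layout that Lambda REPORT lines normally use; the sample lines and request IDs are made up for illustration.

```python
def parse_report(line):
    """Return the 'Key: value' fields of a Lambda REPORT log line as a dict."""
    fields = {}
    for part in line.strip().split("\t"):
        key, sep, value = part.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def init_duration_ms(line):
    """Init Duration in milliseconds, or None for a warm invocation."""
    value = parse_report(line).get("Init Duration")
    return float(value.split()[0]) if value else None


# Illustrative sample lines, not real log output.
COLD = ("REPORT RequestId: 11111111-2222-3333-4444-555555555555\t"
        "Duration: 12.50 ms\tBilled Duration: 13 ms\tMemory Size: 512 MB\t"
        "Max Memory Used: 80 MB\tInit Duration: 412.37 ms\t")
WARM = ("REPORT RequestId: 66666666-7777-8888-9999-000000000000\t"
        "Duration: 3.10 ms\tBilled Duration: 4 ms\tMemory Size: 512 MB\t"
        "Max Memory Used: 80 MB\t")

print(init_duration_ms(COLD), init_duration_ms(WARM))
```

Only cold invocations carry an Init Duration field, so counting the lines that have one also gives a rough cold-start rate.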
Last updated June 15, 2026