AWS Lambda Cold Starts: A Deep Dive

Intermediate 9 min read

What you'll learn

✓ The exact lifecycle of a cold start
✓ Which runtimes and configurations are slowest
✓ How provisioned concurrency and SnapStart work
✓ Patterns to keep init out of the hot path
✓ How to measure cold starts accurately

Prerequisites

• Some experience deploying Lambda functions

What and Why

A Lambda cold start is the extra latency a request pays when AWS has to create a fresh execution environment for it. Once warm, that environment handles many requests, one at a time, with near-zero overhead. Cold starts hurt p99 latency on user-facing APIs, especially for heavy runtimes like Java and .NET and for functions with large packages or heavy init code.

Understanding what happens during init is the difference between paying for provisioned concurrency you do not need and shipping a function that is fast by construction.

Mental Model

A cold invocation passes through four phases:

1. Download: AWS downloads your deployment package (zip or container image) onto a worker.
2. Init runtime: the language runtime starts (Node, Python, JVM, .NET CLR).
3. Init handler: your module-level code runs: imports, SDK clients, config loading.
4. Invoke: your handler function executes against the event.

Phases 1-3 are the cold start. Phase 4 is the warm path. AWS reuses the environment for subsequent invocations, skipping phases 1-3 entirely.

Cold path (first request to new env):
download -> init runtime -> init handler -> invoke
 ~100ms       ~50-500ms        your code       your code

Warm path (next requests, same env):
invoke
your code only

Lambda invocation lifecycle
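
The split between init and invoke can be observed from inside a function. The following is a minimal Python sketch, not an official pattern: the variable names and the shape of the returned dictionary are assumptions made for illustration. It relies only on the fact that module-level code runs once per environment while the handler runs on every invocation.

```python
import time

# Module scope: runs once per execution environment (the "init handler" phase).
_ENV_STARTED_AT = time.monotonic()
_invocations_in_this_env = 0


def handler(event, context):
    """Report whether this invocation was the first one in its environment."""
    global _invocations_in_this_env
    _invocations_in_this_env += 1
    return {
        "cold": _invocations_in_this_env == 1,
        "invocation": _invocations_in_this_env,
        "env_age_s": round(time.monotonic() - _ENV_STARTED_AT, 3),
    }
```

Logging the "cold" flag alongside your own request metrics gives a rough way to tag which requests landed on a fresh environment.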

Hands-on Example

A handler that pays for client construction on every invocation, cold or warm, because the SDK client is created inside the handler:

import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

export const handler = async (event) => {
  const s3 = new S3Client({}); // BAD: rebuilt every warm invoke too
  const data = await s3.send(new GetObjectCommand({
    Bucket: 'reports', Key: event.key
  }));
  return { ok: true };
};

The cold-start-friendly version hoists init to module scope so it runs once per environment:

import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({}); // one-time per env, reused on warm invokes

export const handler = async (event) => {
  const data = await s3.send(new GetObjectCommand({
    Bucket: 'reports', Key: event.key
  }));
  return { ok: true };
};

For Java functions with strict latency SLOs, enable SnapStart in your deployment:

Resources:
  Api:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      MemorySize: 1024
      SnapStart:
        ApplyOn: PublishedVersions

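SnapStart snapshots the initialized execution environment (the JVM plus the state your init code built) when you publish a version and restores it on cold start, typically cutting Java cold starts from 2-6 seconds down to a few hundred milliseconds. With ApplyOn: PublishedVersions it applies only to published versions, so invoke through a version or alias rather than $LATEST.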

For functions that cannot tolerate cold starts on their baseline traffic, configure provisioned concurrency:

aws lambda put-provisioned-concurrency-config \
  --function-name api \
  --qualifier prod \
  --provisioned-concurrent-executions 10

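This keeps 10 environments initialized at all times for the prod alias. You pay for them whether they handle traffic or not, and requests beyond 10 concurrent executions spill over to on-demand environments, which can still cold start.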

Common Pitfalls

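Heavy import graphs. A Node.js function that imports the full aws-sdk v2 package can pay roughly 400 ms of init, depending on memory size. Switch to the modular v3 clients (@aws-sdk/client-*) and bundle with tree shaking so only the code you call is loaded.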
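
The same question, how much init a given import costs, can be asked in any runtime. As a rough Python sketch (the helper name time_import is an assumption for this example), you can time the first import of a module; the interpreter's built-in `python -X importtime` flag gives a more detailed per-module breakdown.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return seconds spent importing module_name, or 0.0 if already loaded."""
    if module_name in sys.modules:
        return 0.0  # cached imports cost nothing, so there is nothing to measure
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Standard-library modules used purely as stand-ins for your dependencies.
    for name in ("json", "decimal", "email.parser"):
        print(f"{name}: {time_import(name) * 1000:.1f} ms")
```

Run it in a fresh interpreter, since a module that is already imported returns immediately from the cache.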

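The pre-2019 VPC cold start myth. Before 2019, a VPC-attached function could spend several seconds creating an ENI during cold start. Hyperplane ENIs, which are shared and created when the function is configured, removed most of that cost. The function still must reach AWS services through VPC endpoints or a NAT, though, and a misconfigured route shows up as seconds of connection timeouts that are easy to mistake for a slow cold start.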

Reading config from SSM/Secrets Manager per invocation. Cache it at module scope, refresh on a TTL.
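
One way to sketch the "cache at module scope, refresh on a TTL" idea in Python is shown here. The class name TtlCache, the load_config stand-in, and the 300-second TTL are all assumptions for illustration; real code would replace load_config with a call to SSM or Secrets Manager.

```python
import time


class TtlCache:
    """Cache one loaded value at module scope and reload it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._expires_at = None  # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value


def load_config():
    # Hypothetical stand-in: real code would call SSM or Secrets Manager here.
    return {"feature_flag": True}


# Module scope: survives across warm invocations in the same environment.
config_cache = TtlCache(load_config, ttl_seconds=300)


def handler(event, context):
    config = config_cache.get()  # the loader runs at most once per TTL window
    return {"flag": config["feature_flag"]}
```

Loading lazily on first use, rather than eagerly at import time, keeps the remote call out of the init phase; whether that trade is right depends on whether you would rather pay it during init or on the first request.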

Big container images. Lambda supports up to 10 GB images, but bigger means slower download. Aim for under 250 MB by stripping build deps with a multi-stage Dockerfile.

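Measuring with the console “Test” button. It runs cold only when no warm environment exists, for example right after a deploy or configuration change, so repeated clicks mostly hit the warm path and say little about production. Use real traffic and read Init Duration from the REPORT log lines in CloudWatch Logs; the field appears only on invocations that went through init.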
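
To make that concrete, here is a small Python sketch that pulls the millisecond fields out of a REPORT line. It assumes the tab-separated layout Lambda writes to CloudWatch Logs; the function name parse_report and the sample line are made up for illustration.

```python
def parse_report(line):
    """Extract millisecond fields from a Lambda REPORT log line.

    Returns None for non-REPORT lines. Keys are the field names as logged,
    for example "Duration" and "Init Duration".
    """
    if not line.startswith("REPORT"):
        return None
    fields = {}
    for part in line.strip().split("\t"):
        name, sep, value = part.partition(": ")
        if sep and value.endswith(" ms"):
            fields[name] = float(value[:-3])  # drop the trailing " ms"
    return fields


# Illustrative line with a made-up request ID, not captured from a real function.
sample = (
    "REPORT RequestId: example-id\tDuration: 12.50 ms\tBilled Duration: 13 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 70 MB\tInit Duration: 250.25 ms\t"
)
report = parse_report(sample)
was_cold = "Init Duration" in report
```

An invocation counts as cold exactly when the parsed fields include "Init Duration".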

Practical Tips

Increase memory to reduce CPU-bound init time. Lambda allocates CPU in proportion to memory; 1769 MB gives you one full vCPU. CPU-bound init in Node and Python can drop substantially, sometimes by about half, when you go from 512 MB to 1024 MB, and the higher per-millisecond price is often offset by shorter billed duration. Measure both settings before committing.
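
The cost side of that trade is simple arithmetic, because the duration charge scales with memory multiplied by billed time. The sketch below works in GB-seconds rather than currency so it does not depend on any particular price; the function names are assumptions, and the per-request charge is ignored.

```python
def gb_seconds(memory_mb, billed_ms):
    """Billable compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * (billed_ms / 1000)


def break_even_ms(old_memory_mb, old_billed_ms, new_memory_mb):
    """Billed duration at new_memory_mb that costs the same as the old setup."""
    return old_billed_ms * old_memory_mb / new_memory_mb


# Doubling memory from 512 MB to 1024 MB is cost-neutral if duration halves.
target_ms = break_even_ms(512, 200, 1024)
```

If the measured duration at the larger size comes in under the break-even value, the bigger configuration is both faster and cheaper; if it comes in over, you are paying for the latency improvement.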

Use the arm64 (Graviton) architecture where your dependencies support it. Duration is priced about 20 percent lower than on x86_64, and init is often a touch faster.

For Python, prefer boto3.client at module scope and avoid heavy ML imports unless you need them. Lazy-import inside handlers for rarely-used code paths:

def handler(event, context):
    if event.get('needsImage'):
        from PIL import Image  # only paid when needed
        ...

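For Java, combine SnapStart with priming: invoke key code paths during init so classes are loaded before the snapshot is taken. A static initializer cannot throw checked exceptions, so catch Jackson's JsonProcessingException:

static {
  try {
    // prime expensive class loading
    new ObjectMapper().writeValueAsString(Map.of("k", "v"));
  } catch (JsonProcessingException e) {
    throw new ExceptionInInitializerError(e);
  }
}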

Use provisioned concurrency only for latency-critical endpoints, and pair it with Application Auto Scaling (scheduled or target-tracking policies) so you do not pay for peak capacity all night.

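Measure with the right percentile. Cold starts are tail latency, so look at the p99 of Duration + Init Duration, not the mean. Init Duration is reported only for invocations that went through init, so treat it as zero for warm ones.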
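
A short Python sketch shows why the mean hides the problem. The sample numbers are made up for illustration (98 warm invocations and 2 cold ones), and the nearest-rank percentile used here is one common definition among several.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def total_latency_ms(duration_ms, init_duration_ms=None):
    """Duration plus Init Duration; warm invocations have no init component."""
    return duration_ms + (init_duration_ms or 0.0)


# Made-up sample: 98 warm invocations at 20 ms and 2 cold ones with 800 ms of init.
samples = [total_latency_ms(20.0)] * 98 + [total_latency_ms(20.0, 800.0)] * 2

mean = sum(samples) / len(samples)  # 36.0
p50 = percentile(samples, 50)       # 20.0
p99 = percentile(samples, 99)       # 820.0
```

In this sample the mean and median both look healthy while the p99 is dominated entirely by the two cold starts.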

Wrap-up

A cold start is download, runtime init, handler init, then your code. Everything you do at module scope happens once per environment; everything you do inside the handler happens every invocation. Shrink the import graph, hoist clients to module scope, lean on SnapStart for Java, and pull in provisioned concurrency only when a user-facing SLO demands it. Once you can sketch the four phases on a whiteboard, every Lambda performance question becomes a question of which phase you are paying for.