AWS Lambda Cold Starts: A Deep Dive
What actually happens during a Lambda cold start, why some functions are worse than others, and the techniques that meaningfully reduce p99 latency in production.
What you'll learn
- The exact lifecycle of a cold start
- Which runtimes and configurations are slowest
- How provisioned concurrency and SnapStart work
- Patterns to keep init out of the hot path
- How to measure cold starts accurately
Prerequisites
- Some experience deploying Lambda functions
What and Why
A Lambda cold start is the latency a request pays when AWS has to create a fresh execution environment for it. Once warm, that environment handles many requests with near-zero overhead. Cold starts hurt p99 latency on user-facing APIs, especially with heavy runtimes like Java and .NET and with large init-time import graphs.
Understanding what happens during init is the difference between paying for provisioned concurrency you do not need and shipping a function that is fast by construction.
Mental Model
A cold start has four phases:
1. Download: AWS downloads your deployment package (zip or container image) onto a worker.
2. Init runtime: the language runtime starts (Node, Python, JVM, .NET CLR).
3. Init handler: your module-level code runs, including imports, SDK clients, and config loading.
4. Invoke: your handler function executes against the event.
Phases 1-3 are the cold start. Phase 4 is the warm path. AWS reuses the environment for subsequent invocations, skipping phases 1-3 entirely.
Cold path (first request to new env):
download -> init runtime -> init handler -> invoke
~100ms      ~50-500ms       your code       your code
Warm path (next requests, same env):
invoke
your code only
Hands-on Example
A handler that pays the SDK client setup cost on every invocation, cold or warm, because the client is created inside the handler:
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

export const handler = async (event) => {
  const s3 = new S3Client({}); // BAD: rebuilt every warm invoke too
  const data = await s3.send(new GetObjectCommand({
    Bucket: 'reports', Key: event.key
  }));
  return { ok: true };
};
The cold-start-friendly version hoists init to module scope so it runs once per environment:
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({}); // one-time per env, reused on warm invokes

export const handler = async (event) => {
  const data = await s3.send(new GetObjectCommand({
    Bucket: 'reports', Key: event.key
  }));
  return { ok: true };
};
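To see the once-per-environment behavior directly, a module-scope flag can mark the first invocation in each environment. The sketch below is illustrative only: the names `isColdStart` and `initializedAt` are assumptions, not part of the original example, and the handler is written without `export` so it can be pasted into a plain Node.js script.

```javascript
// Module scope: runs once per execution environment, during handler init.
const initializedAt = Date.now();
let isColdStart = true; // hypothetical flag name, flipped after the first invoke

const handler = async (event) => {
  const coldStart = isColdStart;
  isColdStart = false; // every later invoke in this environment is warm

  // Structured log line so cold invokes can be filtered later.
  console.log(JSON.stringify({ coldStart, envAgeMs: Date.now() - initializedAt }));
  return { ok: true, coldStart };
};
```

Calling `handler` twice in the same process returns `coldStart: true` the first time and `coldStart: false` afterwards, which mirrors what a reused Lambda environment does.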
For Java functions with strict latency SLOs, enable SnapStart in your deployment:
Resources:
  Api:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      MemorySize: 1024
      # SnapStart applies only to published versions, so publish one on deploy
      AutoPublishAlias: prod
      SnapStart:
        ApplyOn: PublishedVersions
SnapStart snapshots the initialized JVM and restores it on cold start, typically cutting Java cold starts from 2-6 seconds down to a few hundred milliseconds.
For endpoints that cannot tolerate cold starts, configure provisioned concurrency on a published version or alias:
aws lambda put-provisioned-concurrency-config \
  --function-name api \
  --qualifier prod \
  --provisioned-concurrent-executions 10
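This keeps 10 environments initialized at all times. You pay for them whether they handle traffic or not, and requests beyond 10 concurrent executions spill over to on-demand environments, which can still cold start.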
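A rough way to choose that number is the usual concurrency estimate: average requests per second multiplied by average duration in seconds. The helper below is a sketch under that assumption; `estimateConcurrency` and its `headroom` factor are illustrative names, not an AWS API.

```javascript
// Estimate concurrent executions: requests/second * average duration (seconds).
// `headroom` is an assumed safety multiplier for bursts, not an AWS setting.
function estimateConcurrency(requestsPerSecond, avgDurationMs, headroom = 1) {
  const steadyState = (requestsPerSecond * avgDurationMs) / 1000;
  return Math.ceil(steadyState * headroom);
}

// 50 req/s at 200 ms each keeps about 10 environments busy.
console.log(estimateConcurrency(50, 200)); // 10
```

The estimate describes steady-state load only; bursty traffic needs either headroom or on-demand spillover.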
Common Pitfalls
Heavy import graphs. A Node.js function importing the full aws-sdk v2 can add several hundred milliseconds of init. Switch to modular v3 clients, import only the clients you use, and bundle with tree-shaking.
The VPC cold start myth. Before 2019, attaching a function to a VPC added seconds of ENI setup to each cold start. Hyperplane ENIs removed most of that pain. The function still has to reach AWS services through either VPC endpoints or a NAT, though, and a misconfigured route adds seconds of timeout.
Reading config from SSM/Secrets Manager per invocation. Cache it at module scope, refresh on a TTL.
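A module-scope TTL cache for that pattern might look like the sketch below. The `createTtlCache` helper and the counter-based loader are hypothetical stand-ins; in a real function the loader would wrap an SSM or Secrets Manager call and must return a promise.

```javascript
// Cache an async loader's result and refresh it after ttlMs.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createTtlCache(loader, ttlMs, now = Date.now) {
  let pending = null; // promise for the current value
  let expiresAt = 0;

  return function get() {
    if (pending === null || now() >= expiresAt) {
      expiresAt = now() + ttlMs;
      const attempt = loader().catch((err) => {
        if (pending === attempt) pending = null; // do not cache failures
        throw err;
      });
      pending = attempt;
    }
    return pending;
  };
}

// Module scope: one cache per execution environment.
let loads = 0; // stand-in for a remote config fetch
const getConfig = createTtlCache(async () => ({ version: ++loads }), 5 * 60 * 1000);

const handler = async () => {
  const config = await getConfig(); // calls the loader at most once per TTL
  return { configVersion: config.version };
};
```

Caching the promise instead of the resolved value means concurrent callers during a refresh share one fetch.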
Big container images. Lambda supports up to 10 GB images, but bigger means slower download. Aim for under 250 MB by stripping build deps with a multi-stage Dockerfile.
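Measuring with the console “Test” button. It runs cold only when no warm environment exists, for example right after a deploy or a configuration change, so it is not representative either way. Use real traffic plus the Init Duration field that Lambda writes to the REPORT line in CloudWatch Logs.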
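Each invocation ends with a REPORT line in CloudWatch Logs, and its Init Duration field is present only when the invocation included an init phase. The parser below is a minimal sketch; `parseReportLine` is an illustrative name and the sample line uses made-up values in the shape Lambda emits.

```javascript
// Extract timing fields from a Lambda REPORT log line.
// The lookbehind skips "Billed Duration", "Init Duration" and "Restore Duration".
function parseReportLine(line) {
  if (!line.startsWith('REPORT')) return null;
  const duration = /(?<!Billed |Init |Restore )Duration: ([\d.]+) ms/.exec(line);
  const init = /Init Duration: ([\d.]+) ms/.exec(line);
  if (!duration) return null;
  return {
    durationMs: Number(duration[1]),
    initDurationMs: init ? Number(init[1]) : null,
    coldStart: init !== null,
  };
}

// Sample line with invented numbers.
const sample =
  'REPORT RequestId: 00000000-0000-0000-0000-000000000000\t' +
  'Duration: 102.25 ms\tBilled Duration: 103 ms\tMemory Size: 512 MB\t' +
  'Max Memory Used: 58 MB\tInit Duration: 282.41 ms';

console.log(parseReportLine(sample));
```

Feeding exported log lines through a parser like this gives a per-invocation record of which requests were cold and how much init they paid.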
Practical Tips
Increase memory to reduce CPU-bound init time. Lambda allocates CPU proportionally to memory; 1769 MB gives you one full vCPU. CPU-bound Node and Python init times can drop substantially when you go from 512 MB to 1024 MB, and the higher per-millisecond price is often largely offset by faster execution. Measure before and after, because I/O-bound init does not benefit.
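The cost trade-off comes down to simple arithmetic: billed compute is memory in GB multiplied by duration in seconds. The sketch below uses assumed durations purely for illustration and leaves out the per-GB-second price, which varies by region and architecture.

```javascript
// Billed compute in GB-seconds for one invocation.
function gbSeconds(memoryMb, durationMs) {
  return (memoryMb / 1024) * (durationMs / 1000);
}

// Assumed numbers: doubling memory while halving duration
// leaves the billed GB-seconds unchanged.
console.log(gbSeconds(512, 200));  // 0.1
console.log(gbSeconds(1024, 100)); // 0.1
```

If doubling memory cuts duration by less than half, the invocation costs more; by more than half, it costs less.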
Use arm64 (Graviton) runtimes. They are about 20 percent cheaper per GB-second and often a touch faster on init, though it is worth benchmarking your own function.
For Python, prefer boto3.client at module scope and avoid heavy ML imports unless you need them. Lazy-import inside handlers for rarely-used code paths:
def handler(event, context):
    if event.get('needsImage'):
        from PIL import Image  # only paid when needed
    ...
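The same lazy-loading idea carries over to Node.js through dynamic `import()`. In the sketch below, `node:zlib` stands in for a genuinely heavy dependency, and `loadZlib` and the `compress` event field are hypothetical names chosen for the example.

```javascript
// Lazy-load a module on first use and remember the promise so later
// invocations in the same environment reuse it.
let zlibPromise = null;
function loadZlib() {
  if (zlibPromise === null) {
    zlibPromise = import('node:zlib'); // dynamic import works in CJS and ESM
  }
  return zlibPromise;
}

const handler = async (event) => {
  if (!event.compress) {
    return { body: event.body, compressed: false }; // module never loaded
  }
  const zlib = await loadZlib();
  const body = zlib.gzipSync(event.body).toString('base64');
  return { body, compressed: true };
};
```

The trade-off is that the first request on the rare path pays the load cost inside the invocation instead of during init.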
For Java, combine SnapStart with priming — invoke key code paths during init so classes are loaded:
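static {
    try {
        // prime expensive class loading
        new ObjectMapper().writeValueAsString(Map.of("k", "v"));
    } catch (Exception e) {
        throw new ExceptionInInitializerError(e); // static blocks cannot throw checked exceptions
    }
}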
Use provisioned concurrency only for latency-critical endpoints, and pair it with auto-scaling so you do not pay for peak capacity all night.
Measure with the right percentile. Cold starts are tail latency, so look at p99 of Duration + InitDuration, not the mean.
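The gap between mean and p99 is easy to see with a small calculation. The sketch below uses a nearest-rank percentile and invented samples (98 warm invokes at 40 ms, 2 cold invokes with an extra 800 ms of init); `percentile` is an illustrative helper, not a CloudWatch API.

```javascript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Each value is Duration + InitDuration for one invocation.
const totals = [
  ...Array.from({ length: 98 }, () => 40),      // warm: Duration only
  ...Array.from({ length: 2 }, () => 40 + 800), // cold: Duration + InitDuration
];

const mean = totals.reduce((sum, v) => sum + v, 0) / totals.length;
console.log(mean);                   // 56
console.log(percentile(totals, 50)); // 40
console.log(percentile(totals, 99)); // 840
```

With only 2 percent of requests cold, the mean barely moves while p99 lands squarely on the cold-start latency.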
Wrap-up
A cold start is download, runtime init, handler init, then your code. Everything you do at module scope happens once per environment; everything you do inside the handler happens every invocation. Shrink the import graph, hoist clients to module scope, lean on SnapStart for Java, and pull in provisioned concurrency only when a user-facing SLO demands it. Once you can sketch the four phases on a whiteboard, every Lambda performance question becomes a question of which phase you are paying for.