How to Reduce Lambda Cold Start Time: Provisioned Concurrency & SnapStart Explained
Lambda cold starts are one of the most common performance complaints in serverless architectures: your function responds in 20ms on every warm invocation, then suddenly takes 2–3 seconds on the first call after a period of inactivity. Understanding what actually happens during that initialization window, and which AWS-native tools eliminate or shrink it, is the difference between a serverless app that feels snappy and one that frustrates users on every deploy.
TL;DR: Lambda Cold Start Reduction Options
| Approach | Best For | Eliminates Cold Start? | Extra Cost? |
|---|---|---|---|
| Provisioned Concurrency | Any runtime, latency-sensitive APIs | Yes, up to the provisioned count (pre-warmed) | Yes, billed for configured concurrency while enabled |
| SnapStart (Java) | Java 11+ functions with heavy init | Mostly (restores from a snapshot instead of re-running init) | No additional charge for Java functions |
| Minimize deployment package | All runtimes | Reduces, does not eliminate | No |
| Move work out of handler init | All runtimes | Reduces, does not eliminate | No |
Pricing and limits vary — always check the official AWS documentation.
How Lambda Cold Starts Work
When Lambda receives an invocation and no warm execution environment exists, it must provision a new one from scratch. This lifecycle has three distinct phases before your handler code even runs: environment provisioning (allocating compute, downloading your deployment package), runtime initialization (starting the language runtime), and function initialization (executing code outside your handler, such as SDK clients, DB connections, and static config). The extra latency a cold start adds is the sum of all three; your handler's own execution time comes on top of that and is reported separately.
1. No warm environment available: Lambda finds no idle execution environment that can serve the invocation.
2. Environment provisioning: AWS allocates compute capacity and downloads your deployment package from internal storage.
3. Runtime init: The language runtime (JVM, Node.js process, Python interpreter) starts up.
4. Function init: Code outside your handler executes (SDK client construction, DB connection pools, config loading).
5. Handler invocation: Your actual business logic runs. This is the only phase that runs on every invocation.
6. Environment stays warm: The environment is reused for subsequent invocations, skipping steps 1–4.
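The split between one-time function init and per-invocation handler work can be sketched in plain Java. This is a simplified model under stated assumptions: `OrderHandler` and `SimulatedEnvironment` are hypothetical classes, not the `aws-lambda-java-core` `RequestHandler` API. They mimic the runtime constructing your handler once per execution environment and then calling it for every invocation routed there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handler class; a simplified stand-in, not the real RequestHandler interface.
class OrderHandler {
    // Counts how many times "function init" has run in this JVM.
    static int initCount = 0;

    private final String config;

    OrderHandler() {
        // Function init: runs once per execution environment (SDK clients, config loading).
        initCount++;
        config = "loaded-config";
    }

    // Handler invocation: runs on every request routed to this environment.
    String handleRequest(String event) {
        return config + ":" + event;
    }
}

// Hypothetical model of one execution environment: the handler object is created when the
// environment is first needed (the cold start) and reused for every later (warm) invocation.
class SimulatedEnvironment {
    private OrderHandler handler;
    final List<Boolean> coldStarts = new ArrayList<>();

    String invoke(String event) {
        boolean cold = handler == null;
        if (cold) {
            handler = new OrderHandler(); // runtime + function init would happen here
        }
        coldStarts.add(cold);
        return handler.handleRequest(event);
    }

    public static void main(String[] args) {
        SimulatedEnvironment env = new SimulatedEnvironment();
        env.invoke("a");
        env.invoke("b");
        env.invoke("c");
        System.out.println("init runs: " + OrderHandler.initCount); // init runs: 1
        System.out.println("cold starts: " + env.coldStarts);       // cold starts: [true, false, false]
    }
}
```

Only the first invocation pays for the constructor; the two warm invocations go straight to `handleRequest`, which is why profiling the handler alone misses cold start cost.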
Why Cold Starts Hurt More Than You Expect
The most common misdiagnosis: engineers see a 2-second cold start and immediately blame their handler code. They profile the function, optimize database queries, and reduce package size — then discover the latency is unchanged. The actual cause is almost always the function initialization phase: a heavyweight SDK client being constructed, a secrets manager call happening at module load time, or a Java application context being bootstrapped. The handler itself may run in 30ms; the init phase is where the time goes.
Think of a cold start like opening a restaurant that was closed overnight. The food prep (your handler) is fast. But unlocking the doors, turning on the equipment, and briefing the staff (runtime + function init) takes time regardless of how simple the first order is.
There's a subtler interaction worth knowing: the function initialization phase has its own time limit, separate from your handler's configured timeout. For on-demand functions, Lambda caps init at 10 seconds; if init does not finish in time, Lambda retries it when the first invocation arrives, and the retried init then counts against the function's configured timeout. Anything logged before the START RequestId line comes from the init phase, so errors that appear there are a reliable signal that init code, not handler code, is the culprit.
Diagnosing Your Cold Start with CloudWatch
Before reaching for Provisioned Concurrency or SnapStart, measure what you're actually dealing with. Lambda emits an Init Duration field in the REPORT log line — but only for cold start invocations. Warm invocations omit this field entirely, which makes it a precise filter.
```bash
# Find cold start REPORT lines in the last hour
# (date -d is GNU date; on macOS use: date -v-1H +%s000)
aws logs filter-log-events \
  --log-group-name '/aws/lambda/your-function-name' \
  --start-time $(date -d '1 hour ago' +%s000) \
  --filter-pattern 'REPORT Init Duration' \
  --query 'events[*].message' \
  --output text
```
The output will look like:
```text
REPORT RequestId: abc-123 Duration: 45.23 ms Billed Duration: 46 ms Memory Size: 512 MB Max Memory Used: 210 MB Init Duration: 2341.87 ms
```
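That Init Duration: 2341.87 ms is the cold start tax. Your handler ran in 45ms; initialization before it cost 2.3 seconds. Init Duration covers runtime and function init but not the time Lambda spends provisioning the environment and downloading your code, so the latency users observe on a cold start is somewhat higher. Now you have a concrete number to optimize against, and a baseline to verify that any fix actually worked. If you later enable SnapStart, note that restored environments report Restore Duration in place of Init Duration.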
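To turn the CLI output into numbers you can track, a small parser helps. This is a minimal stdlib Java sketch that assumes the `Init Duration: <n> ms` text format shown above; `ReportLineParser` is a hypothetical helper name, and nearest-rank is just one reasonable percentile definition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReportLineParser {
    // Matches the "Init Duration: 2341.87 ms" field, which only cold-start REPORT lines carry.
    private static final Pattern INIT_DURATION =
            Pattern.compile("Init Duration: ([0-9]+(?:\\.[0-9]+)?) ms");

    // Extracts every Init Duration (in ms) from raw CLI output. Scanning the whole text,
    // rather than line by line, also works when --output text joins messages with tabs.
    static List<Double> initDurationsMs(String logText) {
        List<Double> values = new ArrayList<>();
        Matcher m = INIT_DURATION.matcher(logText);
        while (m.find()) {
            values.add(Double.parseDouble(m.group(1)));
        }
        return values;
    }

    // Nearest-rank percentile for p in (0, 100]; empty when no cold starts were found.
    static OptionalDouble percentile(List<Double> values, double p) {
        if (values.isEmpty()) {
            return OptionalDouble.empty();
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return OptionalDouble.of(sorted.get(Math.max(rank, 1) - 1));
    }
}
```

For example, you could save the CLI output to a file, read it with `Files.readString`, and record the count, median, and maximum before and after each change.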
Solution 1: Provisioned Concurrency (All Runtimes)
Provisioned Concurrency instructs Lambda to pre-initialize a specified number of execution environments and keep them perpetually warm. Invocations routed to these environments skip the environment provisioning, runtime init, and function init phases entirely and go straight to handler execution. If concurrent requests exceed the provisioned count, the overflow runs on standard on-demand environments, which can still cold start. This is the most direct solution and works for every Lambda runtime.
Enabling Provisioned Concurrency on a Function Version
Provisioned Concurrency must be applied to a published function version or an alias — not to $LATEST. This is a hard constraint. The workflow is: publish a version, then configure concurrency against it.
```bash
# Step 1: Publish a new version of your function and capture its number
VERSION=$(aws lambda publish-version \
  --function-name your-function-name \
  --description 'v1 for provisioned concurrency' \
  --query 'Version' --output text)

# Step 2: Apply provisioned concurrency to that version
aws lambda put-provisioned-concurrency-config \
  --function-name your-function-name \
  --qualifier "$VERSION" \
  --provisioned-concurrent-executions 5

# Step 3: Verify the status (repeat until it reports 'READY')
aws lambda get-provisioned-concurrency-config \
  --function-name your-function-name \
  --qualifier "$VERSION"
```
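The Status field in the response transitions from IN_PROGRESS to READY (or to FAILED, with a StatusReason explaining why). Only after it reaches READY are the environments actually pre-warmed; invoking the function during IN_PROGRESS can still produce cold starts. Callers must also invoke that specific version, or an alias pointing at it: requests to the unqualified function name run on $LATEST and never use the provisioned environments.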
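In a deploy pipeline you usually want to block until that status settles. Here is a minimal sketch of the polling loop in plain Java, assuming you supply the status lookup yourself: `statusLookup` is a hypothetical hook where you would call the SDK (for example `LambdaClient#getProvisionedConcurrencyConfig` in the AWS SDK for Java v2) and return its Status string. `ProvisionedConcurrencyWaiter` is likewise a hypothetical name.

```java
import java.time.Duration;
import java.util.function.Supplier;

class ProvisionedConcurrencyWaiter {
    // Polls statusLookup until it reports READY (returns true) or FAILED (returns false).
    // statusLookup is a hypothetical hook: wire it to the SDK call that reads the Status field.
    // Throws IllegalStateException if the status has not settled after maxAttempts polls.
    static boolean waitForReady(Supplier<String> statusLookup, int maxAttempts, Duration interval) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String status = statusLookup.get();
            if ("READY".equals(status)) {
                return true;
            }
            if ("FAILED".equals(status)) {
                return false;
            }
            // Still IN_PROGRESS: environments are not pre-warmed yet, so wait and retry.
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(interval.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for READY", e);
                }
            }
        }
        throw new IllegalStateException("Status not READY after " + maxAttempts + " attempts");
    }
}
```

Returning `false` on FAILED rather than throwing lets the caller read the StatusReason and decide whether to roll back.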
Auto-Scaling Provisioned Concurrency
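Static provisioned concurrency wastes money during off-peak hours. Application Auto Scaling can adjust the count on a schedule or based on utilization. The target tracking policy uses the LambdaProvisionedConcurrencyUtilization metric: when utilization exceeds your target, Auto Scaling increases the provisioned count, and it scales back in when utilization stays below the target. The commands below register an alias named prod, so they assume that alias already exists and points at a published version (create it with aws lambda create-alias if it does not).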
```bash
# Register the Lambda alias as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id 'function:your-function-name:prod' \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 2 \
  --max-capacity 20

# Create a target tracking policy at 70% utilization
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id 'function:your-function-name:prod' \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --policy-name pc-tracking-policy \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 0.7,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    }
  }'
```
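To build intuition for what a 0.7 target does, here is a deliberately simplified proportional model in Java. It is an approximation, not Application Auto Scaling's actual algorithm, which also applies evaluation periods, cooldowns, and more conservative scale-in; `TargetTrackingModel` and `desiredCapacity` are hypothetical names.

```java
class TargetTrackingModel {
    // Simplified proportional rule: choose the capacity at which the environments currently
    // in use would sit at the target utilization, then clamp to the registered min/max.
    // current: provisioned environments now; utilization: observed fraction in use.
    static int desiredCapacity(int current, double utilization, double target, int min, int max) {
        if (target <= 0) {
            throw new IllegalArgumentException("target must be > 0");
        }
        double busy = current * utilization;            // environments actually serving requests
        int desired = (int) Math.ceil(busy / target);   // capacity that puts utilization at target
        return Math.max(min, Math.min(max, desired));   // respect --min-capacity / --max-capacity
    }

    public static void main(String[] args) {
        // Using the min 2 / max 20 bounds registered above:
        System.out.println(desiredCapacity(10, 0.9, 0.7, 2, 20)); // 13
        System.out.println(desiredCapacity(10, 0.1, 0.7, 2, 20)); // 2
        System.out.println(desiredCapacity(20, 1.0, 0.7, 2, 20)); // 20
    }
}
```

The headroom between 0.7 and 1.0 is what absorbs a burst while new environments initialize; in the real service the scale-in in the second case would typically happen more gradually than this one-step model suggests.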
Required IAM Permissions
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:PutProvisionedConcurrencyConfig",
        "lambda:GetProvisionedConcurrencyConfig",
        "lambda:DeleteProvisionedConcurrencyConfig",
        "lambda:PublishVersion"
      ],
      "Resource": [
        "arn:aws:lambda:us-east-1:123456789012:function:your-function-name",
        "arn:aws:lambda:us-east-1:123456789012:function:your-function-name:*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "application-autoscaling:RegisterScalableTarget",
        "application-autoscaling:PutScalingPolicy"
      ],
      "Resource": "*"
    }
  ]
}
```
The second function ARN (ending in :*) covers published versions and aliases, which is where provisioned concurrency is actually configured.
Solution 2: Lambda SnapStart (Java Runtime)
If your function runs on Java 11 or later (including the Java 17 and Java 21 managed runtimes), SnapStart is usually the more cost-effective path, since it carries no additional charge for Java functions. Instead of keeping environments perpetually warm, Lambda takes a snapshot of the initialized execution environment (memory and disk state) after the function init phase completes. On subsequent cold starts, Lambda restores from this snapshot rather than re-running initialization, so the expensive JVM startup and application context initialization happen once at snapshot time, not on every cold start.
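The critical operational detail: SnapStart snapshots are taken when you publish a function version, and every cold start of that version resumes from the same snapshot. If your init code has side effects that must not be replayed (opening a network connection that will be stale after restore, or generating a unique ID or random seed that would then be shared by every restored environment), use the beforeCheckpoint and afterRestore runtime hooks from the open-source CRaC (Coordinated Restore at Checkpoint) API, published as the org.crac package, to handle teardown and re-initialization correctly.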
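The uniqueness hazard is easy to reproduce in plain Java. The sketch below uses Java serialization as a stand-in for the snapshot; the real SnapStart snapshot captures the execution environment's memory and disk state, not individual objects. A `java.util.Random` created during "init" is restored twice, and both copies produce the same sequence. `SnapshotUniquenessDemo` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Random;

class SnapshotUniquenessDemo {
    // Stand-in for "take a snapshot after init": serialize the initialized state.
    static byte[] snapshot(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Stand-in for "restore an execution environment from the snapshot".
    static Object restore(byte[] snapshot) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Random createdDuringInit = new Random();   // "unique" state generated at init time
        byte[] snap = snapshot(createdDuringInit);

        Random envA = (Random) restore(snap);      // two cold starts restored from one snapshot
        Random envB = (Random) restore(snap);
        System.out.println(envA.nextLong() == envB.nextLong()); // true: the "random" values collide
        // An afterRestore hook avoids this by generating fresh per-environment state after restore.
    }
}
```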
Enabling SnapStart
```bash
# Enable SnapStart on the function (applies to newly published versions)
aws lambda update-function-configuration \
  --function-name your-function-name \
  --snap-start ApplyOn=PublishedVersions

# Wait for the configuration update to finish before publishing
aws lambda wait function-updated --function-name your-function-name

# Publish a version; this triggers snapshot creation
VERSION=$(aws lambda publish-version \
  --function-name your-function-name \
  --description 'SnapStart enabled version' \
  --query 'Version' --output text)

# Verify SnapStart status on the published version
aws lambda get-function-configuration \
  --function-name your-function-name \
  --qualifier "$VERSION"
```
In the response, look for the SnapStart block:
```json
"SnapStart": {
  "ApplyOn": "PublishedVersions",
  "OptimizationStatus": "On"
}
```
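OptimizationStatus: On confirms that SnapStart is active for that version. Snapshot creation happens while the version's State is Pending, so wait for State to reach Active before routing traffic to it. SnapStart is not available on $LATEST, only on published versions, consistent with Provisioned Concurrency's version requirement. The two features are mutually exclusive, however: you cannot enable SnapStart and Provisioned Concurrency on the same function version.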
SnapStart Lifecycle Hook (Java)
If your initialization code opens connections or generates state that must be refreshed after restore, implement the CRaC Resource interface (org.crac.Resource) and register the handler with the global CRaC context:
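```java
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class MyHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {
    // ConnectionPool is a placeholder for your own pool class (hypothetical, not a library type)
    private final ConnectionPool dbConnectionPool;

    public MyHandler() {
        // Expensive init: build application context, load config, open the pool
        dbConnectionPool = new ConnectionPool();
        // Register so the CRaC hooks below run around the snapshot
        Core.getGlobalContext().register(this);
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request,
            com.amazonaws.services.lambda.runtime.Context lambdaContext) {
        // Business logic; Lambda's Context is fully qualified because org.crac.Context is imported
        return new APIGatewayProxyResponseEvent().withStatusCode(200).withBody("ok");
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Close connections before the snapshot is taken
        dbConnectionPool.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Re-open connections after the environment is restored from the snapshot
        dbConnectionPool.initialize();
    }
}
```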
- Version publish: Lambda runs full init (JVM start + application context), calls your beforeCheckpoint hooks, and takes a memory/disk snapshot.
- Snapshot stored: The initialized state is persisted internally by Lambda.
- Cold start invocation: Lambda restores from the snapshot, skipping JVM startup and application init.
- afterRestore hook: Your code re-establishes any connections that were closed before the checkpoint.
- Handler runs: Business logic executes with a fully initialized environment.
Code-Level Optimizations (All Runtimes)
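Provisioned Concurrency and SnapStart address the infrastructure layer. The code-level changes below shorten the function init phase, which matters for on-demand cold starts and also shortens snapshot creation when you publish a SnapStart version. The trade-off reverses once either feature is on: with Provisioned Concurrency, init runs before traffic arrives, and with SnapStart, work done during init is captured in the snapshot, so eager initialization is usually preferable there.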
- Lazy-initialize SDK clients: Only construct clients you'll actually use on this invocation path. A function that handles 5 event types shouldn't initialize all 5 SDK clients on every cold start.
- Defer Secrets Manager calls out of init: Fetching secrets at module load time adds a blocking network call to every cold start, even for invocations that never use the secret. Cache the secret in a module-level variable, load it lazily inside the handler, and refresh it on a TTL basis.
- Reduce deployment package size: Smaller packages download faster during environment provisioning and leave less code for the runtime to load. Strip unused dependencies from your build. Lambda Layers separate large dependencies from your function code for deployment purposes, but they do not by themselves shorten cold starts, because layer contents are still loaded into every new execution environment.
- Choose a lighter runtime for latency-critical paths: If Java's JVM startup is the bottleneck and SnapStart isn't viable, Node.js or Python runtimes have structurally shorter init phases for equivalent workloads.
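Lazy initialization in Java is usually done with a memoizing holder. A minimal sketch, assuming a hypothetical `ExpensiveClient` class standing in for an SDK client (for example an S3 or DynamoDB client); `Lazy` and `MultiEventHandler` are also hypothetical names. Each client is built on first use by the code path that needs it, then reused by every warm invocation in the same environment.

```java
import java.util.function.Supplier;

// Thread-safe memoizing supplier: runs the factory at most once, on the first get().
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> factory;
    private volatile T value;

    Lazy(Supplier<T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = factory.get();
                    value = result;
                }
            }
        }
        return result;
    }
}

// Hypothetical stand-in for an SDK client that is expensive to construct.
class ExpensiveClient {
    static int constructed = 0;

    ExpensiveClient() {
        constructed++;
    }
}

class MultiEventHandler {
    // Declared as fields, but nothing is constructed until a code path actually needs it.
    private final Lazy<ExpensiveClient> ordersClient = new Lazy<>(ExpensiveClient::new);
    private final Lazy<ExpensiveClient> reportsClient = new Lazy<>(ExpensiveClient::new);

    String handle(String eventType) {
        switch (eventType) {
            case "order":
                ordersClient.get();
                return "order handled";
            case "report":
                reportsClient.get();
                return "report handled";
            default:
                return "ignored";
        }
    }
}
```

Under Provisioned Concurrency or SnapStart you would typically call `get()` from the constructor instead, so the work happens before traffic arrives or is captured in the snapshot.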
The Secrets Manager Trap
A common pattern that causes exactly the 2–3 second cold start described in the intro: a Node.js function that calls secretsmanager:GetSecretValue at module load time to populate a database connection string. The engineer sees 2.4 seconds of Init Duration in CloudWatch, assumes it's the database connection pool, and spends time optimizing connection settings. The pool initialization takes 80ms. The Secrets Manager call — a synchronous HTTPS request happening before the handler even starts — accounts for the remaining 2.3 seconds.
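The fix is not to remove the Secrets Manager call but to stop paying for it unconditionally during init. Move it inside the handler behind a module-level cache check: if the secret is already loaded and has not expired, skip the call. Each new execution environment still pays the cost once, on the first invocation that actually needs the secret, and every warm invocation after that reuses the cached value; code paths that never touch the database never pay it at all. With Provisioned Concurrency the calculus flips: init runs before any traffic arrives, so keeping the fetch in init means it is paid exactly once per pre-warmed environment and never by a user-facing request.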
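The cache check with a TTL translates to Java as a field on the handler class. A minimal sketch, assuming a hypothetical `fetchSecret` supplier where you would call Secrets Manager (for example `GetSecretValue` through the AWS SDK) and an injectable clock so the expiry logic can be tested; `TtlSecretHolder` is a hypothetical name, not an AWS library class.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

class TtlSecretHolder {
    private final Supplier<String> fetchSecret; // hypothetical hook: call Secrets Manager here
    private final Duration ttl;
    private final Supplier<Instant> clock;      // Instant::now in production, injectable for tests
    private String cached;                      // lives as long as the execution environment
    private Instant fetchedAt;

    TtlSecretHolder(Supplier<String> fetchSecret, Duration ttl, Supplier<Instant> clock) {
        this.fetchSecret = fetchSecret;
        this.ttl = ttl;
        this.clock = clock;
    }

    // Call from inside the handler: only goes to the network when nothing is cached
    // yet or the cached value is at least ttl old.
    synchronized String get() {
        Instant now = clock.get();
        if (cached == null || !now.isBefore(fetchedAt.plus(ttl))) {
            cached = fetchSecret.get();
            fetchedAt = now;
        }
        return cached;
    }
}
```

Holding one instance in a handler field keeps the cached value across warm invocations; for a provisioned version you could call `get()` once in the constructor so the first fetch happens during init.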
Init Duration in CloudWatch is your ground truth. If it's high, the time is going into initialization (runtime startup or your own init code), not your handler. Profile them separately.
Choosing the Right Approach for Your Lambda Cold Start
Wrap-Up & Next Steps: Reducing Lambda Cold Start Time
Cold start latency in Lambda is a layered problem. The Init Duration field in CloudWatch REPORT logs tells you how much time is spent initializing before your handler runs; start there before applying any fix. For latency-sensitive APIs on any runtime, Provisioned Concurrency eliminates cold starts for traffic up to the provisioned level by keeping environments pre-warmed, at the cost of paying for that capacity while it is configured. For Java functions with heavy initialization, SnapStart sharply reduces init latency by restoring from a snapshot rather than re-running the JVM startup sequence. Code-level changes reduce the baseline init cost: smaller packages help everywhere, while lazy initialization and deferred secrets loading pay off most for on-demand cold starts, since Provisioned Concurrency and SnapStart favor doing that work eagerly during init.
- Measure first: Lambda CloudWatch metrics reference
- Provisioned Concurrency: AWS Lambda Provisioned Concurrency documentation
- SnapStart: AWS Lambda SnapStart documentation
- Auto Scaling integration: Managing Provisioned Concurrency with Application Auto Scaling
Glossary
| Term | Definition |
|---|---|
| Cold Start | The latency added when Lambda must provision a new execution environment before running your handler. Reported as Init Duration in CloudWatch REPORT logs, which covers runtime and function init but not environment provisioning. |
| Provisioned Concurrency | A Lambda feature that pre-initializes a specified number of execution environments, eliminating cold start latency for requests served by those environments; you pay for the configured concurrency for as long as it is enabled. |
| SnapStart | A Lambda optimization for Java runtimes that snapshots the initialized execution environment after function init, restoring from that snapshot on subsequent cold starts instead of re-running initialization. |
| Function Init Phase | The execution of code outside your handler function — SDK client construction, DB connections, config loading — that runs once per execution environment lifecycle. |
| Warm Invocation | An invocation served by an already-initialized execution environment. The function init phase is skipped; only handler code runs. |