Eliminating Lambda Cold Starts: A Deep Dive into Provisioned Concurrency & SnapStart

Your Lambda function responds in milliseconds once it is warm — but the first invocation after a period of inactivity takes 2–3 seconds, silently breaking SLAs and degrading user experience. This is the cold start problem, and understanding its root cause is the prerequisite to fixing it correctly.

TL;DR

| Concept | What It Is | Best For | Key Trade-off |
|---|---|---|---|
| Cold Start | Latency from bootstrapping a new execution environment | Understanding the problem | Unavoidable without mitigation |
| Provisioned Concurrency | Pre-initializes N execution environments, keeping them warm | All runtimes; latency-sensitive APIs | Billed even when idle |
| SnapStart | Snapshots the initialized environment; restores from snapshot | Java (Corretto 11, 17, 21) runtimes only | Restore latency; uniqueness considerations |

What Exactly Is a Cold Start?

AWS Lambda runs your code inside a MicroVM managed by Firecracker. When no warm execution environment exists for your function, Lambda must perform a full bootstrap sequence before your handler even begins executing. This sequence has two distinct phases:

  • Platform Init: AWS provisions the MicroVM, downloads your deployment package or container image, and starts the runtime process (JVM, Node.js, Python interpreter, etc.).
  • Function Init: Your initialization code outside the handler runs — importing libraries, establishing DB connections, loading ML models, etc.

Only after both phases complete does your handler receive the event. The combined duration is the cold start latency you observe.

graph TD
    A["Invoke Request"] --> B{"Warm Environment Available?"}
    B -- "Yes (Warm)" --> G["Execute Handler"]
    B -- "No (Cold Start)" --> C["Platform Init (MicroVM + Runtime Bootstrap)"]
    C --> D["Function Init (Your init code outside handler)"]
    D --> G
    G --> H["Return Response"]
    style C fill:#ffcccc,stroke:#cc0000
    style D fill:#ffcccc,stroke:#cc0000
    style G fill:#ccffcc,stroke:#006600
  1. Invoke Request: An event triggers the Lambda function (API Gateway, EventBridge, etc.).
  2. Environment Check: Lambda's control plane checks for an available warm execution environment.
  3. Cold Path (red): If none exists, Platform Init + Function Init must complete before the handler runs — this is the cold start penalty.
  4. Warm Path (green): A reused environment skips both init phases and goes directly to handler execution.
  5. Response: Handler result is returned to the caller.

Why Java Is the Worst Offender

Cold start duration is heavily influenced by runtime startup time and initialization code complexity. The JVM's class-loading and JIT compilation make Java functions notorious for 2–5 second cold starts, while Python and Node.js typically see 100–500ms. This is precisely why AWS built SnapStart specifically for Java runtimes.

| Runtime | Typical Cold Start Range | Primary Driver |
|---|---|---|
| Java (Corretto) | 1,000ms – 5,000ms+ | JVM startup + class loading |
| Python | 100ms – 700ms | Interpreter + package imports |
| Node.js | 100ms – 500ms | V8 engine + module loading |
| Go (provided.al2) | 50ms – 200ms | Binary startup (compiled) |

Note: These are representative ranges. Actual values depend on deployment package size, VPC configuration, and initialization code complexity. Always measure your specific function.

Solution 1: Provisioned Concurrency

Provisioned Concurrency instructs Lambda to pre-initialize and keep a specified number of execution environments in a ready state. These environments have already completed both Platform Init and Function Init. When an invocation arrives, it is dispatched to a pre-warmed environment with zero init overhead.

graph LR
    subgraph "Deployment Time"
        PC["put-provisioned-concurrency-config (N=10)"]
    end
    subgraph "Lambda Control Plane"
        PC --> E1["Env 1 (Pre-warmed)"]
        PC --> E2["Env 2 (Pre-warmed)"]
        PC --> E3["Env 3...N (Pre-warmed)"]
    end
    subgraph "Invocation Time"
        R1["Request 1"] --> E1
        R2["Request 2"] --> E2
        R3["Request N+1 (overflow)"] --> E4["New Env (Cold Start!)"]
    end
    style E1 fill:#ccffcc,stroke:#006600
    style E2 fill:#ccffcc,stroke:#006600
    style E3 fill:#ccffcc,stroke:#006600
    style E4 fill:#ffcccc,stroke:#cc0000
  1. Configuration: You set Provisioned Concurrency on a specific function version or alias (not $LATEST).
  2. Pre-warming: Lambda proactively initializes the specified number of environments, running your init code.
  3. Invocation Routing: Incoming requests are routed to pre-warmed environments first.
  4. Overflow: If requests exceed provisioned capacity, Lambda spins up additional on-demand environments — these will cold start.
  5. Billing: You are billed for provisioned concurrency hours regardless of invocation volume.
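Point 5 is worth quantifying before you commit. A back-of-envelope cost sketch in shell — the per-GB-second rate below is an illustrative figure, not an authoritative price; check the AWS Lambda pricing page for your region's current rate:

```shell
# Rough monthly cost of keeping N environments provisioned around the clock.
# RATE_PER_GB_SECOND is an illustrative value for this sketch only; always
# confirm against the AWS Lambda pricing page for your region.
ENVIRONMENTS=10
MEMORY_GB=1                          # a 1024 MB function
SECONDS_PER_MONTH=$((3600 * 24 * 30))
RATE_PER_GB_SECOND=0.0000041667      # illustrative Provisioned Concurrency rate

awk -v n="$ENVIRONMENTS" -v m="$MEMORY_GB" \
    -v s="$SECONDS_PER_MONTH" -v r="$RATE_PER_GB_SECOND" \
    'BEGIN { printf "Provisioned Concurrency: ~$%.2f/month (before invocation charges)\n", n*m*s*r }'
```

At these example numbers, ten always-on 1 GB environments land in the low hundreds of dollars per month before a single request is served — the idle spend that scheduled auto scaling exists to trim.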

Configuring Provisioned Concurrency (AWS CLI)

CLI: Publish version & set Provisioned Concurrency
# Step 1: Publish an immutable version (required — cannot use $LATEST)
aws lambda publish-version \
  --function-name my-api-function \
  --description "v1 - production release"

# Step 2: Set Provisioned Concurrency on the published version
# Replace '1' with your published version number from Step 1 output
aws lambda put-provisioned-concurrency-config \
  --function-name my-api-function \
  --qualifier 1 \
  --provisioned-concurrent-executions 10

# Step 3: Poll until status is READY (not IN_PROGRESS)
aws lambda get-provisioned-concurrency-config \
  --function-name my-api-function \
  --qualifier 1

Auto-Scaling Provisioned Concurrency

Keeping a fixed number of warm environments wastes money during off-peak hours. Use Application Auto Scaling to scale provisioned concurrency based on a schedule or utilization metric.

CLI: Register scalable target & attach scheduled scaling
# Register the Lambda function version as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-api-function:1 \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 2 \
  --max-capacity 50

# Scale UP at 08:00 UTC (business hours start)
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-api-function:1 \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name scale-up-morning \
  --schedule "cron(0 8 * * ? *)" \
  --scalable-target-action MinCapacity=10,MaxCapacity=50

# Scale DOWN at 20:00 UTC (off-peak)
aws application-autoscaling put-scheduled-action \
  --service-namespace lambda \
  --resource-id function:my-api-function:1 \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --scheduled-action-name scale-down-evening \
  --schedule "cron(0 20 * * ? *)" \
  --scalable-target-action MinCapacity=2,MaxCapacity=10
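Scheduled actions suit predictable traffic. When load varies, you can instead attach a target-tracking policy that scales provisioned concurrency against the LambdaProvisionedConcurrencyUtilization metric. A sketch — the function name, version qualifier, and the 0.7 target are placeholder values to adapt:

```shell
# Target-tracking configuration: scale so roughly 70% of provisioned
# environments are in use (0.7 is a common starting point; tune per workload).
cat > pc-target-tracking.json <<'EOF'
{
  "TargetValue": 0.7,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
  }
}
EOF

# Attach the policy to the scalable target registered earlier.
# Guarded so the snippet is copy-safe on machines without the AWS CLI.
if command -v aws >/dev/null 2>&1; then
  aws application-autoscaling put-scaling-policy \
    --service-namespace lambda \
    --resource-id function:my-api-function:1 \
    --scalable-dimension lambda:function:ProvisionedConcurrency \
    --policy-name pc-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration file://pc-target-tracking.json \
    || echo "put-scaling-policy failed; check credentials and IAM permissions"
fi
```

Remove the `command -v` guard in real deployment scripts; it exists only so the example degrades gracefully outside an AWS-configured shell.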

Solution 2: Lambda SnapStart (Java Only)

SnapStart takes a fundamentally different approach. Instead of keeping environments perpetually warm, it snapshots the initialized state of the execution environment after Function Init completes, then restores from that snapshot on subsequent cold starts. The expensive JVM startup and class loading happen once at deployment time, not at invocation time.

sequenceDiagram
    participant Dev as "Developer"
    participant Lambda as "Lambda Service"
    participant MicroVM as "Firecracker MicroVM"
    participant Cache as "Snapshot Cache"
    Note over Dev,Cache: --- Deployment Phase (publish-version) ---
    Dev->>Lambda: publish-version (SnapStart enabled)
    Lambda->>MicroVM: Bootstrap JVM + Run Function Init
    MicroVM->>MicroVM: beforeCheckpoint() hook
    MicroVM->>Cache: Take & store encrypted snapshot
    Lambda-->>Dev: Version published
    Note over Dev,Cache: --- Cold Start Invocation (later) ---
    Dev->>Lambda: Invoke Request
    Lambda->>Cache: Restore snapshot
    Cache->>MicroVM: Restore MicroVM state
    MicroVM->>MicroVM: afterRestore() hook
    MicroVM->>Lambda: Handler ready
    Lambda-->>Dev: Response (fast!)
  1. Publish Version: When you publish a new function version with SnapStart enabled, Lambda runs Function Init once.
  2. Snapshot: Lambda takes a memory and disk snapshot of the fully initialized Firecracker MicroVM.
  3. Cache: The snapshot is encrypted and cached in a tiered storage layer managed by AWS.
  4. Restore: On a cold start, Lambda restores from the snapshot instead of re-running init — dramatically reducing latency.
  5. Hook Execution: beforeCheckpoint and afterRestore lifecycle hooks allow you to handle state that must be refreshed (e.g., re-establishing DB connections, re-seeding random number generators).
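The uniqueness caveat in step 5 is easy to underestimate. Here is a toy shell simulation (purely local, not Lambda) of the failure mode: any value computed before the checkpoint is baked into the snapshot and duplicated verbatim into every restored environment.

```shell
# Toy simulation only: "Function Init" computes a value once; every
# environment "restored" from the snapshot then sees the exact same value.
INIT_ID=$RANDOM   # imagine a seed or unique ID generated during init

for restored_env in 1 2 3; do
  # Each restored MicroVM wakes up with the snapshot's memory, INIT_ID included
  echo "restored env $restored_env sees seed: $INIT_ID"
done
# All three lines print the same seed — exactly why afterRestore must re-seed.
```

This is why the afterRestore hook (next section) re-establishes connections and re-seeds randomness: the snapshot cannot know how many environments will be cloned from it.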

Enabling SnapStart (AWS CLI)

CLI: Enable SnapStart on a Java function
# Update function configuration to enable SnapStart
# Supported runtimes: java11, java17, java21
aws lambda update-function-configuration \
  --function-name my-java-api-function \
  --snap-start ApplyOn=PublishedVersions

# Publish a new version — snapshot is taken at this point
aws lambda publish-version \
  --function-name my-java-api-function \
  --description "SnapStart enabled - v2"

# Verify: OptimizationStatus should be "On"
# (use the version number from the publish-version output as the qualifier)
aws lambda get-function-configuration \
  --function-name my-java-api-function \
  --qualifier 2 \
  --query 'SnapStart'

Implementing Lifecycle Hooks in Java

SnapStart snapshots state — which means any state that must be unique per environment (random seeds, timestamps, open network connections) must be handled in the afterRestore hook.

Java: SnapStart lifecycle hook implementation
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.crac.Core;
import org.crac.Resource;

public class MyHandler implements RequestHandler<MyEvent, MyResponse>, Resource {

    private DatabaseConnection dbConnection;

    public MyHandler() {
        // This runs during Function Init (before snapshot)
        // Safe: load configs, initialize static data, warm up classes
        Core.getGlobalContext().register(this);
        System.out.println("Init: Loading static configuration...");
    }

    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        // Called BEFORE the snapshot is taken
        // Close any connections that should NOT be snapshotted
        if (dbConnection != null) {
            dbConnection.close();
            dbConnection = null;
        }
        System.out.println("beforeCheckpoint: Closed DB connection before snapshot.");
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        // Called AFTER restore from snapshot, BEFORE handler invocation
        // Re-establish connections, re-seed randomness, refresh tokens
        this.dbConnection = DatabaseConnection.create();
        System.out.println("afterRestore: Re-established DB connection after restore.");
    }

    @Override
    public MyResponse handleRequest(MyEvent event, Context context) {
        // Handler runs with a fully restored, connection-ready environment
        return dbConnection.query(event.getId());
    }
}

Choosing the Right Solution

Analogy: Think of Provisioned Concurrency as keeping N taxis idling at the airport rank 24/7 — always ready, but burning fuel constantly. SnapStart is like a taxi that can be flash-frozen mid-shift and instantly thawed when a passenger appears — you pay for the freeze once, not for continuous idling.

graph TD
    Start(["Cold Start Problem"])
    Start --> Q1{"Is runtime Java (Corretto 11/17/21)?"}
    Q1 -- "Yes" --> Q2{"Can you tolerate lifecycle hook complexity?"}
    Q2 -- "Yes" --> SS["Use SnapStart (free, scales to zero)"]
    Q2 -- "No" --> PC
    Q1 -- "No" --> Q3{"Is traffic pattern predictable / bursty?"}
    Q3 -- "Predictable" --> PCS["Provisioned Concurrency + Scheduled Auto Scaling"]
    Q3 -- "Unpredictable" --> Q4{"Cost sensitivity?"}
    Q4 -- "Cost-sensitive" --> OPT["Optimize init code + Increase memory + Reduce package size"]
    Q4 -- "Latency-critical" --> PC["Provisioned Concurrency + Target Tracking Scaling"]
    style SS fill:#ccffcc,stroke:#006600
    style PC fill:#cce5ff,stroke:#0066cc
    style PCS fill:#cce5ff,stroke:#0066cc
    style OPT fill:#fff3cd,stroke:#856404
| Criteria | Provisioned Concurrency | SnapStart |
|---|---|---|
| Runtime Support | All Lambda runtimes | Java (Corretto 11, 17, 21) only |
| Cold Start Elimination | Complete (for provisioned capacity) | Significant reduction (not always zero) |
| Cost Model | Billed per provisioned concurrency-hour | No additional charge beyond standard Lambda pricing |
| Scales to Zero | No (provisioned environments always running) | Yes |
| State Complexity | None (standard init) | Requires lifecycle hook management |
| Deployment Trigger | Manual or auto-scaling configuration | Automatic on publish-version |

Additional Optimizations (Runtime-Agnostic)

Provisioned Concurrency and SnapStart address the platform layer, but your Function Init code is equally important:

  • Minimize deployment package size: Smaller packages download faster. Use Lambda Layers for shared dependencies. Avoid bundling unused libraries.
  • Lazy initialization: Defer expensive object creation to the first handler invocation if the resource is not always needed.
  • Avoid VPC unless necessary: Lambda functions inside a VPC historically had higher cold start latency due to ENI attachment. AWS has significantly improved this with Hyperplane ENIs, but VPC still adds overhead. Only attach to a VPC if your function genuinely requires private resource access.
  • Use ARM64 (Graviton2): Graviton2-based Lambda functions can offer better price-performance and in some cases lower cold start times compared to x86_64 for the same workload.
  • Increase memory allocation: Lambda allocates CPU proportionally to memory. More memory means faster initialization code execution, which reduces Function Init duration.

Measuring Cold Starts with CloudWatch

Before optimizing, measure. Lambda reports initialization duration in CloudWatch Logs Insights. Use the following query to identify cold start frequency and duration:

CloudWatch Logs Insights: Cold start analysis query
# Run this in CloudWatch Logs Insights against your Lambda log group
# e.g., /aws/lambda/my-api-function

filter @type = "REPORT"
| parse @message "Init Duration: * ms" as initDuration
| filter ispresent(initDuration)
| stats
    count() as coldStartCount,
    avg(initDuration) as avgInitMs,
    max(initDuration) as maxInitMs,
    pct(initDuration, 95) as p95InitMs,
    pct(initDuration, 99) as p99InitMs
  by bin(1h)

The Init Duration field only appears in REPORT log lines for cold start invocations. A high coldStartCount relative to total invocations indicates your function is not retaining warm environments — a signal to consider Provisioned Concurrency or traffic pattern analysis.
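To make the parse clause concrete, here is the shape of a cold-start REPORT line with the same field extracted locally via sed — the request ID and timings are made-up sample values:

```shell
# A representative REPORT line as Lambda writes it on a cold start
# (all values are illustrative, not from a real invocation).
REPORT_LINE='REPORT RequestId: example-request-id Duration: 12.45 ms Billed Duration: 13 ms Memory Size: 512 MB Max Memory Used: 81 MB Init Duration: 2345.67 ms'

# Extract Init Duration locally, mirroring the Logs Insights parse clause
INIT_MS=$(printf '%s\n' "$REPORT_LINE" | sed -n 's/.*Init Duration: \([0-9.]*\) ms.*/\1/p')
echo "Init Duration: ${INIT_MS} ms"
```

A warm invocation's REPORT line simply omits the `Init Duration:` field, so the sed command prints nothing — the same presence/absence signal the `ispresent(initDuration)` filter relies on.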

Glossary

| Term | Definition |
|---|---|
| Execution Environment | The isolated Firecracker MicroVM that hosts a single concurrent Lambda invocation. Reused across warm invocations. |
| Provisioned Concurrency | A Lambda feature that pre-initializes a set number of execution environments, eliminating cold starts for that capacity. |
| SnapStart | A Lambda feature for Java runtimes that snapshots the post-init execution environment and restores from it on cold starts. |
| Init Duration | The time Lambda spent on Function Init (your initialization code) during a cold start, reported in CloudWatch REPORT logs. |
| CRaC (Coordinated Restore at Checkpoint) | The OpenJDK project API used by SnapStart lifecycle hooks (beforeCheckpoint, afterRestore) to manage stateful resources across snapshots. |

Next Steps

  • 📖 Official Docs: Lambda Provisioned Concurrency | Lambda SnapStart
  • 🔬 Measure first: Run the CloudWatch Logs Insights query above to quantify your cold start rate before choosing a solution.
  • 💰 Cost model: Use the AWS Lambda Pricing page to model Provisioned Concurrency costs against your traffic patterns before committing.
  • 🏗️ IaC: Manage Provisioned Concurrency and SnapStart via AWS SAM (ProvisionedConcurrencyConfig and SnapStart properties) or Terraform (aws_lambda_provisioned_concurrency_config resource) for repeatable deployments.
