How to Use CloudWatch Logs Insights to Search Lambda Error Logs

When a Lambda function starts throwing errors in production, every minute of ambiguity is expensive. CloudWatch Logs Insights lets you run structured queries across all log streams in a log group simultaneously — no manual stream-by-stream hunting, no grep over downloaded files.

TL;DR: CloudWatch Logs Insights for Lambda Error Logs

Goal | Approach
Find all ERROR lines fast | Query with filter @message like /ERROR/
Scope to a time window | Set an absolute or relative time range in the console or CLI
Identify the most frequent errors | Use stats count(*) by @message to aggregate
Correlate to a specific invocation | Filter on the @requestId field
Automate from CI/CD or a runbook | Use the aws logs start-query and get-query-results CLI commands

How CloudWatch Logs Insights Works with Lambda

By default, Lambda publishes invocation logs to a CloudWatch log group named /aws/lambda/<function-name>. Each execution environment writes to its own log stream, so a high-concurrency function can have hundreds of active streams. Logs Insights treats the log group as the unit of search: it scans all of the group's streams in parallel for the selected time range and returns matching events without requiring you to know which stream holds them. For Lambda logs, the query language exposes automatically discovered fields such as @timestamp, @message, @logStream, and @requestId, with no custom log formatting required. @requestId is populated on the START, END, and REPORT lines and on application lines that include the request ID, as the runtimes' default log formats do.

graph LR
  A["Lambda Invocation"] --> B["Log Stream A"]
  A2["Lambda Invocation (concurrent)"] --> C["Log Stream B"]
  B --> D["Log Group /aws/lambda/fn"]
  C --> D
  D --> E["Logs Insights Query Engine"]
  E --> F["Result Set: @timestamp, @message, @requestId, @logStream"]
  1. Lambda execution — each invocation writes START, log lines, and END/REPORT records to its assigned log stream.
  2. Log group — all streams for a function share one log group; Insights queries target the group, not individual streams.
  3. Insights query engine — scans indexed log data across all streams in parallel for the selected time range.
  4. Result set — returns matched records with auto-extracted fields, sortable and exportable.
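To make the "query the group, not the stream" point concrete, here is a toy Python model. It is a sketch under stated assumptions, not a description of how Logs Insights is implemented: each stream is a list of hypothetical (epoch-millisecond timestamp, message) tuples already in time order, and query_log_group is an illustrative helper that scans every stream, applies a predicate, and returns matches newest first with their stream name attached.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (timestamp in epoch milliseconds, message text).
Event = Tuple[int, str]
Row = Tuple[int, str, str]  # (timestamp, stream name, message)


def query_log_group(
    streams: Dict[str, List[Event]],
    predicate: Callable[[str], bool],
    limit: int = 50,
) -> List[Row]:
    """Scan every stream in a toy log group and return matching rows newest
    first, loosely mimicking `filter ... | sort @timestamp desc | limit N`."""
    # Tag each event with its stream name so results keep @logStream context.
    tagged = [
        [(ts, name, msg) for ts, msg in events]
        for name, events in streams.items()
    ]
    # heapq.merge assumes each stream is already in ascending timestamp order.
    merged = heapq.merge(*tagged, key=lambda row: row[0])
    matches = [row for row in merged if predicate(row[2])]
    matches.reverse()  # newest first
    return matches[:limit]
```

The caller never names a stream; it passes the whole group, which is the same contract the Insights console gives you when you select a log group.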

Step 1: Open Logs Insights and Select the Lambda Log Group

Navigate to CloudWatch → Logs → Logs Insights in the AWS console. In the log group selector at the top, type /aws/lambda/ and choose your function's log group. You can select multiple log groups if you need to query across several functions at once, which is useful when tracing an error that crosses a Lambda-to-Lambda call chain.

Set the time range to cover the incident window. Logs Insights charges per GB of log data scanned, so a narrower time range is both faster and cheaper. Start with the last hour and widen it only if needed.
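If you script the time range (as Step 5 does with start-query), the window is expressed as Unix epoch seconds. A minimal Python sketch of a hypothetical time_window helper that returns the (start, end) pair for "the last N hours"; it assumes that any datetime passed as now is timezone-aware:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def time_window(hours: float = 1.0, now: Optional[datetime] = None) -> Tuple[int, int]:
    """Return (start_time, end_time) in epoch seconds covering the last `hours` hours."""
    # Use an aware UTC timestamp so .timestamp() does not depend on the local timezone.
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return int(start.timestamp()), int(end.timestamp())
```

Widening the investigation is then a matter of changing one argument, for example time_window(hours=6).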

Step 2: Write the Basic ERROR Filter Query

The simplest query that surfaces every log line containing the word ERROR:

fields @timestamp, @message, @logStream, @requestId
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50

What each clause does:

  • fields — selects which columns appear in results. @logStream tells you which execution produced the error; @requestId maps to the Lambda request ID in the REPORT line.
  • filter @message like /ERROR/ — case-sensitive regex match. Use /(?i)error/ if your logs mix casing.
  • sort @timestamp desc — most recent errors first.
  • limit 50 — Logs Insights returns up to 10,000 records; set this to a manageable number for interactive use.
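To see what case sensitivity changes, the sketch below applies both patterns to a few messages with Python's re module. The sample lines are invented for illustration, and Python's regex engine is only an approximation of the one Logs Insights uses, but the (?i) inline flag is used here the same way the query above uses it. The like helper is hypothetical.

```python
import re
from typing import List

# Illustrative messages in shapes a Lambda log group can contain (not captured output).
SAMPLE_MESSAGES = [
    "[ERROR] ValueError: bad input",
    "2024-01-01T00:00:00.000Z\tabc-123\tERROR\tInvoke Error",
    "error: retrying connection",
    "Task timed out after 3.00 seconds",
]


def like(pattern: str, messages: List[str]) -> List[str]:
    """Return messages where `pattern` matches anywhere, as `like /pattern/` does."""
    regex = re.compile(pattern)
    return [m for m in messages if regex.search(m)]


case_sensitive_hits = like(r"ERROR", SAMPLE_MESSAGES)        # like /ERROR/
case_insensitive_hits = like(r"(?i)error", SAMPLE_MESSAGES)  # like /(?i)error/
```

The lowercase "error: retrying connection" line only shows up in the case-insensitive result, and neither pattern catches the timeout line.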

Step 3: Aggregate to Find the Most Frequent Errors

A raw list of error lines tells you errors exist. Aggregation tells you which error is dominating — that's the one to fix first.

filter @message like /ERROR/
| stats count(*) as errorCount by @message
| sort errorCount desc
| limit 20
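Conceptually, the aggregation is a frequency count over matching messages. A local Python sketch of the same pipeline using collections.Counter (top_errors is a hypothetical helper, not an AWS API):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_errors(messages: Iterable[str], limit: int = 20) -> List[Tuple[str, int]]:
    """Local analogue of:
    filter @message like /ERROR/ | stats count(*) as errorCount by @message
    | sort errorCount desc | limit 20
    """
    # Count identical messages that contain the case-sensitive string "ERROR".
    counts = Counter(m for m in messages if "ERROR" in m)
    # most_common() sorts by count, highest first, and truncates to `limit`.
    return counts.most_common(limit)
```

One caveat applies equally to the real query: grouping by the raw @message treats any text difference as a separate group. If your error lines embed a timestamp or request ID, every line is unique and each group counts 1; extracting the stable part of the message first (in Logs Insights, with the parse command) gives meaningful counts.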

If your application logs structured JSON (e.g., {"level":"ERROR","msg":"connection timeout"}), Logs Insights auto-parses JSON fields. You can then filter and group on the parsed fields directly:

filter level = "ERROR"
| stats count(*) as errorCount by msg
| sort errorCount desc
| limit 20

This only works when the log line is valid JSON. If your Lambda uses a logging library that emits structured JSON, this approach gives you cleaner aggregation than regex matching against the raw message string.
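The JSON variant can be sketched the same way. This hypothetical helper assumes each log event is a single JSON object with level and msg keys, as in the example above; lines that are not valid JSON objects are skipped, which mirrors why the field-based query ignores them.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_json_errors(lines: Iterable[str], limit: int = 20) -> List[Tuple[str, int]]:
    """Local analogue of: filter level = "ERROR" | stats count(*) by msg | sort desc."""
    counts: Counter = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not valid JSON, so there are no level/msg fields to filter on
        # Only JSON objects have named fields; arrays or scalars are ignored.
        if isinstance(event, dict) and event.get("level") == "ERROR":
            counts[event.get("msg")] += 1
    return counts.most_common(limit)
```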

Step 4: Correlate an Error to a Specific Invocation

Once you identify an error message, trace it back to a single invocation to see its full execution context. Lambda's START, END, and REPORT lines carry the invocation's @requestId, and so do your application log lines when they include the request ID, as the runtimes' default logger formats do.

fields @timestamp, @message
| filter @requestId = "your-request-id-here"
| sort @timestamp asc

This returns the invocation's log lines in chronological order: the START record, any application lines that carry the request ID, and the END and REPORT lines. The REPORT line includes duration, billed duration, memory size, and maximum memory used; an Init Duration field is present only when the invocation was a cold start.
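Because the REPORT line uses a fixed "Label: value unit" layout, its numbers are easy to pull out in a script. A minimal Python sketch; parse_report and is_cold_start are hypothetical helpers, and the labels matched are the ones listed above:

```python
import re
from typing import Dict, Optional

# Longer labels are listed first so "Billed Duration" and "Init Duration" are
# matched as whole labels rather than as a bare "Duration".
_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Max Memory Used|Memory Size|Duration):\s*([\d.]+)"
)


def parse_report(line: str) -> Optional[Dict[str, float]]:
    """Return the numeric fields of a REPORT line, or None for any other line."""
    if not line.startswith("REPORT"):
        return None
    return {label: float(value) for label, value in _FIELD.findall(line)}


def is_cold_start(report: Dict[str, float]) -> bool:
    # Init Duration is present only when a new execution environment was initialized.
    return "Init Duration" in report
```

For example, a REPORT line containing "Billed Duration: 103 ms" yields a "Billed Duration" entry of 103.0, and the same line with an "Init Duration: ..." field makes is_cold_start return True.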

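Think of @requestId as a per-invocation correlation ID that Lambda provides for free. Because the platform lines and the runtimes' default log formats all carry it, you can usually reconstruct an invocation's timeline without adding instrumentation to your function code. It does not follow a request into downstream services the way a distributed trace ID does.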
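The timeline reconstruction is essentially a group-by on the request ID followed by a time sort. A Python sketch over hypothetical (timestamp, request ID, message) tuples; events with no request ID are dropped, just as lines without an extracted @requestId (typically a bare print() call) fall outside a filter on @requestId. The timelines helper is illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical event shape: (timestamp_ms, request ID or None, message).
Event = Tuple[int, Optional[str], str]


def timelines(events: Iterable[Event]) -> Dict[str, List[str]]:
    """Group messages by request ID in timestamp order, like running
    `filter @requestId = "..." | sort @timestamp asc` once per ID."""
    grouped: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for ts, request_id, message in events:
        if request_id is not None:  # lines without an ID cannot be correlated
            grouped[request_id].append((ts, message))
    return {
        rid: [msg for _, msg in sorted(rows, key=lambda row: row[0])]
        for rid, rows in grouped.items()
    }
```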

Step 5: Run the Query from the CLI (for Runbooks and Automation)

Interactive console queries are fine for ad-hoc investigation. For runbooks, incident response scripts, or CI/CD pipelines, use the CLI. Logs Insights queries are asynchronous — you start a query, poll for completion, then retrieve results.

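Start the query; the JSON response contains the query ID. The date -d syntax below is GNU date (Linux); on macOS, use date -v-1H +%s for the start time instead.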

aws logs start-query \
  --log-group-name "/aws/lambda/my-function-name" \
  --start-time $(date -d '1 hour ago' +%s) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @message, @requestId | filter @message like /ERROR/ | sort @timestamp desc | limit 50' \
  --region us-east-1

The response contains a queryId. Poll until status is Complete:

aws logs get-query-results \
  --query-id "your-query-id-here" \
  --region us-east-1

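The status field in the response is one of Scheduled, Running, Complete, Failed, Cancelled, Timeout, or Unknown. In a shell script, keep polling get-query-results while the status is Scheduled or Running, treat any status other than Complete as a failure rather than looping forever, and process the results array only once the status is Complete.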
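The same start, poll, and fetch loop in Python. This is a sketch, assuming a client object that exposes get_query_results(queryId=...) and returns status and results keys, which is the shape a boto3 CloudWatch Logs client returns; wait_for_query, the poll interval, and the poll cap are illustrative choices, not AWS defaults.

```python
import time
from typing import Any, Callable, Dict, List

# Statuses that mean "not finished yet"; anything else ends the loop.
PENDING = {"Scheduled", "Running"}


def wait_for_query(
    client: Any,
    query_id: str,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Poll until the query completes and return rows as {field: value} dicts."""
    for _ in range(max_polls):
        response = client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each result row is a list of {"field": ..., "value": ...} entries.
            return [
                {f["field"]: f.get("value", "") for f in row}
                for row in response["results"]
            ]
        if status not in PENDING:
            raise RuntimeError(f"query {query_id} ended with status {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} still pending after {max_polls} polls")
```

With boto3 installed, you would create the client with boto3.client("logs", region_name="us-east-1"), take the queryId from client.start_query(logGroupName=..., startTime=..., endTime=..., queryString=...), and pass both to wait_for_query. Injecting the client and the sleep function keeps the loop testable without AWS credentials.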

The IAM principal running these commands needs the following minimum permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "logs:StartQuery",
      "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/my-function-name:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:GetQueryResults",
        "logs:StopQuery"
      ],
      "Resource": "*"
    }
  ]
}

Add logs:DescribeLogGroups if the principal also needs to list log groups. Resource-level permission support varies by action, and some actions accept only "Resource": "*", so verify each action against the AWS Service Authorization Reference before scoping the Resource element.

Step 6: Diagnose the Pattern — Symptom, Misdiagnosis, Actual Cause

Here's a failure pattern that appears regularly in production Lambda debugging.

Symptom: The query filter @message like /ERROR/ returns zero results, but the function's error rate metric in CloudWatch Metrics is clearly elevated.

Misdiagnosis: The first instinct is to assume the log group name is wrong or the time range is off. Engineers widen the time range, check the log group, and re-run — still nothing.

Actual cause: An elevated error metric does not guarantee that any log line contains the uppercase string "ERROR". Timeouts are logged as "Task timed out after X seconds", and depending on the runtime, its version, and the log format, an unhandled exception may be written as a JSON object such as {"errorMessage": "...", "errorType": "ValueError", ...} rather than as a line tagged ERROR. The REPORT line records duration and memory, not the exception message, so a filter keyed only on /ERROR/ can miss these failures entirely.

Fix: Broaden the filter to catch the actual structure your runtime emits:

fields @timestamp, @message, @requestId
| filter @message like /errorMessage/ or @message like /ERROR/ or @message like /Task timed out/
| sort @timestamp desc
| limit 50
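The combined filter is a union of three independent patterns. A Python sketch that reports which failure path a message would match; the pattern names are hypothetical labels for the three paths in this step, and the sample strings in any test you run should come from your own log group rather than be taken as canonical formats.

```python
import re
from typing import List

# One pattern per failure path; the OR filter in the query is their union.
PATTERNS = {
    "unhandled_exception_json": re.compile(r"errorMessage"),
    "application_error": re.compile(r"ERROR"),
    "timeout": re.compile(r"Task timed out"),
}


def classify(message: str) -> List[str]:
    """Return the names of every pattern that matches the message."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(message)]


def matches_combined_filter(message: str) -> bool:
    # Equivalent of: like /errorMessage/ or like /ERROR/ or like /Task timed out/
    return bool(classify(message))
```

Keeping the paths separate makes it easy to see which branch of the OR is doing the work when you tune the query.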

Alternatively, if the error objects land in the log as structured JSON, filter on the automatically discovered errorType field directly:

fields @timestamp, errorMessage, errorType, @requestId
| filter ispresent(errorType)
| sort @timestamp desc
| limit 50

The ispresent() function returns true when the field exists in the parsed log event — it catches any unhandled exception regardless of the error type string.
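ispresent(errorType) amounts to "keep the event if its parsed fields include errorType". A Python sketch of that check, assuming each relevant log event is a single JSON object (with_error_type is a hypothetical helper):

```python
import json
from typing import Any, Dict, Iterable, List


def with_error_type(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Local analogue of:
    fields errorMessage, errorType | filter ispresent(errorType)
    Only whole-line JSON objects have parsed fields; other lines are skipped."""
    rows = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Presence of the key is the test, whatever the error type string is.
        if isinstance(event, dict) and "errorType" in event:
            rows.append({"errorType": event["errorType"],
                         "errorMessage": event.get("errorMessage")})
    return rows
```

A timeout line is not JSON and has no errorType field, so this variant on its own does not replace the Task timed out filter.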

graph TD
  INV["Lambda Invocation Fails"] --> P1["Unhandled Exception (runtime may serialize as JSON)"]
  INV --> P2["Application logger.error() (plain text or JSON)"]
  INV --> P3["Timeout (Task timed out after X seconds)"]
  P1 --> Q1["filter ispresent(errorType) or like /errorMessage/"]
  P2 --> Q2["filter @message like /ERROR/"]
  P3 --> Q3["filter @message like /Task timed out/"]
  Q1 --> R["Combined OR filter covers all paths"]
  Q2 --> R
  Q3 --> R
  1. Unhandled exception path: depending on the runtime and log format, the exception may be serialized as a JSON object, so the word "ERROR" may not appear in @message.
  2. Application log path: your logger.error() calls usually produce lines containing "ERROR", though the exact format depends on your logging library.
  3. Timeout path: Lambda emits a specific "Task timed out" string; filter on it explicitly.
  4. Correct query strategy: cover all three paths with an or filter, or use ispresent(errorType) for JSON-structured runtime errors.

Wrap-Up and Next Steps for Lambda Error Log Analysis

CloudWatch Logs Insights eliminates the stream-by-stream search problem for Lambda errors. The core workflow is: select the log group, filter on the error signal that matches your runtime's actual output format, aggregate to find the dominant error, then drill into a specific @requestId for full invocation context.

Key points to carry forward:

  • Match your filter to your runtime's actual error output: timeout lines, and in some runtimes and log formats unhandled exceptions, do not contain the literal string "ERROR".
  • Use stats count(*) by @message to prioritize which error to fix first under incident pressure.
  • Automate with start-query + get-query-results for runbooks; the async model requires polling.
  • Narrow the time range before running — it reduces scan cost and query latency.

For deeper observability, consider pairing Logs Insights with AWS X-Ray tracing to correlate errors with downstream service latency, and review the official Logs Insights query syntax reference for the full set of supported functions.

Glossary

Term | Definition
Log Group | A named container for log streams in CloudWatch Logs. By default, Lambda uses one per function at /aws/lambda/<function-name>.
Log Stream | A sequence of log events from a single Lambda execution environment. Multiple streams exist per function when concurrency is greater than one.
@requestId | An automatically discovered Logs Insights field containing the Lambda request ID. It appears on an invocation's START, END, and REPORT lines and on application lines that include the request ID.
ispresent() | A Logs Insights function that returns true when a specified field exists in the parsed log event.
Query ID | A unique identifier returned by start-query and used to poll for results via get-query-results.
