Understanding SQS Visibility Timeout: Why Your Messages Are Being Processed Twice

SQS visibility timeout controls how long a message stays hidden from other consumers after it is received. Misconfiguring it is one of the most common causes of duplicate message processing in production queues.

TL;DR: SQS Visibility Timeout at a Glance

| Scenario | What Happens | Root Cause | Fix |
|---|---|---|---|
| Message processed twice | Two consumers handle the same message | Visibility timeout shorter than processing time | Increase timeout or call ChangeMessageVisibility |
| Lambda duplicate invocations | Function retried before completion | Queue timeout less than 6x Lambda timeout | Set queue timeout to at least 6x Lambda timeout |
| Message stuck in flight | Long delay before redelivery after a consumer crash | Timeout too long; message not released on failure | Tune timeout and configure a DLQ |
| Message never retried | Failed processing silently dropped | Consumer deletes before confirming success | Delete only after successful processing |

What Is SQS Visibility Timeout and Why It Causes Duplicate Processing

When a consumer calls ReceiveMessage, SQS does not remove the message from the queue. Instead, it hides the message from all other consumers for a configurable duration — the visibility timeout. If the consumer deletes the message before the timeout expires, the message is gone. If it does not — whether due to a crash, slow processing, or a bug — the timeout expires and the message becomes visible again, available for any consumer to pick up.

Duplicate processing is not a bug in SQS. It is the designed behavior of an at-least-once delivery system when the visibility timeout is shorter than actual processing time.

sequenceDiagram
    participant SQS
    participant ConsumerA
    participant ConsumerB
    ConsumerA->>SQS: ReceiveMessage
    SQS-->>ConsumerA: Message + receipt handle
    Note over SQS: Visibility timeout starts
    Note over ConsumerA: Processing takes too long
    Note over SQS: Timeout expires - message visible again
    ConsumerB->>SQS: ReceiveMessage
    SQS-->>ConsumerB: Same message
    ConsumerA->>SQS: DeleteMessage
    Note over ConsumerB: Already processing - duplicate!

  1. Consumer A receives the message — SQS hides it for the configured visibility timeout duration.
  2. Timeout expires before Consumer A finishes — SQS makes the message visible again.
  3. Consumer B receives the same message — both consumers now process it concurrently.
  4. Consumer A eventually deletes the message — but Consumer B has already started work, causing a duplicate.
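
The sequence above can be reproduced with a toy in-memory model. This is only a sketch of the visibility rule, not the real service: `ToyQueue` and `count_deliveries` are hypothetical names, the queue holds a single message, and a second consumer is assumed to poll once per second while the first is still working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyQueue:
    """In-memory stand-in for a queue holding one message (not real SQS)."""
    visibility_timeout: float
    body: str = "order-42"
    hidden_until: float = 0.0  # the message is visible when now >= hidden_until
    deleted: bool = False

    def receive(self, now: float) -> Optional[str]:
        # Hand out the message only if it is visible and not yet deleted.
        if self.deleted or now < self.hidden_until:
            return None
        self.hidden_until = now + self.visibility_timeout
        return self.body

    def delete(self) -> None:
        self.deleted = True


def count_deliveries(visibility_timeout: float, processing_time: float) -> int:
    """Consumer A receives at t=0; Consumer B polls every second until A deletes."""
    queue = ToyQueue(visibility_timeout)
    deliveries = 0
    if queue.receive(now=0.0) is not None:
        deliveries += 1
    t = 1.0
    while t < processing_time:
        if queue.receive(now=t) is not None:
            deliveries += 1  # the same message was handed out again
        t += 1.0
    queue.delete()
    return deliveries
```

With a 30-second timeout and 45 seconds of processing, the model hands the message out twice; with a 60-second timeout it is delivered once.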

How the Visibility Timeout Mechanism Works

The visibility timeout is configured at the queue level and applies to every message received from that queue. The default value is 30 seconds; the maximum is 12 hours. When a consumer calls ReceiveMessage, the clock starts immediately for each message returned in that batch.

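If processing takes longer than expected, the consumer can extend the timeout for a specific in-flight message by calling ChangeMessageVisibility before the current timeout expires. The new timeout counts from the moment of the call and applies only to that message; it does not affect other messages or the queue-level default, and extending does not lift the 12-hour ceiling measured from when the message was received. The consumer must delete the message explicitly with DeleteMessage after successful processing. SQS does not delete a message just because it was received; an undeleted message is removed only when the queue's message retention period expires.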

Think of visibility timeout like a library checkout window. The book is off the shelf while you have it, but if you don't return or renew it before the window closes, the librarian puts it back on the shelf for the next patron — regardless of whether you finished reading it.

graph LR
    A[ReceiveMessage] --> B[In-Flight / Hidden]
    B --> C{Timeout expires?}
    C -- No --> D[ChangeMessageVisibility]
    D --> B
    C -- No --> E[DeleteMessage]
    E --> F[Message Removed]
    C -- Yes --> G[Message Visible Again]
    G --> A

  1. ReceiveMessage — message enters the in-flight state; visibility timeout countdown begins.
  2. Processing window — consumer works on the message; it is invisible to all other pollers.
  3. ChangeMessageVisibility (optional) — extends the timeout for this specific message if more time is needed.
  4. DeleteMessage (success path) — message is permanently removed from the queue.
  5. Timeout expiry (failure path) — message returns to visible state and is requeued for another consumer.
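
The lifecycle above maps onto a small consumer loop. The following is a minimal sketch, assuming `sqs` behaves like a boto3 SQS client (it only needs `receive_message` and `delete_message`); `poll_once` and `handler` are illustrative names, not part of any AWS SDK.

```python
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handler: Callable[[str], None]) -> int:
    """Receive one batch, process each message, and delete only on success.

    Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handler(message["Body"])
        except Exception:
            # Failure path: skip the delete so the message becomes visible
            # again once its visibility timeout expires.
            continue
        # Success path: remove the message using this receive's handle.
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message["ReceiptHandle"],
        )
        deleted += 1
    return deleted
```

Keeping the delete after the handler call is what makes the failure path safe: a crash or exception leaves the message in the queue for a later retry.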

Diagnosing Why Your SQS Messages Are Being Processed Twice

Step 1: Measure Your Actual Processing Time

Before adjusting any timeout, establish a baseline. The visibility timeout must exceed your worst-case processing duration, not your average.

— Why this step: the queue-level timeout is a single fixed value applied to every message; if even one slow message exceeds it, that message will be redelivered, and you cannot diagnose the right threshold without real latency data.

aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attribute-names VisibilityTimeout ApproximateNumberOfMessagesNotVisible

ApproximateNumberOfMessagesNotVisible shows how many messages are currently in flight. A value that stays high relative to your consumer count and batch size suggests messages are being held longer than expected, which often means they are timing out before deletion.
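
The queue attributes do not report how long your handler takes, so the baseline has to come from the consumer itself. One way to collect it is sketched below, assuming you can wrap your handler; `timed` and `percentile` are hypothetical helpers, and the percentile uses the simple nearest-rank method.

```python
import math
import time
from typing import Callable, List


def timed(handler: Callable[[str], None], body: str, samples: List[float]) -> None:
    """Run the handler and record its wall-clock duration in seconds."""
    start = time.monotonic()
    try:
        handler(body)
    finally:
        # Record the duration even if the handler raised.
        samples.append(time.monotonic() - start)


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

`percentile(samples, 100)` gives the observed worst case, which is the figure the visibility timeout has to exceed; a high percentile such as 99 is useful for spotting how far the tail sits from the average.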

Step 2: Check for Timeout Expiry in CloudWatch

Symptom: ApproximateNumberOfMessagesNotVisible drops sharply at regular intervals matching your visibility timeout, then ApproximateNumberOfMessagesVisible spikes. This pattern confirms messages are returning to the queue rather than being deleted.

— Why this step: application logs show successful processing, which misleads engineers into ruling out the consumer — the requeue event is only visible at the queue metrics layer.

aws cloudwatch get-metric-statistics \
  --namespace AWS/SQS \
  --metric-name ApproximateNumberOfMessagesVisible \
  --dimensions Name=QueueName,Value=my-queue \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 60 \
  --statistics Average
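
Queue metrics show the pattern in aggregate; a per-message check can confirm it from inside the consumer. The sketch below assumes the messages were received with the `ApproximateReceiveCount` system attribute requested (with boto3, for example, by passing `AttributeNames=["ApproximateReceiveCount"]` to `receive_message`). The `redelivered` helper is a hypothetical name.

```python
from typing import Any, Dict, List


def redelivered(messages: List[Dict[str, Any]]) -> List[str]:
    """Return the MessageId of every message received more than once."""
    ids = []
    for message in messages:
        attributes = message.get("Attributes", {})
        # SQS returns attribute values as strings; default to a first delivery.
        count = int(attributes.get("ApproximateReceiveCount", "1"))
        if count > 1:
            ids.append(message["MessageId"])
    return ids
```

Logging these IDs alongside your processing durations makes it easier to tie a redelivery to the specific slow message that caused it.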

Step 3: Update the Queue Visibility Timeout

Once you have a realistic worst-case processing time, set the visibility timeout to comfortably exceed it. For general workloads, a multiplier of 1.5x to 2x your worst-case duration is a reasonable starting point.

— Why this step: the queue-level setting is the only persistent control; per-message extensions via ChangeMessageVisibility are a runtime safety valve, not a substitute for a correctly sized queue default.

aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes VisibilityTimeout=300
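
The sizing rule can be written down as a small calculation. This is a sketch under the assumptions stated in this section (a 1.5x to 2x multiplier over the worst case, capped at the 12-hour maximum); `recommended_visibility_timeout` is an illustrative name.

```python
import math

SQS_MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 12 hours, in seconds


def recommended_visibility_timeout(worst_case_seconds: float,
                                   multiplier: float = 2.0) -> int:
    """Scale the worst-case processing time and cap it at the SQS maximum."""
    if worst_case_seconds <= 0:
        raise ValueError("worst_case_seconds must be positive")
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    scaled = math.ceil(worst_case_seconds * multiplier)
    return min(scaled, SQS_MAX_VISIBILITY_TIMEOUT)
```

For a worst case of 150 seconds, the default 2x multiplier yields the 300 seconds used in the command above.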

Step 4: Extend In-Flight Timeout for Long-Running Messages

For messages whose processing time is variable and unpredictable, implement a heartbeat pattern: a background thread calls ChangeMessageVisibility periodically to extend the timeout before it expires.

— Why this step: a static queue-level timeout cannot accommodate high-variance processing; without active extension, the message will be requeued mid-processing even if the consumer is healthy and making progress.

aws sqs change-message-visibility \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --receipt-handle "YOUR_RECEIPT_HANDLE" \
  --visibility-timeout 300

The receipt handle is returned in the ReceiveMessage response and is unique per receive attempt. Store it immediately upon receipt.

The receipt handle expires when the message is deleted or its visibility timeout reaches zero — do not cache it across restarts.
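
A heartbeat along these lines could look like the following. It is a minimal sketch, assuming `sqs` behaves like a boto3 SQS client; `start_heartbeat`, `extend_to`, and `interval` are hypothetical names, and error handling for an expired receipt handle is left out.

```python
import threading
from typing import Any


def start_heartbeat(sqs: Any, queue_url: str, receipt_handle: str,
                    extend_to: int = 60, interval: float = 20.0) -> threading.Event:
    """Extend the message's visibility every `interval` seconds until stopped.

    Returns an Event; call .set() on it once processing has finished.
    """
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout (keep beating) and True once set.
        while not stop.wait(interval):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=extend_to,
            )

    threading.Thread(target=beat, daemon=True).start()
    return stop
```

Keep `interval` well below `extend_to` so a delayed beat still lands before the current timeout runs out, and call `stop.set()` before deleting the message.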

SQS Visibility Timeout with AWS Lambda: The 6x Rule

When Lambda polls an SQS queue as an event source, AWS recommends setting the SQS queue visibility timeout to at least 6 times the Lambda function timeout. This accounts for the time Lambda needs to initialize, retry on throttling, and process the batch before the visibility timeout expires.

If your Lambda function timeout is 30 seconds, configure the SQS queue visibility timeout to at least 180 seconds. If the queue timeout is shorter than this threshold, Lambda may not finish processing the batch before SQS makes the messages visible again, causing the same messages to be delivered in a subsequent invocation.

— In practice, teams often set the Lambda timeout and the SQS visibility timeout to the same value, reasoning that they should match. They do not. The queue timeout must be the larger value by a significant margin to absorb Lambda's internal retry and initialization overhead.

# Set Lambda function timeout to 30 seconds
aws lambda update-function-configuration \
  --function-name my-sqs-processor \
  --timeout 30

# Set SQS visibility timeout to at least 6x Lambda timeout (180 seconds)
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes VisibilityTimeout=180

graph LR
    A[Lambda timeout 30s] --> B[Queue timeout 180s]
    B --> C{Queue timeout >= 6x Lambda?}
    C -- Yes --> D[Safe: no duplicate]
    C -- No --> E[Timeout expires early]
    E --> F[Message requeued]
    F --> G[Duplicate invocation]

  1. Lambda timeout = 30s: the function has 30 seconds to complete execution.
  2. Queue visibility timeout = 180s (6x): SQS keeps the message hidden long enough for Lambda to initialize, process, and delete the message.
  3. If queue timeout < 6x Lambda timeout: the message can become visible before Lambda finishes, which may trigger a duplicate invocation.
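
The 6x rule is easy to check in a deployment script. The sketch below uses hypothetical helper names. The optional `batching_window_seconds` term is an assumption based on AWS's guidance for event source mappings that set MaximumBatchingWindowInSeconds; leave it at 0 if you do not use a batching window.

```python
def min_queue_visibility_timeout(lambda_timeout_seconds: int,
                                 batching_window_seconds: int = 0) -> int:
    """Smallest queue visibility timeout that satisfies the 6x guideline."""
    return 6 * lambda_timeout_seconds + batching_window_seconds


def satisfies_six_x_rule(queue_timeout_seconds: int,
                         lambda_timeout_seconds: int,
                         batching_window_seconds: int = 0) -> bool:
    """True if the queue timeout is at least the recommended minimum."""
    minimum = min_queue_visibility_timeout(
        lambda_timeout_seconds, batching_window_seconds
    )
    return queue_timeout_seconds >= minimum
```

For the 30-second function above, the minimum works out to 180 seconds, while a queue timeout that merely matches the function timeout fails the check.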

Configuring a Dead-Letter Queue to Catch Persistent Failures

Visibility timeout handles transient failures by requeuing messages. For messages that fail repeatedly, a dead-letter queue (DLQ) prevents infinite redelivery loops. Configure a redrive policy with a maxReceiveCount that reflects how many legitimate retries your workload needs before a message is considered unprocessable.

Why this matters: without a DLQ, a poison-pill message keeps cycling through your queue until its retention period expires, consuming consumer capacity and masking the real failure in your metrics.

aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:my-dlq\",\"maxReceiveCount\":\"5\"}"
  }'
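
The nested JSON escaping in the command above is easy to get wrong by hand. A small helper can build the same attribute map programmatically; this is a sketch, and `redrive_attributes` is a hypothetical name.

```python
import json
from typing import Dict


def redrive_attributes(dlq_arn: str, max_receive_count: int = 5) -> Dict[str, str]:
    """Build the Attributes map for a redrive policy.

    The RedrivePolicy value is itself a JSON document encoded as a string.
    """
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}
```

With boto3, the result could be passed as the `Attributes` argument of `set_queue_attributes`, together with the source queue's `QueueUrl`.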

IAM Permissions Required for Visibility Timeout Operations

Consumers managing visibility timeout programmatically need the following IAM permissions. Note that sqs:ChangeMessageVisibility and sqs:DeleteMessage support resource-level restrictions scoped to the specific queue ARN.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:ChangeMessageVisibility",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-queue"
    }
  ]
}

Understanding SQS Visibility Timeout: Key Takeaways

Duplicate message processing in SQS is almost always a configuration problem, not a platform defect. The visibility timeout must be sized to your worst-case processing duration, not your average. For Lambda consumers, the queue timeout must be at least 6 times the function timeout. For variable-duration workloads, implement a heartbeat using ChangeMessageVisibility to extend in-flight messages dynamically. Pair every queue with a DLQ to isolate messages that exceed your retry budget.

SQS guarantees at-least-once delivery by design — idempotent message processing is the application's responsibility, not the queue's.
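
As an illustration of what idempotent processing can mean in practice, the sketch below skips any message whose ID has already been handled. It is deliberately simplified: `process_once` is a hypothetical helper, and the in-memory set only works within a single process. A real deployment would need a shared store so that every consumer sees the same record of handled IDs.

```python
from typing import Callable, Set


def process_once(message_id: str, body: str,
                 handler: Callable[[str], None], seen: Set[str]) -> bool:
    """Run the handler unless this message ID was already handled.

    Returns True if the handler ran, False if the message was a duplicate.
    """
    if message_id in seen:
        return False
    handler(body)
    # Record the ID only after success so a failed attempt can be retried.
    seen.add(message_id)
    return True
```

A duplicate delivery then becomes a cheap no-op: the consumer can delete the message without repeating the side effects.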

Glossary

Visibility Timeout
The duration a message remains hidden from other consumers after being received. Configured at the queue level; you can adjust it for a specific in-flight message using ChangeMessageVisibility before the current timeout expires.
In-Flight Message
A message that has been received by a consumer but not yet deleted. It is invisible to other consumers until the visibility timeout expires or the message is deleted.
ChangeMessageVisibility
An SQS API action that resets the visibility timeout for a specific in-flight message identified by its receipt handle. Used to extend processing time without requeuing.
Dead-Letter Queue (DLQ)
A separate SQS queue that receives messages exceeding the maxReceiveCount threshold defined in a redrive policy. Used to isolate unprocessable messages for inspection.
Receipt Handle
A unique token returned with each ReceiveMessage response. Required for DeleteMessage and ChangeMessageVisibility operations. A new handle is issued each time a message is received.
At-Least-Once Delivery
SQS's delivery guarantee: every message will be delivered at least once, but may be delivered more than once. Applications must handle duplicates through idempotent processing logic.
