Understanding SQS Visibility Timeout: Why Your Messages Are Being Processed Twice
SQS visibility timeout controls how long a message stays hidden from other consumers after being received. Misconfiguring it is one of the most common causes of duplicate message processing in production queues.
TL;DR: SQS Visibility Timeout at a Glance
| Scenario | What Happens | Root Cause | Fix |
|---|---|---|---|
| Message processed twice | Two consumers handle the same message | Visibility timeout shorter than processing time | Increase timeout or call ChangeMessageVisibility |
| Lambda duplicate invocations | Function retried before completion | Queue timeout less than 6x Lambda timeout | Set queue timeout to at least 6x Lambda timeout |
| Message stuck in flight | Redelivery delayed long after a consumer crash | Timeout far longer than processing time, so a crashed consumer's message stays hidden | Tune timeout + configure DLQ |
| Message never retried | Failed processing silently dropped | Consumer deletes before confirming success | Delete only after successful processing |
What Is SQS Visibility Timeout and Why It Causes Duplicate Processing
When a consumer calls ReceiveMessage, SQS does not remove the message from the queue. Instead, it hides the message from all other consumers for a configurable duration — the visibility timeout. If the consumer deletes the message before the timeout expires, the message is gone. If it does not — whether due to a crash, slow processing, or a bug — the timeout expires and the message becomes visible again, available for any consumer to pick up.
Duplicate processing is not a bug in SQS. It is the designed behavior of an at-least-once delivery system when the visibility timeout is shorter than actual processing time.
- Consumer A receives the message — SQS hides it for the configured visibility timeout duration.
- Timeout expires before Consumer A finishes — SQS makes the message visible again.
- Consumer B receives the same message — both consumers now process it concurrently.
- Consumer A eventually deletes the message — but Consumer B has already started work, causing a duplicate.
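The sequence above can be reproduced with a toy in-memory model. This is only a sketch: `SimulatedQueue` and `simulate` are hypothetical names invented for illustration, time is a plain number rather than a real clock, and nothing here talks to SQS.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedQueue:
    """Toy model of a single message and its visibility timeout."""
    visibility_timeout: float
    visible_at: float = 0.0          # time at which the message can next be received
    deleted: bool = False
    receive_log: list = field(default_factory=list)

    def receive(self, consumer: str, now: float) -> bool:
        """Deliver the message to `consumer` if it is visible at time `now`."""
        if self.deleted or now < self.visible_at:
            return False
        self.visible_at = now + self.visibility_timeout  # hide it again
        self.receive_log.append((now, consumer))
        return True

    def delete(self) -> None:
        self.deleted = True


def simulate(visibility_timeout: float, processing_time: float) -> list:
    """Consumer A receives at t=0; Consumer B polls once per second until A finishes."""
    queue = SimulatedQueue(visibility_timeout)
    queue.receive("A", now=0.0)
    t = 1.0
    while t < processing_time:
        queue.receive("B", now=t)
        t += 1.0
    queue.delete()  # Consumer A finishes and deletes at t=processing_time
    return [consumer for _, consumer in queue.receive_log]


print(simulate(visibility_timeout=30, processing_time=45))  # ['A', 'B']
print(simulate(visibility_timeout=60, processing_time=45))  # ['A']
```

With a 30-second timeout and 45 seconds of work, Consumer B picks the message up at the 30-second mark. Raising the timeout above the processing time removes the second delivery.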
How the Visibility Timeout Mechanism Works
The visibility timeout is configured at the queue level and applies to every message received from that queue unless a consumer overrides it for a single ReceiveMessage call with the VisibilityTimeout parameter. The default value is 30 seconds; the allowed range is 0 seconds to 12 hours. When a consumer calls ReceiveMessage, the clock starts immediately for each message returned in that batch.
If processing takes longer than expected, the consumer can extend the timeout for a specific in-flight message by calling ChangeMessageVisibility before the current timeout expires. The new timeout counts from the moment of the call and applies only to that message; other messages and the queue-level default are unaffected. A message cannot stay hidden for more than 12 hours in total from the time it was received. The consumer must delete the message explicitly using DeleteMessage after successful processing, because SQS never deletes a message simply because a consumer received it.
Think of visibility timeout like a library checkout window. The book is off the shelf while you have it, but if you don't return or renew it before the window closes, the librarian puts it back on the shelf for the next patron — regardless of whether you finished reading it.
- ReceiveMessage — message enters the in-flight state; visibility timeout countdown begins.
- Processing window — consumer works on the message; it is invisible to all other pollers.
- ChangeMessageVisibility (optional) — extends the timeout for this specific message if more time is needed.
- DeleteMessage (success path) — message is permanently removed from the queue.
- Timeout expiry (failure path) — message returns to visible state and is requeued for another consumer.
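The lifecycle above might look like this in a consumer. This is a minimal sketch, assuming `sqs` is a boto3 SQS client (for example `boto3.client("sqs")`) passed in by the caller; `poll_once` and `handle` are hypothetical names, not part of any library.

```python
from typing import Any, Callable


def poll_once(sqs: Any, queue_url: str, handle: Callable[[str], None]) -> int:
    """Receive one batch, process each message, and delete only on success.

    `sqs` is assumed to be a boto3 SQS client. Returns the number of
    messages that were processed and deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
        WaitTimeSeconds=20,      # long polling
    )
    deleted = 0
    for message in response.get("Messages", []):
        try:
            handle(message["Body"])
        except Exception:
            # Failure path: skip the delete so the message becomes visible
            # again once its visibility timeout expires.
            continue
        # Success path: remove the message using this receive's handle.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
        deleted += 1
    return deleted
```

Leaving a failed message undeleted is what hands it back to the queue; no explicit "requeue" call is involved.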
Diagnosing Why Your SQS Messages Are Being Processed Twice
Step 1: Measure Your Actual Processing Time
Before adjusting any timeout, establish a baseline. The visibility timeout must exceed your worst-case processing duration, not your average.
— Why this step: the queue-level timeout is a single fixed value applied to every message; if even one slow message exceeds it, that message will be redelivered, and you cannot diagnose the right threshold without real latency data.
aws sqs get-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attribute-names VisibilityTimeout ApproximateNumberOfMessagesNotVisible
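ApproximateNumberOfMessagesNotVisible shows how many messages are currently in flight. Compare the configured VisibilityTimeout against the processing durations your consumers record in their own logs or metrics. An in-flight count that stays high relative to your consumer count suggests that messages are being held longer than expected and may be expiring before deletion.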
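One way to collect the worst-case figure is to time the handler inside the consumer itself. The sketch below is illustrative only: `ProcessingTimer` is a hypothetical helper, it keeps durations in memory, and a real deployment would more likely export them to a metrics system.

```python
import math
import time
from typing import Callable, List


class ProcessingTimer:
    """Records per-message processing durations in seconds."""

    def __init__(self) -> None:
        self.durations: List[float] = []

    def timed(self, handler: Callable[[str], None], body: str) -> None:
        """Run `handler(body)` and record how long it took, even if it raises."""
        start = time.monotonic()
        try:
            handler(body)
        finally:
            self.durations.append(time.monotonic() - start)

    def worst_case(self) -> float:
        """Longest duration seen so far (0.0 if nothing was recorded)."""
        return max(self.durations, default=0.0)

    def percentile(self, pct: float) -> float:
        """Nearest-rank percentile for pct in (0, 100]."""
        if not self.durations:
            return 0.0
        ordered = sorted(self.durations)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]
```

Size the timeout from `worst_case()` or a high percentile, not from the mean.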
Step 2: Check for Timeout Expiry in CloudWatch
Symptom: ApproximateNumberOfMessagesNotVisible drops sharply at regular intervals matching your visibility timeout, then ApproximateNumberOfMessagesVisible spikes. This pattern confirms messages are returning to the queue rather than being deleted.
— Why this step: application logs show successful processing, which misleads engineers into ruling out the consumer — the requeue event is only visible at the queue metrics layer.
aws cloudwatch get-metric-statistics \
  --namespace AWS/SQS \
  --metric-name ApproximateNumberOfMessagesVisible \
  --dimensions Name=QueueName,Value=my-queue \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 60 \
  --statistics Average
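A complementary check, not part of the CloudWatch step above, is to look at the ApproximateReceiveCount system attribute on the messages a consumer receives. The sketch below assumes the consumer requested that attribute at receive time, so each message dictionary carries an `Attributes` map as in a boto3 response; `redelivered_messages` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def redelivered_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the messages that SQS reports as received more than once."""
    flagged = []
    for message in messages:
        # Attribute values arrive as strings; treat a missing value as a first receive.
        count = int(message.get("Attributes", {}).get("ApproximateReceiveCount", "1"))
        if count > 1:
            flagged.append(message)
    return flagged
```

A steady stream of messages with a receive count above 1, alongside logs that show successful processing, points toward timeout expiry and away from handler failures.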
Step 3: Update the Queue Visibility Timeout
Once you have a realistic worst-case processing time, set the visibility timeout to comfortably exceed it. For general workloads, a multiplier of 1.5x to 2x your worst-case duration is a reasonable starting point.
— Why this step: the queue-level setting is the only persistent control; per-message extensions via ChangeMessageVisibility are a runtime safety valve, not a substitute for a correctly sized queue default.
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes VisibilityTimeout=300
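The sizing arithmetic can be written down as a small helper. This is a sketch under stated assumptions: the function names are hypothetical, the 2x default is just the upper end of the starting range suggested above, and `sqs` is assumed to be a boto3 SQS client.

```python
import math
from typing import Any

SQS_MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 12 hours, in seconds


def recommended_visibility_timeout(worst_case_seconds: float, multiplier: float = 2.0) -> int:
    """Scale the worst-case processing time and cap it at the SQS maximum."""
    if worst_case_seconds < 0 or multiplier < 1:
        raise ValueError("worst_case_seconds must be >= 0 and multiplier >= 1")
    seconds = math.ceil(worst_case_seconds * multiplier)
    return min(seconds, SQS_MAX_VISIBILITY_TIMEOUT)


def apply_visibility_timeout(sqs: Any, queue_url: str, seconds: int) -> None:
    """Set the queue-level timeout; queue attribute values are passed as strings."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"VisibilityTimeout": str(seconds)},
    )
```

For example, a worst case of 150 seconds with the default multiplier gives 300 seconds, the value used in the CLI command above.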
Step 4: Extend In-Flight Timeout for Long-Running Messages
For messages whose processing time is variable and unpredictable, implement a heartbeat pattern: a background thread calls ChangeMessageVisibility periodically to extend the timeout before it expires.
— Why this step: a static queue-level timeout cannot accommodate high-variance processing; without active extension, the message will be requeued mid-processing even if the consumer is healthy and making progress.
aws sqs change-message-visibility \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --receipt-handle "YOUR_RECEIPT_HANDLE" \
  --visibility-timeout 300
The receipt handle is returned in the ReceiveMessage response and is unique per receive attempt. Store it immediately upon receipt.
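A heartbeat along these lines could be built on a background thread. This is a minimal sketch, not a production implementation: `VisibilityHeartbeat` is a hypothetical class, `sqs` is assumed to be a boto3 SQS client, and the default `extend_by` and `interval` values are arbitrary. The interval should stay well below `extend_by` so each extension lands before the previous one runs out.

```python
import threading
from typing import Any


class VisibilityHeartbeat:
    """Context manager that keeps one in-flight message hidden while work runs."""

    def __init__(self, sqs: Any, queue_url: str, receipt_handle: str,
                 extend_by: int = 60, interval: float = 30.0) -> None:
        self._sqs = sqs
        self._queue_url = queue_url
        self._receipt_handle = receipt_handle
        self._extend_by = extend_by      # seconds of visibility granted per beat
        self._interval = interval        # seconds between beats
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (send a beat) and True once stopped.
        while not self._stop.wait(self._interval):
            self._sqs.change_message_visibility(
                QueueUrl=self._queue_url,
                ReceiptHandle=self._receipt_handle,
                VisibilityTimeout=self._extend_by,
            )

    def __enter__(self) -> "VisibilityHeartbeat":
        self._thread.start()
        return self

    def __exit__(self, *exc_info: Any) -> None:
        self._stop.set()
        self._thread.join()


# Usage sketch (process() is a placeholder for your own handler):
#   with VisibilityHeartbeat(sqs, queue_url, message["ReceiptHandle"]):
#       process(message["Body"])
#   sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```

If an extension call raises, for instance because the receipt handle is no longer valid, the heartbeat thread in this sketch simply ends; a fuller version would surface that error to the worker.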
SQS Visibility Timeout with AWS Lambda: The 6x Rule
When Lambda polls an SQS queue as an event source, AWS recommends setting the SQS queue visibility timeout to at least 6 times the Lambda function timeout. This accounts for the time Lambda needs to initialize, retry on throttling, and process the batch before the visibility timeout expires.
If your Lambda function timeout is 30 seconds, configure the SQS queue visibility timeout to at least 180 seconds. If the queue timeout is shorter than this threshold, Lambda may not finish processing the batch before SQS makes the messages visible again, causing the same messages to be delivered in a subsequent invocation.
— In practice, teams often set the Lambda timeout and the SQS visibility timeout to the same value, reasoning that they should match. They do not. The queue timeout must be the larger value by a significant margin to absorb Lambda's internal retry and initialization overhead.
# Set Lambda function timeout to 30 seconds
aws lambda update-function-configuration \
  --function-name my-sqs-processor \
  --timeout 30

# Set SQS visibility timeout to at least 6x Lambda timeout (180 seconds)
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes VisibilityTimeout=180
- Lambda timeout = 30s: the function has 30 seconds to complete execution.
- Queue visibility timeout = 180s (6x): SQS keeps the message hidden long enough for Lambda to initialize, process, and delete the message.
- If queue timeout < 6x Lambda timeout: the message can become visible before Lambda finishes with it, which can trigger a duplicate invocation.
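The 6x rule is easy to check automatically, for instance in a deployment script. The sketch below assumes `sqs` is a boto3 SQS client; the function names are hypothetical.

```python
from typing import Any

LAMBDA_QUEUE_MULTIPLIER = 6  # the 6x guidance described above


def min_queue_visibility_timeout(lambda_timeout_seconds: int) -> int:
    """Smallest queue visibility timeout that satisfies the 6x rule."""
    return LAMBDA_QUEUE_MULTIPLIER * lambda_timeout_seconds


def queue_timeout_is_sufficient(sqs: Any, queue_url: str, lambda_timeout_seconds: int) -> bool:
    """Compare the queue's configured timeout against the 6x minimum."""
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["VisibilityTimeout"],
    )
    configured = int(response["Attributes"]["VisibilityTimeout"])  # returned as a string
    return configured >= min_queue_visibility_timeout(lambda_timeout_seconds)
```

For a 30-second function, `min_queue_visibility_timeout(30)` gives 180, matching the configuration shown above.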
Configuring a Dead-Letter Queue to Catch Persistent Failures
Visibility timeout handles transient failures by requeuing messages. For messages that fail repeatedly, a dead-letter queue (DLQ) prevents infinite redelivery loops. Configure a redrive policy with a maxReceiveCount that reflects how many legitimate retries your workload needs before a message is considered unprocessable.
— Why this step: without a DLQ, a poison-pill message keeps cycling through your queue until the message retention period expires (4 days by default, up to 14 days), consuming consumer capacity and masking the real failure in your metrics.
aws sqs set-queue-attributes \
  --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-queue \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"arn:aws:sqs:us-east-1:123456789012:my-dlq\",\"maxReceiveCount\":\"5\"}"
  }'
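The nested, escaped JSON in the command above is easy to get wrong by hand. From code, the redrive policy can be built with a JSON serializer instead. This is a sketch assuming `sqs` is a boto3 SQS client; `build_redrive_policy` and `attach_dlq` are hypothetical names.

```python
import json
from typing import Any


def build_redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> str:
    """Serialize a redrive policy in the same shape as the CLI example."""
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    })


def attach_dlq(sqs: Any, queue_url: str, dlq_arn: str, max_receive_count: int = 5) -> None:
    """Attach the DLQ to the source queue via its RedrivePolicy attribute."""
    sqs.set_queue_attributes(
        QueueUrl=queue_url,
        Attributes={"RedrivePolicy": build_redrive_policy(dlq_arn, max_receive_count)},
    )
```

The policy is passed as a string attribute, which is why it is serialized before the call.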
IAM Permissions Required for Visibility Timeout Operations
Consumers managing visibility timeout programmatically need the following IAM permissions. Note that sqs:ChangeMessageVisibility and sqs:DeleteMessage support resource-level restrictions scoped to the specific queue ARN.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:ChangeMessageVisibility",
        "sqs:GetQueueAttributes"
      ],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-queue"
    }
  ]
}
Understanding SQS Visibility Timeout: Key Takeaways
Duplicate message processing in SQS is almost always a configuration problem, not a platform defect. The visibility timeout must be sized to your worst-case processing duration, not your average. For Lambda consumers, the queue timeout must be at least 6 times the function timeout. For variable-duration workloads, implement a heartbeat using ChangeMessageVisibility to extend in-flight messages dynamically. Pair every queue with a DLQ to isolate messages that exceed your retry budget.
SQS guarantees at-least-once delivery by design — idempotent message processing is the application's responsibility, not the queue's.
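As an illustration of idempotent processing, a consumer can remember which message IDs it has already handled and skip repeats. The sketch below is deliberately simplified: `IdempotentProcessor` is a hypothetical class, and the in-memory set only protects a single process. With several consumers, the seen-ID record would need to live in a shared store with an atomic "insert if absent" operation.

```python
from typing import Callable, Dict, Set


class IdempotentProcessor:
    """Skips messages whose MessageId has already been processed successfully."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def process(self, message: Dict[str, str]) -> bool:
        """Return True if the handler ran, False if the message was a duplicate."""
        message_id = message["MessageId"]
        if message_id in self._seen:
            return False
        self._handler(message["Body"])
        # Record the ID only after success so a failed attempt can be retried.
        self._seen.add(message_id)
        return True
```

A duplicate delivery then costs one lookup instead of a second side effect.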
Glossary
- Visibility Timeout: The duration a message remains hidden from other consumers after being received. Configured at the queue level; you can adjust it for a specific in-flight message using ChangeMessageVisibility before the current timeout expires.
- In-Flight Message: A message that has been received by a consumer but not yet deleted. It is invisible to other consumers until the visibility timeout expires or the message is deleted.
- ChangeMessageVisibility: An SQS API action that resets the visibility timeout for a specific in-flight message identified by its receipt handle. Used to extend processing time without requeuing.
- Dead-Letter Queue (DLQ): A separate SQS queue that receives messages exceeding the maxReceiveCount threshold defined in a redrive policy. Used to isolate unprocessable messages for inspection.
- Receipt Handle: A unique token returned with each ReceiveMessage response. Required for DeleteMessage and ChangeMessageVisibility operations. A new handle is issued each time a message is received.
- At-Least-Once Delivery: SQS's delivery guarantee: every message will be delivered at least once, but may be delivered more than once. Applications must handle duplicates through idempotent processing logic.