Description
Use case
Discussed in #2936
We confirmed that SQS FIFO queues can deliver messages from different group IDs to the same Lambda invocation. Right now, when a record fails processing, we short-circuit and fail the rest of the items in the invocation, regardless of whether they belong to the same message group ID.
This suboptimal experience can lead to unintended messages landing in DLQs or failing outright.
We need to explore the idea of continuing to process other group IDs on the same invocation.
Solution/User Experience
When a message fails processing in the SQS FIFO batch processor, mark the remaining messages from the same group ID as failed and skip them. When a message from a different group ID is found, resume processing.
At the end, return all the failed messages from each message group ID.
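A minimal sketch of this group-aware behavior, assuming a hypothetical `handle_record` callable and the standard SQS FIFO event record shape; this is illustration only, not the actual SqsFifoPartialProcessor implementation:

```python
# Hypothetical sketch of group-aware failure handling for an SQS FIFO batch.
from typing import Any, Callable


def process_fifo_batch(
    records: list[dict[str, Any]],
    handle_record: Callable[[dict[str, Any]], None],
) -> dict[str, list[dict[str, str]]]:
    failed_group_ids: set[str] = set()
    batch_item_failures: list[dict[str, str]] = []

    for record in records:
        group_id = record["attributes"]["MessageGroupId"]

        # A previous record in this group already failed: short-circuit only
        # this group to preserve its ordering, and report the record as failed.
        if group_id in failed_group_ids:
            batch_item_failures.append({"itemIdentifier": record["messageId"]})
            continue

        try:
            handle_record(record)
        except Exception:
            # First failure for this group: remember it so the rest of the
            # group is skipped, but keep processing other group IDs.
            failed_group_ids.add(group_id)
            batch_item_failures.append({"itemIdentifier": record["messageId"]})

    # Partial batch response: only failed/skipped records go back to the queue.
    return {"batchItemFailures": batch_item_failures}
```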
Alternative solutions
Depending on how the implementation and behavior turn out, we might consider not making the new behavior the default, and instead keep it behind a feature flag.
Acknowledgement
- This feature request meets Powertools for AWS Lambda (Python) Tenets
- Should this be considered in other Powertools for AWS Lambda languages? i.e. Java, TypeScript, and .NET
From the discussion
Originally posted by duc00 August 8, 2023
Hello,
According to the Implementing partial batch responses AWS doc:
If you're using this feature with a FIFO queue, your function should stop processing messages after the first failure and return all failed and unprocessed messages in batchItemFailures. This helps preserve the ordering of messages in your queue.
I see that the Powertools implementation of that doc, SqsFifoPartialProcessor, strictly follows this recommendation. This question is thus more specific to the AWS implementation of partial batch responses with FIFO queues. Posting the question on this repo seems to be a good entry point nonetheless, since both AWS developers and the community interact on these subjects.
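For context, wiring a handler with SqsFifoPartialProcessor looks roughly like this; a sketch based on the documented batch processing utility, with the record handler body as a placeholder:

```python
from aws_lambda_powertools.utilities.batch import (
    SqsFifoPartialProcessor,
    process_partial_response,
)
from aws_lambda_powertools.utilities.data_classes.sqs_event import SQSRecord
from aws_lambda_powertools.utilities.typing import LambdaContext

processor = SqsFifoPartialProcessor()


def record_handler(record: SQSRecord):
    payload = record.body
    # Business logic goes here; raising an exception currently fails all
    # remaining records in the batch, regardless of message group ID.
    print(payload)


def lambda_handler(event: dict, context: LambdaContext):
    return process_partial_response(
        event=event,
        record_handler=record_handler,
        processor=processor,
        context=context,
    )
```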
My problem is the following:
I am processing an SQS FIFO queue with SqsFifoPartialProcessor. My batch size is 10. Since the queue is not high-scale, the batch often contains messages with different message group IDs. When a failure occurs, the rest of the processing is stopped and all remaining records are returned to the queue as failures. So I often end up with valid records in my dead-letter queue just because they were processed in a batch that contained an unrelated invalid one.
My question:
Would it be valid, after a failure, to return as failed only the remaining records that share the failing record's group ID, rather than all remaining records in the batch? According to the doc, the current implementation is recommended in order to preserve the ordering of messages in the queue. I don't see why processing remaining records with a different group ID would go against that, hence my question.
Many thanks!