Description
I have an ECS Fargate service running multiple threads that long-poll several SQS queues, and I am seeing seemingly random ApiCallTimeoutException errors.
- Number of polled queues: 6
- Long polling duration: 20s
- API call timeout configured: 30s
- Number of polling threads per queue: 10
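To make the numbers above concrete, here is a small arithmetic sketch. The class name `PollingBudget` is hypothetical and not part of my service; the values are the ones listed in this description. It assumes the API call timeout covers the whole client call, so the headroom is what remains for everything other than the 20s server-side wait (connection setup, transfer, any retries).

```java
import java.time.Duration;

// Hypothetical helper, only used to illustrate the configuration listed above.
final class PollingBudget {
    static final int QUEUES = 6;
    static final int THREADS_PER_QUEUE = 10;
    static final Duration LONG_POLL_WAIT = Duration.ofSeconds(20);
    static final Duration API_CALL_TIMEOUT = Duration.ofSeconds(30);

    // Upper bound on receiveMessage calls that can be in flight at the same time.
    static int concurrentPolls() {
        return QUEUES * THREADS_PER_QUEUE;
    }

    // Time left in each call once the long poll wait itself is subtracted.
    static Duration headroom() {
        return API_CALL_TIMEOUT.minus(LONG_POLL_WAIT);
    }

    public static void main(String[] args) {
        System.out.println("Concurrent long polls: " + concurrentPolls());
        System.out.println("Headroom per call (ms): " + headroom().toMillis());
    }
}
```

So up to 60 long polls can be open at once against one client, each with 10s of slack beyond the long polling duration.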
Expected Behavior
I should not get any timeouts, since my API call timeout (30s) is greater than my long polling duration (20s).
Current Behavior
I get seemingly random timeouts with the following stack trace:
Unable to poll messages: queueUrl=https://sqs.eu-central-1.amazonaws.com/xxxxxxx/xxxxxx
software.amazon.awssdk.core.exception.ApiCallTimeoutException: Client execution did not complete before the specified timeout configuration: 30000 millis
software.amazon.awssdk.core.exception.ApiCallTimeoutException$BuilderImpl.build(ApiCallTimeoutException.java:87)
software.amazon.awssdk.core.exception.ApiCallTimeoutException.create(ApiCallTimeoutException.java:38)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.generateApiCallTimeoutException(ApiCallTimeoutTrackingStage.java:147)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.handleInterruptedException(ApiCallTimeoutTrackingStage.java:139)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.translatePipelineException(ApiCallTimeoutTrackingStage.java:107)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:240)
software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:96)
software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:120)
software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:73)
software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:44)
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
software.amazon.awssdk.services.sqs.DefaultSqsClient.receiveMessage(DefaultSqsClient.java:1046)
com.xxx.MessageSubscriber.consume(MessageSubscriber.java:105)
com.xxx.MessageSubscriber.lambda$null$1(MessageSubscriber.java:74)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Steps to Reproduce (for bugs)
SQS client configuration
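import java.time.Duration;
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.sqs.SqsClient;

public SqsClient amazonSqsClient() {
    return SqsClient.builder()
            .region(Region.of("eu-central-1"))
            .overrideConfiguration(ClientOverrideConfiguration.builder()
                    // 30s timeout for the whole API call
                    .apiCallTimeout(Duration.ofMillis(30000))
                    .build())
            .build();
}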
SQS poller thread method
private void consume(final String queueUrl) {
    final ReceiveMessageRequest receiveMessageRequest = ReceiveMessageRequest.builder()
            .queueUrl(queueUrl)
            .maxNumberOfMessages(10)
            .waitTimeSeconds(20) // long polling duration
            .visibilityTimeout(60)
            .build();
    while (true) {
        try {
            log.debug("Polling messages: queueUrl={}", queueUrl);
            final ReceiveMessageResponse receiveMessageResult = sqsClient.receiveMessage(receiveMessageRequest);
            if (receiveMessageResult != null && CollectionUtils.isNotEmpty(receiveMessageResult.messages())) {
                // process message here
            }
            log.debug("Messages polled: queueUrl={}", queueUrl);
        } catch (final Exception e) {
            log.error("Unable to poll messages: queueUrl={}", queueUrl, e);
            try {
                // Back off for one minute before polling again.
                TimeUnit.MINUTES.sleep(1);
            } catch (final InterruptedException ie) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
Context
When an exception occurs, execution goes to the catch block, sleeps for 1 minute, and then continues polling.
I am not sure, but I think I was not getting these errors when I was using V1 of the SDK.
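For anyone trying to reproduce the catch, sleep, continue behaviour without an AWS account, here is a minimal stand-alone sketch. It is not my actual subscriber: `BackoffPoller`, `Sleeper`, and the injected `receive` callable are hypothetical stand-ins for `sqsClient.receiveMessage(...)` and `TimeUnit.MINUTES.sleep(1)`, and the loop is bounded by `maxPolls` so that it terminates.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical stand-in for the poller thread method; no AWS SDK involved.
final class BackoffPoller {
    // Hypothetical hook so the pause can be replaced by a fake in tests.
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private final Callable<List<String>> receive;
    private final Sleeper sleeper;
    private final long backoffMillis;
    int failures = 0; // number of receive calls that threw
    int handled = 0;  // number of messages returned by successful calls

    BackoffPoller(Callable<List<String>> receive, Sleeper sleeper, long backoffMillis) {
        this.receive = receive;
        this.sleeper = sleeper;
        this.backoffMillis = backoffMillis;
    }

    // Polls until maxPolls calls were made or the thread is interrupted.
    void run(int maxPolls) {
        for (int i = 0; i < maxPolls && !Thread.currentThread().isInterrupted(); i++) {
            try {
                List<String> messages = receive.call();
                if (messages != null) {
                    handled += messages.size();
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling.
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                // Same policy as in the report: count the failure, pause, poll again.
                failures++;
                try {
                    sleeper.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

With a real pause, the sleeper could be `TimeUnit.MILLISECONDS::sleep` and the backoff 60000 ms, which matches the one-minute sleep in my catch block.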
Your Environment
- AWS Java SDK version used: 2.10.12
- JDK version used: 8
- Operating System and version: ECS Fargate service