Getting "random" ApiCallTimeoutException from SQS polling #1555

Closed
@fbn-roussel

Description

I have an ECS Fargate service that long-polls several SQS queues from multiple threads, and I am seeing seemingly random ApiCallTimeoutException errors.

Number of polled queues: 6
Long polling duration: 20s
API call timeout configured: 30s
Number of polling threads per queue: 10
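
As a quick sanity check on these numbers, here is a minimal sketch of the connection load. Each polling thread keeps one HTTP connection busy while its long poll is open, so the number of simultaneous connections is queues × threads per queue. The pool size of 50 below is an assumption (it is believed to be the default for the SDK's Apache HTTP client, but it should be verified against the SDK documentation), and `PollingLoad` and its method names are hypothetical.

```java
public class PollingLoad {

    // Each polling thread holds one connection for the duration of its long poll.
    static int concurrentPolls(int queues, int threadsPerQueue) {
        return queues * threadsPerQueue;
    }

    // Number of threads that would have to wait for a free connection, if any.
    static int threadsWaitingForConnection(int concurrentPolls, int maxConnections) {
        return Math.max(0, concurrentPolls - maxConnections);
    }

    public static void main(String[] args) {
        int polls = concurrentPolls(6, 10);   // values from the list above
        int assumedMaxConnections = 50;       // ASSUMPTION: verify for the HTTP client in use
        int waiting = threadsWaitingForConnection(polls, assumedMaxConnections);
        System.out.println("Concurrent long polls: " + polls);
        System.out.println("Threads waiting for a connection: " + waiting);
    }
}
```

If the assumed pool size holds, 60 concurrent polls against 50 connections would leave some threads waiting, and that wait would likely count toward the 30s API call timeout. The pool size can be set explicitly with `ApacheHttpClient.builder().maxConnections(...)`.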

Expected Behavior

I should not get any timeouts, since my API call timeout is greater than my long polling duration.
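
A rough sketch of the time budget behind this expectation follows. It assumes that `apiCallTimeout` in SDK v2 covers the whole call, including connection setup and any retries, while the separate `apiCallAttemptTimeout` setting limits a single attempt. `TimeoutBudget` and its methods are hypothetical names used only for illustration.

```java
import java.time.Duration;

public class TimeoutBudget {

    // Time left in the total API call timeout after one full-length long poll.
    static Duration headroom(Duration apiCallTimeout, Duration longPollWait) {
        return apiCallTimeout.minus(longPollWait);
    }

    // True if a second full-length attempt (for example a retry) still fits in the total budget.
    static boolean retryFits(Duration apiCallTimeout, Duration longPollWait) {
        return longPollWait.multipliedBy(2).compareTo(apiCallTimeout) <= 0;
    }

    public static void main(String[] args) {
        Duration apiCallTimeout = Duration.ofSeconds(30); // configured API call timeout
        Duration longPollWait = Duration.ofSeconds(20);   // long polling duration
        System.out.println("Headroom: " + headroom(apiCallTimeout, longPollWait).getSeconds() + "s");
        System.out.println("Retry fits: " + retryFits(apiCallTimeout, longPollWait));
    }
}
```

Under that assumption, a single 20s poll fits within 30s with 10s to spare, but a delay of more than 10s before the poll starts, or a retried attempt, would exceed the total.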

Current Behavior

I get timeouts that look random (at least to me), with the following stack trace:

Unable to poll messages: queueUrl=https://sqs.eu-central-1.amazonaws.com/xxxxxxx/xxxxxx
software.amazon.awssdk.core.exception.ApiCallTimeoutException: Client execution did not complete before the specified timeout configuration: 30000 millis
software.amazon.awssdk.core.exception.ApiCallTimeoutException$BuilderImpl.build(ApiCallTimeoutException.java:87)
software.amazon.awssdk.core.exception.ApiCallTimeoutException.create(ApiCallTimeoutException.java:38)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.generateApiCallTimeoutException(ApiCallTimeoutTrackingStage.java:147)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.handleInterruptedException(ApiCallTimeoutTrackingStage.java:139)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.translatePipelineException(ApiCallTimeoutTrackingStage.java:107)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:62)
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) 
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) 
software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:240) 
software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:96) 
software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:120) 
software.amazon.awssdk.core.client.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:73)
software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:44) 
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:55)
software.amazon.awssdk.services.sqs.DefaultSqsClient.receiveMessage(DefaultSqsClient.java:1046) 
com.xxx.MessageSubscriber.consume(MessageSubscriber.java:105) 
com.xxx.MessageSubscriber.lambda$null$1(MessageSubscriber.java:74)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

Steps to Reproduce (for bugs)

SQS client configuration

public SqsClient amazonSqsClient() {
  return SqsClient.builder()
      .region(Region.of("eu-central-1"))
      .overrideConfiguration(ClientOverrideConfiguration.builder()
          .apiCallTimeout(Duration.ofMillis(30000))
          .build())
      .build();
}

SQS poller thread method

  private void consume(final String queueUrl) {
    final ReceiveMessageRequest receiveMessageRequest = ReceiveMessageRequest.builder()
        .queueUrl(queueUrl)
        .maxNumberOfMessages(10)
        .waitTimeSeconds(20)
        .visibilityTimeout(60)
        .build();

    while (true) {
      try {
        log.debug("Polling messages: queueUrl={}", queueUrl);
        final ReceiveMessageResponse receiveMessageResult = sqsClient.receiveMessage(receiveMessageRequest);
        if (receiveMessageResult != null && CollectionUtils.isNotEmpty(receiveMessageResult.messages())) {
          // process message here
        }
        log.debug("Messages polled: queueUrl={}", queueUrl);
      } catch (final Exception e) {
        log.error("Unable to poll messages: queueUrl={}", queueUrl, e);
        // Back off for one minute before polling again
        try {
          TimeUnit.MINUTES.sleep(1);
        } catch (final InterruptedException ie) {
          // Restore the interrupt flag and stop polling
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

Context

When an exception occurs, execution goes to the catch block, sleeps for 1 minute, and then continues polling.

I am not sure, but I don't think I was getting these errors when I was using V1 of the SDK.

Your Environment

  • AWS Java SDK version used: 2.10.12
  • JDK version used: 8
  • Operating System and version: ECS Fargate service

Labels

  • closed-for-staleness
  • guidance: Question that needs advice or information.
  • response-requested: Waiting on additional info and feedback. Will move to "closing-soon" in 10 days.
