Skip to content

WebFlux with Tomcat, intermittent timeout when creating response  #23096

Closed
@danielra

Description

@danielra

Affects: \5.1.7.RELEASE


I am running a spring-boot application which is using WebFlux with Tomcat as the underlying http server. In my real application with fairly high request rate, I noticed in metrics what appeared to be a slow leak in tomcat connections. I have since added a WebFilter to implement a response timeout, and this confirmed that a low volume of responses were never being completed. With the WebFilter in place, the observed connection leak is fixed - but I would like to resolve the response timeouts.

I have created a relatively simple isolated demo project which reproduces the issue here: https://github.com/danielra/webflux_response_timeout_repro

The project can be built via ./gradlew clean build and then run via java -jar ./build/libs/demo-0.0.1-SNAPSHOT.jar. To reproduce the issue with the demo project I have then been using wrk ( https://github.com/wg/wrk ) to throw load against the running application.

For example:

$ ./wrk -c 20 -t 16 -d 300 http://localhost:8080/get --latency
Running 5m test @ http://localhost:8080/get
  16 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.04ms    5.38ms 335.15ms   98.55%
    Req/Sec     1.62k   280.92     4.75k    79.07%
  Latency Distribution
     50%  508.00us
     75%  762.00us
     90%    2.00ms
     99%    7.48ms
  7719452 requests in 5.00m, 28.98GB read
  Socket errors: connect 0, read 7, write 0, timeout 1
Requests/sec:  25723.26
Transfer/sec:     98.89MB

Note that for this 5 minutes of load, 1 timeout was observed. The corresponding application console output was as follows:

$ java -jar ./build/libs/demo-0.0.1-SNAPSHOT.jar

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.5.RELEASE)

2019-06-06 16:41:14.795  INFO 18437 --- [           main] com.example.demo.DemoApplication         : Starting DemoApplication on machine1 with PID 18437 (/home/daniela/webflux_response_timeout_repro/demo/build/libs/demo-0.0.1-SNAPSHOT.jar started by daniela in /home/daniela/webflux_response_timeout_repro/demo)
2019-06-06 16:41:14.798  INFO 18437 --- [           main] com.example.demo.DemoApplication         : No active profile set, falling back to default profiles: default
2019-06-06 16:41:15.642  INFO 18437 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2019-06-06 16:41:15.676  INFO 18437 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2019-06-06 16:41:15.676  INFO 18437 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.19]
2019-06-06 16:41:16.229  INFO 18437 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2019-06-06 16:41:16.231  INFO 18437 --- [           main] com.example.demo.DemoApplication         : Started DemoApplication in 1.772 seconds (JVM running for 2.188)
Response timeout after 10001 milliseconds for GET request with uri 'http://localhost:8080/get'. Response status code was already committed: '200 OK'.
2019-06-06 16:46:09.230 ERROR 18437 --- [io-8080-exec-12] o.s.w.s.adapter.HttpWebHandlerAdapter    : [67fec09] Error [java.lang.IllegalStateException: Async operation timeout.] for HTTP GET "/get", but ServerHttpResponse already committed (200 OK)
2019-06-06 16:46:09.238 ERROR 18437 --- [io-8080-exec-12] o.a.c.c.C.[.[.[/].[httpHandlerServlet]   : Servlet.service() for servlet [httpHandlerServlet] threw exception

java.lang.IllegalStateException: Async operation timeout.
        at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseAsyncListener.onTimeout(ServletServerHttpResponse.java:208) ~[spring-web-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
        at org.apache.catalina.core.AsyncListenerWrapper.fireOnTimeout(AsyncListenerWrapper.java:44) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.catalina.core.AsyncContextImpl.timeout(AsyncContextImpl.java:133) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:153) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:241) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:836) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1747) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]

2019-06-06 16:46:09.239 ERROR 18437 --- [io-8080-exec-12] o.a.c.c.C.[.[.[/].[httpHandlerServlet]   : Servlet.service() for servlet [httpHandlerServlet] in context with path [] threw exception [Failed to create response content] with root cause

java.lang.IllegalStateException: Async operation timeout.
        at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseAsyncListener.onTimeout(ServletServerHttpResponse.java:208) ~[spring-web-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
        at org.apache.catalina.core.AsyncListenerWrapper.fireOnTimeout(AsyncListenerWrapper.java:44) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.catalina.core.AsyncContextImpl.timeout(AsyncContextImpl.java:133) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:153) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:241) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:836) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1747) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
        at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]

The line Response timeout after 10001 milliseconds for GET request with uri 'http://localhost:8080/get'. Response status code was already committed: '200 OK'. is from the WebFilter included in the demo project. In this case there was also an apparently corresponding IllegalStateException logged, though in other runs I don't always see this sort of exception logged when the WebFilter's log line appears.

For quick reference, the Controller method which handles the relevant requests in the demo project can be seen here: https://github.com/danielra/webflux_response_timeout_repro/blob/master/src/main/java/com/example/demo/controller/DemoController.java#L30
And the WebFilter that implements the response timeout can be seen here: https://github.com/danielra/webflux_response_timeout_repro/blob/master/src/main/java/com/example/demo/filter/DemoWebFilter.java#L25

Note that while the timeout is set to 10 seconds in this example, the responses seem to never be completed regardless of how long is waited. Apparently leaked tomcat connections could be observed building up for more than a week in my real application's metrics.

Please let me know if there is anything additional from my end that would be helpful in understanding and resolving this issue. Thank you for your time!

Metadata

Metadata

Assignees

Labels

in: webIssues in web modules (web, webmvc, webflux, websocket)type: bugA general bug

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions