Description
Affects: \5.1.7.RELEASE
I am running a spring-boot application which is using WebFlux with Tomcat as the underlying http server. In my real application with fairly high request rate, I noticed in metrics what appeared to be a slow leak in tomcat connections. I have since added a WebFilter to implement a response timeout, and this confirmed that a low volume of responses were never being completed. With the WebFilter in place, the observed connection leak is fixed - but I would like to resolve the response timeouts.
I have created a relatively simple isolated demo project which reproduces the issue here: https://github.com/danielra/webflux_response_timeout_repro
The project can be built via ./gradlew clean build
and then run via java -jar ./build/libs/demo-0.0.1-SNAPSHOT.jar
. To reproduce the issue with the demo project I have then been using wrk
( https://github.com/wg/wrk ) to throw load against the running application.
For example:
$ ./wrk -c 20 -t 16 -d 300 http://localhost:8080/get --latency
Running 5m test @ http://localhost:8080/get
16 threads and 20 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.04ms 5.38ms 335.15ms 98.55%
Req/Sec 1.62k 280.92 4.75k 79.07%
Latency Distribution
50% 508.00us
75% 762.00us
90% 2.00ms
99% 7.48ms
7719452 requests in 5.00m, 28.98GB read
Socket errors: connect 0, read 7, write 0, timeout 1
Requests/sec: 25723.26
Transfer/sec: 98.89MB
Note that for this 5 minutes of load, 1 timeout was observed. The corresponding application console output was as follows:
$ java -jar ./build/libs/demo-0.0.1-SNAPSHOT.jar
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.1.5.RELEASE)
2019-06-06 16:41:14.795 INFO 18437 --- [ main] com.example.demo.DemoApplication : Starting DemoApplication on machine1 with PID 18437 (/home/daniela/webflux_response_timeout_repro/demo/build/libs/demo-0.0.1-SNAPSHOT.jar started by daniela in /home/daniela/webflux_response_timeout_repro/demo)
2019-06-06 16:41:14.798 INFO 18437 --- [ main] com.example.demo.DemoApplication : No active profile set, falling back to default profiles: default
2019-06-06 16:41:15.642 INFO 18437 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2019-06-06 16:41:15.676 INFO 18437 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2019-06-06 16:41:15.676 INFO 18437 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.19]
2019-06-06 16:41:16.229 INFO 18437 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) with context path ''
2019-06-06 16:41:16.231 INFO 18437 --- [ main] com.example.demo.DemoApplication : Started DemoApplication in 1.772 seconds (JVM running for 2.188)
Response timeout after 10001 milliseconds for GET request with uri 'http://localhost:8080/get'. Response status code was already committed: '200 OK'.
2019-06-06 16:46:09.230 ERROR 18437 --- [io-8080-exec-12] o.s.w.s.adapter.HttpWebHandlerAdapter : [67fec09] Error [java.lang.IllegalStateException: Async operation timeout.] for HTTP GET "/get", but ServerHttpResponse already committed (200 OK)
2019-06-06 16:46:09.238 ERROR 18437 --- [io-8080-exec-12] o.a.c.c.C.[.[.[/].[httpHandlerServlet] : Servlet.service() for servlet [httpHandlerServlet] threw exception
java.lang.IllegalStateException: Async operation timeout.
at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseAsyncListener.onTimeout(ServletServerHttpResponse.java:208) ~[spring-web-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
at org.apache.catalina.core.AsyncListenerWrapper.fireOnTimeout(AsyncListenerWrapper.java:44) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.catalina.core.AsyncContextImpl.timeout(AsyncContextImpl.java:133) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:153) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:241) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:836) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1747) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
2019-06-06 16:46:09.239 ERROR 18437 --- [io-8080-exec-12] o.a.c.c.C.[.[.[/].[httpHandlerServlet] : Servlet.service() for servlet [httpHandlerServlet] in context with path [] threw exception [Failed to create response content] with root cause
java.lang.IllegalStateException: Async operation timeout.
at org.springframework.http.server.reactive.ServletServerHttpResponse$ResponseAsyncListener.onTimeout(ServletServerHttpResponse.java:208) ~[spring-web-5.1.7.RELEASE.jar!/:5.1.7.RELEASE]
at org.apache.catalina.core.AsyncListenerWrapper.fireOnTimeout(AsyncListenerWrapper.java:44) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.catalina.core.AsyncContextImpl.timeout(AsyncContextImpl.java:133) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.catalina.connector.CoyoteAdapter.asyncDispatch(CoyoteAdapter.java:153) ~[tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.coyote.AbstractProcessor.dispatch(AbstractProcessor.java:241) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:53) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:836) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1747) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-embed-core-9.0.19.jar!/:9.0.19]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181]
The line Response timeout after 10001 milliseconds for GET request with uri 'http://localhost:8080/get'. Response status code was already committed: '200 OK'.
is from the WebFilter included in the demo project. In this case there was also an apparently corresponding IllegalStateException logged, though in other runs I don't always see this sort of exception logged when the WebFilter's log line appears.
For quick reference, the Controller method which handles the relevant requests in the demo project can be seen here: https://github.com/danielra/webflux_response_timeout_repro/blob/master/src/main/java/com/example/demo/controller/DemoController.java#L30
And the WebFilter that implements the response timeout can be seen here: https://github.com/danielra/webflux_response_timeout_repro/blob/master/src/main/java/com/example/demo/filter/DemoWebFilter.java#L25
Note that while the timeout is set to 10 seconds in this example, the responses seem to never be completed regardless of how long is waited. Apparently leaked tomcat connections could be observed building up for more than a week in my real application's metrics.
Please let me know if there is anything additional from my end that would be helpful in understanding and resolving this issue. Thank you for your time!