Description
Applications that need to perform a graceful shutdown of tasks submitted to `ThreadPoolTaskExecutor` and/or `ThreadPoolTaskScheduler` may make use of the `awaitTerminationMillis` setting (and possibly `waitForTasksToCompleteOnShutdown`).
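For context, a minimal sketch of that configuration (the class name, bean names and the 30-second budget are illustrative only, not taken from a real application):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;

@Configuration
public class GracefulPoolConfig {

    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        // Let already-submitted tasks run to completion on shutdown...
        executor.setWaitForTasksToCompleteOnShutdown(true);
        // ...and block destroy() for up to 30 s waiting for them.
        executor.setAwaitTerminationMillis(30_000);
        return executor;
    }

    @Bean
    public ThreadPoolTaskScheduler taskScheduler() {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setWaitForTasksToCompleteOnShutdown(true);
        scheduler.setAwaitTerminationMillis(30_000);
        return scheduler;
    }
}
```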
However, an application may take a very long time to actually finish if these conditions apply:
- The application uses both `ThreadPoolTaskExecutor` and `ThreadPoolTaskScheduler` (and possibly multiple `ThreadPoolTaskExecutor`s).
- Submitted tasks are not quick.
- There are `SmartLifecycle` beans which implement lengthy stopping (see the sketch after this list).
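A minimal sketch of that last condition, with a hypothetical bean name and timing: a `SmartLifecycle` bean whose asynchronous stop takes a while (for example, draining in-flight work) before signalling the callback:

```java
import org.springframework.context.SmartLifecycle;
import org.springframework.stereotype.Component;

@Component
public class SlowStoppingComponent implements SmartLifecycle {

    private volatile boolean running;

    @Override
    public void start() {
        this.running = true;
    }

    @Override
    public void stop(Runnable callback) {
        // Asynchronous stop: the lifecycle processor waits (up to its shutdown
        // timeout) for this callback before moving on to the next phase.
        new Thread(() -> {
            drainPendingWork();          // hypothetical lengthy operation
            this.running = false;
            callback.run();
        }, "slow-stop").start();
    }

    @Override
    public void stop() {
        this.running = false;
    }

    @Override
    public boolean isRunning() {
        return this.running;
    }

    private void drainPendingWork() {
        try {
            Thread.sleep(20_000);        // stands in for real draining work
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```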
Real examples are web applications that use such components and make use of Spring Boot's web container "graceful shutdown" feature. The overall termination sequence of such an application is:
- SIGTERM is sent.
- `SmartLifecycle` asynchronous stopping triggers the web container graceful shutdown.
- `SmartLifecycle` asynchronous stopping blocks and waits for the web container shutdown.
- Context closing proceeds, invoking `DisposableBean`/`@PreDestroy` methods, so, say:
  - `ThreadPoolTaskExecutor`'s `destroy()` is called, blocking and waiting for its tasks to finish. If the application uses multiple `ThreadPoolTaskExecutor`s, this wait happens for each one of them, one after another.
  - `ThreadPoolTaskScheduler`'s `destroy()` is called, blocking and waiting for its tasks to finish.
The proposal here is to provide some way to make all the pools await their tasks in parallel, ideally also in parallel with the stopping of other `SmartLifecycle` beans. A possible direction is sketched below.
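As a rough illustration of the idea, here is a user-level workaround sketch rather than a framework feature. The class name, the injected pool beans, the 30-second budget and the default phase are all assumptions; the point is simply that shutdown is initiated on every pool before waiting on any of them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

import org.springframework.context.SmartLifecycle;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.stereotype.Component;

@Component
public class ParallelPoolShutdown implements SmartLifecycle {

    private final List<ThreadPoolTaskExecutor> executors;
    private final List<ThreadPoolTaskScheduler> schedulers;
    private volatile boolean running;

    // Assumes at least one bean of each type exists in the context.
    public ParallelPoolShutdown(List<ThreadPoolTaskExecutor> executors,
                                List<ThreadPoolTaskScheduler> schedulers) {
        this.executors = executors;
        this.schedulers = schedulers;
    }

    @Override
    public void start() {
        this.running = true;
    }

    @Override
    public void stop() {
        List<ExecutorService> pools = new ArrayList<>();
        executors.forEach(e -> pools.add(e.getThreadPoolExecutor()));
        schedulers.forEach(s -> pools.add(s.getScheduledExecutor()));

        // Initiate shutdown on every pool first, without blocking...
        pools.forEach(ExecutorService::shutdown);

        // ...then wait with a single shared deadline, so the total wait is
        // bounded by the slowest pool rather than the sum of all pools.
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(30);
        for (ExecutorService pool : pools) {
            try {
                pool.awaitTermination(Math.max(deadline - System.nanoTime(), 0),
                        TimeUnit.NANOSECONDS);
            }
            catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        this.running = false;
    }

    @Override
    public boolean isRunning() {
        return this.running;
    }
}
```

If the pools have finished within the deadline, the later per-bean `destroy()` calls find them already terminated and return almost immediately. Depending on the phase chosen, this may start shutting the pools down while in-flight web requests are still completing, so tasks submitted during that window would be rejected; the phase and deadline would need tuning per application, which is part of why first-class framework support would be preferable.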
In Kubernetes, Spring web applications that fit the scenario above end up with a Pod termination time that is too high (which is aggravated by the need to configure a preStop hook to give kube-proxy time to notice the Pod deletion). This has real effects: since, rightly, Kubernetes does not wait during a rollout for old Pods to actually finish before creating new ones, applications with a large number of Pods and a long termination time end up with a large number of Pods actually running during the rollout (the new Pods plus many that are still terminating). We have seen this trigger a cluster autoscale just to handle the large number of Pods present during the rollout.