
Parallel graceful shutdown for ThreadPoolTaskExecutor and ThreadPoolTaskScheduler #27090

Closed
@jgslima

Description

Applications that need to perform graceful shutdown of tasks submitted to ThreadPoolTaskExecutor and/or ThreadPoolTaskScheduler may use the awaitTerminationMillis setting (and possibly waitForTasksToCompleteOnShutdown).
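For reference, a minimal sketch of that configuration (the bean name and timeout value are illustrative only):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setWaitForTasksToCompleteOnShutdown(true); // let queued tasks finish
        executor.setAwaitTerminationMillis(30_000);         // block destroy() for up to 30s
        return executor;
    }
}
```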

However, an application may take a very long time to actually finish if these conditions apply:

  • the application uses both a ThreadPoolTaskExecutor and a ThreadPoolTaskScheduler (and possibly multiple ThreadPoolTaskExecutors).
  • the submitted tasks are not quick.
  • there are SmartLifecycle beans that implement lengthy stopping.

Real examples are web applications that use such components together with Spring Boot's web container "graceful shutdown" feature. The overall termination sequence of such an application is:

  1. SIGTERM is sent.
  2. SmartLifecycle asynchronous stopping triggers the web container graceful shutdown.
  3. SmartLifecycle asynchronous stopping blocks and waits for the web container shutdown.
  4. Context closing proceeds, invoking DisposableBean/@PreDestroy methods; for example:
    1. ThreadPoolTaskExecutor's destroy() is called, blocking and waiting for the tasks to finish. If the application uses multiple ThreadPoolTaskExecutors, this wait occurs for each one of them.
    2. ThreadPoolTaskScheduler's destroy() is called, blocking and waiting for the tasks to finish (see the sketch after this list).
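Roughly speaking (a simplified sketch, not the actual framework code), each pool's destroy() does something like the following, which is why the waits of several pools run one after another and add up:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified sketch of what each pool's destroy() effectively does:
// initiate shutdown, then block until termination or until the timeout elapses.
static void destroySketch(ExecutorService pool, long awaitTerminationMillis) {
    pool.shutdown();
    try {
        // Each pool's destroy() blocks here in turn, so the waits are serialized.
        pool.awaitTermination(awaitTerminationMillis, TimeUnit.MILLISECONDS);
    }
    catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
}
```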

The proposal here is to provide some way to make all the pools finish their tasks in parallel; ideally, also in parallel with the stopping of other SmartLifecycle beans.
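As an illustration of the idea (a rough sketch only; the bean wiring, names and timeout are assumptions, not a proposed API), an application can approximate this today with a SmartLifecycle bean that initiates shutdown on all pools first and only then awaits them, so the waits overlap instead of adding up:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

import org.springframework.context.SmartLifecycle;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;

public class ParallelPoolShutdown implements SmartLifecycle {

    private final List<ThreadPoolTaskExecutor> executors;
    private final ThreadPoolTaskScheduler scheduler;
    private volatile boolean running;

    public ParallelPoolShutdown(List<ThreadPoolTaskExecutor> executors, ThreadPoolTaskScheduler scheduler) {
        this.executors = executors;
        this.scheduler = scheduler;
    }

    @Override
    public void start() {
        this.running = true;
    }

    @Override
    public void stop() {
        // Initiate shutdown on every pool without blocking, so they all start
        // draining their queues at the same time.
        this.executors.forEach(e -> e.getThreadPoolExecutor().shutdown());
        this.scheduler.getScheduledThreadPoolExecutor().shutdown();

        // Await each pool; since they drain concurrently, the total wait is
        // roughly the longest individual wait rather than the sum of all waits.
        this.executors.forEach(e -> await(e.getThreadPoolExecutor()));
        await(this.scheduler.getScheduledThreadPoolExecutor());
        this.running = false;
    }

    private void await(ExecutorService pool) {
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS); // illustrative timeout
        }
        catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public boolean isRunning() {
        return this.running;
    }
}
```

The later destroy() calls on the individual pools then find them already terminated (or close to it), so the sequential awaits there become near no-ops.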

In Kubernetes, for Spring web applications that fit the scenario above, the total time taken by a Pod to terminate ends up being too high (aggravated by the need to configure a preStop hook to give kube-proxy time to notice the Pod deletion). This has real effects: since, rightly, during a rollout Kubernetes does not wait for old Pods to actually finish before creating new ones, applications with a large number of Pods and a long termination time end up with a large number of Pods actually running during the rollout (the new Pods plus many still terminating). We have seen this trigger a cluster auto-scale so that the cluster can handle the large number of Pods present during the rollout.

Labels

in: core (Issues in core modules: aop, beans, core, context, expression)
type: enhancement (A general enhancement)
