Description
Bug description
In remote partitioning jobs that use the MessageChannelPartitionHandler with database polling, every poll of the database that finds one or more newly completed worker StepExecutions causes the partition handler to load an additional copy of the corresponding JobInstance, JobExecution, and all StepExecutions with their ExecutionContexts, and to keep that copy in memory until the partition step is completed.
This leads to high memory consumption during the partition step and can lead to out-of-memory errors if the poll interval is short enough and the number of partitions is high enough, especially since the ExecutionContexts are held in memory as well.
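To see why a short poll interval matters, assume that every poll observing at least one newly completed StepExecution retains one more copy of the job graph. The number of retained copies is then the number of distinct polls at which completions are first seen. The sketch is a hypothetical model, not Spring Batch code: the class and method names are made up, polls are assumed to fire at fixed multiples of the interval, and each completion is assumed to be seen by the first poll at or after it.

```java
import java.util.HashSet;
import java.util.Set;

public class PollCopyEstimate {

    // Hypothetical model, not Spring Batch code: polls fire at fixed multiples of
    // pollIntervalMs, and a step that completes at time t is first observed by
    // the poll at the smallest multiple >= t. Every poll that observes at least
    // one new completion is assumed to retain one more copy of the job graph.
    static int retainedCopies(long[] completionTimesMs, long pollIntervalMs) {
        Set<Long> pollsWithNewCompletions = new HashSet<>();
        for (long t : completionTimesMs) {
            // Round t up to the next poll time (integer ceiling division).
            long pollTime = ((t + pollIntervalMs - 1) / pollIntervalMs) * pollIntervalMs;
            pollsWithNewCompletions.add(pollTime);
        }
        return pollsWithNewCompletions.size();
    }

    public static void main(String[] args) {
        long[] completionTimesMs = {1, 2, 3, 50, 51, 100};
        System.out.println(retainedCopies(completionTimesMs, 2));    // 5
        System.out.println(retainedCopies(completionTimesMs, 10));   // 4
        System.out.println(retainedCopies(completionTimesMs, 1000)); // 1
    }
}
```

In this model, the same six completions produce five retained copies with a 2ms interval but only one with a 1000ms interval. The shorter the interval, the closer the count of retained copies gets to the number of partitions.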
Environment
Any environment using spring-batch-integration 5.0.1 and above (93800c6) that also uses the MessageChannelPartitionHandler with database polling.
Steps to reproduce
Run a remote partitioning batch job with database polling, a short poll interval, high number of partitions, and limited available memory.
Expected behavior
Polling the database in remote partitioning jobs does not lead to a constant, gradual increase in consumed memory until the partition step completes.
Minimal Complete Reproducible example
The minimal complete reproducible example is here: spring-batch-mcve-memory-leak.zip
The example runs a remote partitioning batch job with 1000 partitions, each having an ExecutionContext containing a single UUID. To exacerbate the memory consumption, the poll interval is set very low (2ms), and the worker step sleeps for 50ms before completing. This allows a new copy of the JobInstance, JobExecution, StepExecutions, and ExecutionContexts to be loaded and held in memory each time a worker step completes.
Please run the example with the -Xmx64m -XX:+HeapDumpOnOutOfMemoryError JVM options:
MAVEN_OPTS='-Xmx64m -XX:+HeapDumpOnOutOfMemoryError' mvn package exec:java -Dexec.mainClass=org.springframework.batch.MyBatchJobConfiguration
This should cause an OutOfMemoryError to be thrown rather quickly, and the resulting heap dump should contain the following:
- 60-65 instances each of JobInstance, JobParameters, and JobExecution
- 60k-65k instances each of StepExecution and ExecutionContext
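The two counts in the heap dump are consistent with each other. Assume each retained copy of the job graph holds one StepExecution and one ExecutionContext per partition, ignoring the manager step's own StepExecution and the job-level ExecutionContext. Then 60-65 retained copies of the 1000-partition job account for 60,000-65,000 of each. A trivial check (class and method names are illustrative only):

```java
public class HeapCountCheck {

    // Assumption: each retained copy of the job graph holds one JobInstance,
    // one JobParameters and one JobExecution, plus one StepExecution and one
    // ExecutionContext per worker partition. The manager step's own
    // StepExecution and the job-level ExecutionContext are ignored here.
    static long perPartitionObjects(int copies, int partitions) {
        return (long) copies * partitions;
    }

    public static void main(String[] args) {
        int partitions = 1000; // partition count of the example job
        for (int copies : new int[] {60, 65}) { // JobExecution counts seen in the heap dump
            System.out.println(copies + " copies -> "
                    + perPartitionObjects(copies, partitions)
                    + " StepExecutions (and as many ExecutionContexts)");
        }
    }
}
```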
Analysis
In MessageChannelPartitionHandler#pollReplies, the callback calls JobExplorer#getJobExecution. The SimpleJobExplorer implementation loads the JobExecution and all of its StepExecutions, as well as their ExecutionContexts. Each of these StepExecutions also holds a reference to the JobExecution, and thus indirectly to all other StepExecutions. If any of the loaded StepExecutions is completed and not yet present in the result Set, it is added to the Set. This keeps the currently loaded JobExecution instance and all of its other StepExecution instances in memory.
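The retention chain described above can be reproduced with a small plain-Java model. The JobExecution and StepExecution classes here are hypothetical stand-ins, not the Spring Batch classes. load() stands in for JobExplorer#getJobExecution returning a fresh object graph on every call. Equality of StepExecutions by id is an assumption meant to mirror the way an already-collected step is not added to the result Set a second time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class RetentionModel {

    // Hypothetical stand-in: holds every StepExecution of the job.
    static final class JobExecution {
        final List<StepExecution> stepExecutions = new ArrayList<>();
    }

    // Hypothetical stand-in: keeps a back-reference to its JobExecution.
    static final class StepExecution {
        final long id;
        final JobExecution jobExecution;
        final boolean completed;

        StepExecution(long id, JobExecution jobExecution, boolean completed) {
            this.id = id;
            this.jobExecution = jobExecution;
            this.completed = completed;
        }

        // Assumed equality by id, so a re-loaded copy of an already collected
        // step is treated as already present in the result set.
        @Override
        public boolean equals(Object o) {
            return o instanceof StepExecution && ((StepExecution) o).id == id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id);
        }
    }

    // Stands in for JobExplorer#getJobExecution: every call builds a brand-new
    // graph (one JobExecution plus one StepExecution per partition).
    static JobExecution load(int partitions, Set<Long> completedIds) {
        JobExecution jobExecution = new JobExecution();
        for (long id = 0; id < partitions; id++) {
            jobExecution.stepExecutions.add(
                    new StepExecution(id, jobExecution, completedIds.contains(id)));
        }
        return jobExecution;
    }

    // Stands in for the polling callback: completed steps from the freshly
    // loaded graph are added to the result set if not already present.
    static Set<StepExecution> poll(int partitions, List<Set<Long>> completedIdsPerPoll) {
        Set<StepExecution> result = new HashSet<>();
        for (Set<Long> completedIds : completedIdsPerPoll) {
            JobExecution loaded = load(partitions, completedIds);
            for (StepExecution step : loaded.stepExecutions) {
                if (step.completed) {
                    result.add(step); // keeps `loaded` reachable via step.jobExecution
                }
            }
        }
        return result;
    }

    // Distinct JobExecution copies kept reachable through the result set.
    static int retainedJobExecutionCopies(Set<StepExecution> result) {
        Set<JobExecution> copies = Collections.newSetFromMap(new IdentityHashMap<>());
        for (StepExecution step : result) {
            copies.add(step.jobExecution);
        }
        return copies.size();
    }

    // StepExecutions reachable from the result set through those copies.
    static long reachableStepExecutions(Set<StepExecution> result) {
        Set<JobExecution> copies = Collections.newSetFromMap(new IdentityHashMap<>());
        long total = 0;
        for (StepExecution step : result) {
            if (copies.add(step.jobExecution)) {
                total += step.jobExecution.stepExecutions.size();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Polls 1, 2 and 4 see newly completed steps; poll 3 sees nothing new.
        List<Set<Long>> polls = List.of(
                Set.of(0L), Set.of(0L, 1L), Set.of(0L, 1L), Set.of(0L, 1L, 2L, 3L));
        Set<StepExecution> result = poll(1000, polls);
        System.out.println(result.size() + " completed steps, "
                + retainedJobExecutionCopies(result) + " JobExecution copies, "
                + reachableStepExecutions(result) + " reachable StepExecutions");
    }
}
```

Running main prints "4 completed steps, 3 JobExecution copies, 3000 reachable StepExecutions". The result set holds only four StepExecutions. Because each one points back to the JobExecution copy it was loaded with, however, three full copies of the 1000-step graph stay reachable.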