Description
In the following example, two jobs are created. Both finish successfully after some time. Two things should happen to the status of the AppWrapper by the end of the execution:
- When the first job completes, the status of the AppWrapper should change to `RunningHoldCompletion` while the second job is still running.
- Once the second job finishes, the AppWrapper status should change to `Completed`.

The second behavior occurs, but the first does not. This is due to line `./pkg/controller/queuejob/queuejob_controller_ex.go:706`, where the code returns if at least one of the generic items is not completed.
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: my-jobs
  namespace: namespace-1
  labels:
    cluster-quota: namespace-1
spec:
  priority: 5
  priorityslope: 0.0
  schedulingSpec:
    minAvailable: 2
    requeuing:
      timeInSeconds: 30
      growthType: "none"
      maxNumRequeuings: 1
  resources:
    Items: []
    GenericItems:
    - replicas: 1
      completionstatus: Complete
      custompodresources:
      - replicas: 1
        requests:
          cpu: 500m
          nvidia.com/gpu: 0
          memory: 1Gi
        limits:
          cpu: 500m
          nvidia.com/gpu: 0
          memory: 1Gi
      generictemplate:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: my-job-1
          namespace: namespace-1
          labels:
            appwrapper.mcad.ibm.com: my-jobs
        spec:
          parallelism: 1
          completions: 1
          template:
            metadata:
              name: my-job-1
              namespace: namespace-1
              labels:
                appwrapper.mcad.ibm.com: my-jobs
            spec:
              terminationGracePeriodSeconds: 1
              restartPolicy: Never
              containers:
              - name: pytorch
                image: ubuntu:latest
                imagePullPolicy: IfNotPresent
                command:
                - sh
                - -c
                - |
                  sleep 15
                resources:
                  requests:
                    cpu: 500m
                    nvidia.com/gpu: 0
                    memory: 1Gi
                  limits:
                    cpu: 500m
                    nvidia.com/gpu: 0
                    memory: 1Gi
    - replicas: 1
      completionstatus: Complete
      custompodresources:
      - replicas: 1
        requests:
          cpu: 1000m
          nvidia.com/gpu: 0
          memory: 1Gi
        limits:
          cpu: 1000m
          nvidia.com/gpu: 0
          memory: 1Gi
      generictemplate:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: my-job-2
          namespace: namespace-1
          labels:
            appwrapper.mcad.ibm.com: my-jobs
        spec:
          parallelism: 1
          completions: 1
          template:
            metadata:
              name: my-job-2
              namespace: namespace-1
              labels:
                appwrapper.mcad.ibm.com: my-jobs
            spec:
              terminationGracePeriodSeconds: 1
              restartPolicy: Never
              containers:
              - name: pytorch
                image: ubuntu:latest
                imagePullPolicy: IfNotPresent
                command:
                - sh
                - -c
                - |
                  sleep 60
                resources:
                  requests:
                    cpu: 1000m
                    nvidia.com/gpu: 0
                    memory: 1Gi
                  limits:
                    cpu: 1000m
                    nvidia.com/gpu: 0
                    memory: 1Gi
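
The expected status transitions described in this issue can be sketched as a small function over the per-item completion flags. This is only an illustration of the desired behavior, not the code in `queuejob_controller_ex.go`: `deriveState` and the constants are hypothetical names, and `Running` as the state before any item completes is an assumption. The point of the sketch is that partial completion yields `RunningHoldCompletion` instead of an early return that leaves the status unchanged.

```go
package main

import "fmt"

// State names. "RunningHoldCompletion" and "Completed" come from the issue
// text; "Running" is an assumed name for the state before any item completes.
const (
	stateRunning               = "Running"
	stateRunningHoldCompletion = "RunningHoldCompletion"
	stateCompleted             = "Completed"
)

// deriveState is a hypothetical helper (not part of MCAD). itemDone holds one
// flag per generic item that declares a completionstatus; true means the item
// has reached that status.
func deriveState(itemDone []bool) string {
	done := 0
	for _, d := range itemDone {
		if d {
			done++
		}
	}
	switch {
	case len(itemDone) > 0 && done == len(itemDone):
		// Every tracked item has finished.
		return stateCompleted
	case done > 0:
		// Some, but not all, items have finished: report the partial
		// progress instead of returning without a status update.
		return stateRunningHoldCompletion
	default:
		return stateRunning
	}
}

func main() {
	fmt.Println(deriveState([]bool{false, false})) // both jobs still running
	fmt.Println(deriveState([]bool{true, false}))  // my-job-1 done, my-job-2 running
	fmt.Println(deriveState([]bool{true, true}))   // both jobs done
}
```

For the two jobs above, this prints `Running`, `RunningHoldCompletion`, and `Completed` in that order, which matches the sequence the issue expects the AppWrapper to go through.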