
AppWrapper with multiple generic items is not changing its state to RunningHoldCompletion when one or more of the generic items completes successfully #387

Open
@metalcycling


The following AppWrapper creates two Jobs, which finish successfully after roughly 15 and 60 seconds respectively. Two things should happen to the AppWrapper status over the course of the run:

  1. When the first Job completes, the AppWrapper status should change to RunningHoldCompletion while the second Job is still running.
  2. Once the second Job finishes, the AppWrapper status should change to Completed.

The second transition happens, but the first does not. This is due to line ./pkg/controller/queuejob/queuejob_controller_ex.go:706, where the code returns if at least one of the generic items has not completed, so the partially completed case is never reported.

apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
    name: my-jobs
    namespace: namespace-1
    labels:
        cluster-quota: namespace-1
spec:
    priority: 5
    priorityslope: 0.0
    schedulingSpec:
        minAvailable: 2
        requeuing:
            timeInSeconds: 30
            growthType: "none"
            maxNumRequeuings: 1
    resources:
        Items: []
        GenericItems:
            - replicas: 1
              completionstatus: Complete
              custompodresources:
                  - replicas: 1
                    requests:
                        cpu: 500m
                        nvidia.com/gpu: 0
                        memory: 1Gi
                    limits:
                        cpu: 500m
                        nvidia.com/gpu: 0
                        memory: 1Gi
              generictemplate:
                  apiVersion: batch/v1
                  kind: Job
                  metadata:
                      name: my-job-1
                      namespace: namespace-1
                      labels:
                          appwrapper.mcad.ibm.com: my-jobs
                  spec:
                      parallelism: 1
                      completions: 1
                      template:
                          metadata:
                              name: my-job-1
                              namespace: namespace-1
                              labels:
                                  appwrapper.mcad.ibm.com: my-jobs
                          spec:
                              terminationGracePeriodSeconds: 1
                              restartPolicy: Never
                              containers:
                                  - name: pytorch
                                    image: ubuntu:latest
                                    imagePullPolicy: IfNotPresent
                                    command:
                                        - sh
                                        - -c
                                        - |
                                          sleep 15
                                    resources:
                                        requests:
                                            cpu: 500m
                                            nvidia.com/gpu: 0
                                            memory: 1Gi
                                        limits:
                                            cpu: 500m
                                            nvidia.com/gpu: 0
                                            memory: 1Gi
            - replicas: 1
              completionstatus: Complete
              custompodresources:
                  - replicas: 1
                    requests:
                        cpu: 1000m
                        nvidia.com/gpu: 0
                        memory: 1Gi
                    limits:
                        cpu: 1000m
                        nvidia.com/gpu: 0
                        memory: 1Gi
              generictemplate:
                  apiVersion: batch/v1
                  kind: Job
                  metadata:
                      name: my-job-2
                      namespace: namespace-1
                      labels:
                          appwrapper.mcad.ibm.com: my-jobs
                  spec:
                      parallelism: 1
                      completions: 1
                      template:
                          metadata:
                              name: my-job-2
                              namespace: namespace-1
                              labels:
                                  appwrapper.mcad.ibm.com: my-jobs
                          spec:
                              terminationGracePeriodSeconds: 1
                              restartPolicy: Never
                              containers:
                                  - name: pytorch
                                    image: ubuntu:latest
                                    imagePullPolicy: IfNotPresent
                                    command:
                                        - sh
                                        - -c
                                        - |
                                          sleep 60
                                    resources:
                                        requests:
                                            cpu: 1000m
                                            nvidia.com/gpu: 0
                                            memory: 1Gi
                                        limits:
                                            cpu: 1000m
                                            nvidia.com/gpu: 0
                                            memory: 1Gi
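
The expected transitions for the manifest above can be sketched in Go as follows. This is only an illustration of the behavior requested in this issue, not the MCAD controller code: `AppWrapperState`, `expectedState`, and the `Running` starting state are hypothetical names and assumptions introduced here, and the real controller inspects the status of each generic item rather than a slice of booleans.

```go
package main

import "fmt"

// AppWrapperState is a hypothetical stand-in for the AppWrapper status
// values discussed in this issue; it is not the MCAD type.
type AppWrapperState string

const (
	Running               AppWrapperState = "Running" // assumed state before any item completes
	RunningHoldCompletion AppWrapperState = "RunningHoldCompletion"
	Completed             AppWrapperState = "Completed"
)

// expectedState derives the AppWrapper state from per-item completion
// flags. It counts every item instead of returning at the first
// incomplete one, so the partially completed case can be reported.
func expectedState(itemsDone []bool) AppWrapperState {
	done := 0
	for _, d := range itemsDone {
		if d {
			done++
		}
	}
	switch {
	case len(itemsDone) > 0 && done == len(itemsDone):
		return Completed // every generic item has completed
	case done > 0:
		return RunningHoldCompletion // some, but not all, have completed
	default:
		return Running // nothing has completed yet
	}
}

func main() {
	// Timeline for the manifest: my-job-1 sleeps 15s, my-job-2 sleeps 60s.
	fmt.Println(expectedState([]bool{false, false})) // both Jobs still running
	fmt.Println(expectedState([]bool{true, false}))  // my-job-1 done, my-job-2 running
	fmt.Println(expectedState([]bool{true, true}))   // both Jobs done
}
```

Running the sketch prints `Running`, `RunningHoldCompletion`, and `Completed`, matching the two transitions described at the top of this issue.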
