Description
Rapid preemptions were observed for a high-priority AW until the system reached a steady state. Consider the leaf node below in the quota tree:
- name: namespace-2
  quotas:
    hardLimit: false
    requests:
      cpu: 3000
      memory: 6Gi
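For reference, here is a rough sketch of how the full tree might be defined. This is an assumption rather than the exact manifests used in the experiment: the QuotaSubtree apiVersion, object names, namespace, and tree label are guesses modeled on the MCAD quota-management examples; only the namespace-2 leaf above and the root cpu/memory values given below come from this report.

```yaml
# Hypothetical QuotaSubtree manifests for the quota_context tree.
apiVersion: ibm.com/v1
kind: QuotaSubtree
metadata:
  name: context-root
  namespace: kube-system
  labels:
    tree: quota_context
spec:
  children:
    - name: context-root
      quotas:
        requests:
          cpu: 24000
          memory: 32Gi
---
apiVersion: ibm.com/v1
kind: QuotaSubtree
metadata:
  name: context-root-children
  namespace: kube-system
  labels:
    tree: quota_context
spec:
  parent: context-root
  children:
    - name: namespace-2
      quotas:
        hardLimit: false   # the leaf may borrow unused quota from the root
        requests:
          cpu: 3000
          memory: 6Gi
```

hardLimit: false is what lets the priority-100 AWs described below dispatch beyond the leaf's cpu: 3000 / memory: 6Gi by borrowing unused root quota; that borrowed capacity appears to be what gets reclaimed by the repeated preemptions described below.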
The root had a CPU quota of 24000 and memory of 32Gi. Three AWs were submitted with priority 100 and resource requirements of CPU: 2000 / memory: 24Gi, CPU: 2000 / memory: 4Gi, and CPU: 2000 / memory: 4Gi. Later, a high-priority AW with priority 1000, consuming the same quota node, was submitted with CPU: 22000m and memory: 22Gi. This caused AWs using the namespace-2 quota to be deleted repeatedly for quite some time. Below is some of the history of pods getting deleted:
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 47m
batch-job-12-bngz9 0/1 Pending 0 27s
batch-job-2-q9hnr 1/1 Terminating 0 48s
batch-job-3-ch5s6 1/1 Running 0 14m
batch-job-4-nqs8k 1/1 Terminating 0 44s
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 47m
batch-job-12-bngz9 0/1 Pending 0 28s
batch-job-2-q9hnr 1/1 Terminating 0 49s
batch-job-3-ch5s6 1/1 Running 0 14m
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 47m
batch-job-12-bngz9 0/1 Pending 0 29s
batch-job-3-ch5s6 1/1 Running 0 14m
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 47m
batch-job-12-bngz9 0/1 Pending 0 32s
batch-job-3-ch5s6 1/1 Running 0 14m
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 47m
batch-job-12-bngz9 0/1 Pending 0 33s
batch-job-3-ch5s6 1/1 Running 0 14m
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 47m
batch-job-3-ch5s6 1/1 Running 0 14m
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
batch-job-11-j4ldg 1/1 Running 0 47m 10.128.21.56 ip-10-0-148-128.us-east-2.compute.internal <none> <none>
batch-job-3-ch5s6 1/1 Running 0 14m 10.128.21.65 ip-10-0-148-128.us-east-2.compute.internal <none> <none>
batch-job-4-wl9px 0/1 ContainerCreating 0 2s <none> ip-10-0-148-128.us-east-2.compute.internal <none> <none>
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
batch-job-11-j4ldg 1/1 Running 0 12h 10.128.21.56 ip-10-0-148-128.us-east-2.compute.internal <none> <none>
batch-job-2-zlpww 1/1 Running 0 9h 10.128.20.104 ip-10-0-148-128.us-east-2.compute.internal <none> <none>
batch-job-3-ch5s6 1/1 Running 0 11h 10.128.21.65 ip-10-0-148-128.us-east-2.compute.internal <none> <none>
batch-job-5-nzs5h 1/1 Running 0 9h 10.128.20.103 ip-10-0-148-128.us-east-2.compute.internal <none> <none>
Batch job batch-job-11 is never preempted since it is using quota from another leaf node that has hardLimit: true.
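For context, an AW is tied to a leaf of the quota_context tree through a label on the AppWrapper (the tree name also appears in the Backoff reason further down). Below is a minimal sketch of what the high-priority AW could look like under that assumption; the apiVersion, object name, namespace, image, and Job template are hypothetical, and only the priority (1000), the requests (22000m CPU / 22Gi memory), and the namespace-2 designation come from this report.

```yaml
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: batch-job-12               # hypothetical; matches the pod name prefix above
  namespace: default               # assumed namespace
  labels:
    quota_context: namespace-2     # quota designation: the hardLimit: false leaf
spec:
  priority: 1000
  resources:
    GenericItems:
      - replicas: 1
        custompodresources:        # totals MCAD accounts for when dispatching
          - replicas: 1
            requests:
              cpu: 22000m
              memory: 22Gi
            limits:
              cpu: 22000m
              memory: 22Gi
        generictemplate:
          apiVersion: batch/v1
          kind: Job
          metadata:
            name: batch-job-12
          spec:
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: main
                    image: registry.access.redhat.com/ubi8/ubi
                    command: ["sleep", "infinity"]
                    resources:
                      requests:
                        cpu: 22000m
                        memory: 22Gi
                      limits:
                        cpu: 22000m
                        memory: 22Gi
```

The lower-priority AWs in namespace-2 would carry the same quota_context: namespace-2 label with priority: 100, while batch-job-11's AW would point at the other (hardLimit: true) leaf.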
The current behavior is unclear as to when the preemption cycle stops and what the criteria for stopping preemption are. In the experiment, when things were left running long enough, we saw that preemption stopped and the high-priority AW was never dispatched:
(base) abhishekmalvankar@Abhisheks-MacBook-Pro multi-cluster-app-dispatcher % oc get pods
NAME READY STATUS RESTARTS AGE
batch-job-11-j4ldg 1/1 Running 0 12h
batch-job-2-zlpww 1/1 Running 0 10h
batch-job-3-ch5s6 1/1 Running 0 12h
batch-job-5-nzs5h 1/1 Running 0 10h
The status of the high-priority AppWrapper at that point:

Status:
  Conditions:
    Last Transition Micro Time:  2023-05-10T21:59:28.781563Z
    Last Update Micro Time:  2023-05-10T21:59:28.781563Z
    Status:  True
    Type:  Init
    Last Transition Micro Time:  2023-05-10T21:59:28.781678Z
    Last Update Micro Time:  2023-05-10T21:59:28.781678Z
    Reason:  AwaitingHeadOfLine
    Status:  True
    Type:  Queueing
    Last Transition Micro Time:  2023-05-11T00:12:19.685738Z
    Last Update Micro Time:  2023-05-11T00:12:19.685738Z
    Reason:  AppWrapperRunnable
    Status:  True
    Type:  Dispatched
    Last Transition Micro Time:  2023-05-11T00:12:56.241814Z
    Last Update Micro Time:  2023-05-11T00:12:56.241813Z
    Message:  Pods failed scheduling failed=1, running=0.
    Reason:  PodsFailedScheduling
    Status:  True
    Type:  PreemptCandidate
    Last Transition Micro Time:  2023-05-10T21:59:55.190754Z
    Last Update Micro Time:  2023-05-10T21:59:55.190754Z
    Message:  Pods failed scheduling failed=1, running=0.
    Reason:  PreemptionTriggered
    Status:  True
    Type:  Backoff
    Last Transition Micro Time:  2023-05-10T22:00:15.192484Z
    Last Update Micro Time:  2023-05-10T22:00:15.192484Z
    Reason:  FrontOfQueue.
    Status:  True
    Type:  HeadOfLine
    Last Transition Micro Time:  2023-05-11T00:13:16.262124Z
    Last Update Micro Time:  2023-05-11T00:13:16.262124Z
    Message:  Insufficient quota to dispatch AppWrapper.
    Reason:  AppWrapperNotRunnable. Failed to allocate quota on quota designation 'quota_context'
    Status:  True
    Type:  Backoff
  Controllerfirsttimestamp:  2023-05-10T21:59:28.781432Z
  Filterignore:  true
  Pendingpodconditions:
    Conditions:
      Message:  0/2 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 Insufficient cpu. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
      Reason:  Unschedulable
      Status:  False
      Type:  PodScheduled
    Podname:  batch-job-12-dxvqd
  Queuejobstate:  HeadOfLine
  Sender:  before ScheduleNext - setHOL
  State:  Pending
  Systempriority:  1000
Events:  <none>