Skip to content

Incorrect handling of errors in the worker thread #481

Closed
@z103cb

Description

@z103cb

While debugging issue #479, the log presented by the user showed indications that the worker thread is requeueing the AppWrappers that was deleted in a previous iteration.

Steps to reproduce the issue:

  1. Delete an AppWrappers while is handled by the ScheduleNext function.

Relevant log lines:

E0716 07:09:51.528961       1 queuejob_controller_ex.go:2069] [manageQueueJob] Error updating status 'setQueueing' AppWrapper: 'default/stateful-set-2-replicas',Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Queueing ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before manageQueueJob - setQueueing Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}, err=appwrappers.mcad.ibm.com "stateful-set-2-replicas" not found.
E0716 07:09:51.529001       1 queuejob_controller_ex.go:1906] [worker] Failed to sync AppWrapper 'default/stateful-set-2-replicas', err &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"appwrappers.mcad.ibm.com \"stateful-set-2-replicas\" not found", Reason:"NotFound", Details:(*v1.StatusDetails)(0xc000e0a4e0), Code:404}}
W0716 07:09:51.529022       1 queuejob_controller_ex.go:1915] [worker] Fail to process item from eventQueue, err appwrappers.mcad.ibm.com "stateful-set-2-replicas" not found. Attempting to re-enqueque...
W0716 07:09:51.529048       1 queuejob_controller_ex.go:1919] [worker] Item re-enqueued
I071

Schedule next

I0716 07:09:51.529161       1 queuejob_controller_ex.go:1126] [ScheduleNext] activeQ.Pop stateful-set-2-replicas *Delay=0.002346 seconds RemainingLength=0 &qj=0xc000710000 Version=120280 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:Queueing ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before manageQueueJob - setQueueing Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}
I0716 07:09:51.529286       1 queuejob_controller_ex.go:1500] [updateStatusInEtcd] trying to update 'default/stateful-set-2-replicas' called by 'ScheduleNext - setHOL'
E0716 07:09:51.530265       1 queuejob_controller_ex.go:1210] [ScheduleNext] Failed to updated status in etcd for app wrapper 'default/stateful-set-2-replicas', status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-07-16 07:09:51.529266315 +0000 UTC m=+268.679431241 LastTransitionMicroTime:2023-07-16 07:09:51.529266375 +0000 UTC m=+268.679431302 Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}, err=appwrappers.mcad.ibm.com "stateful-set-2-replicas" not found
W0716 07:09:51.530309       1 queuejob_controller_ex.go:1470] [ScheduleNext] app wrapper 'default/stateful-set-2-replicas' not found skiping dispatch
I0716 07:09:53.531128       1 queuejob_controller_ex.go:1126] [ScheduleNext] activeQ.Pop stateful-set-2-replicas *Delay=2.004305 seconds RemainingLength=0 &qj=0xc000710000 Version=120280 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-07-16 07:09:51.529266315 +0000 UTC m=+268.679431241 LastTransitionMicroTime:2023-07-16 07:09:51.529266375 +0000 UTC m=+268.679431302 Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}
I0716 07:09:53.531225       1 queuejob_controller_ex.go:1500] [updateStatusInEtcd] trying to update 'default/stateful-set-2-replicas' called by 'ScheduleNext - setHOL'
E0716 07:09:53.532525       1 queuejob_controller_ex.go:1210] [ScheduleNext] Failed to updated status in etcd for app wrapper 'default/stateful-set-2-replicas', status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-07-16 07:09:51.529266315 +0000 UTC m=+268.679431241 LastTransitionMicroTime:2023-07-16 07:09:51.529266375 +0000 UTC m=+268.679431302 Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}, err=appwrappers.mcad.ibm.com "stateful-set-2-replicas" not found
W0716 07:09:53.532586       1 queuejob_controller_ex.go:1470] [ScheduleNext] app wrapper 'default/stateful-set-2-replicas' not found skiping dispatch
I0716 07:09:58.037446       1 queuejob_controller_ex.go:1126] [ScheduleNext] activeQ.Pop stateful-set-2-replicas *Delay=6.510625 seconds RemainingLength=0 &qj=0xc000710000 Version=120280 Status={Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-07-16 07:09:51.529266315 +0000 UTC m=+268.679431241 LastTransitionMicroTime:2023-07-16 07:09:51.529266375 +0000 UTC m=+268.679431302 Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}
I0716 07:09:58.037554       1 queuejob_controller_ex.go:1500] [updateStatusInEtcd] trying to update 'default/stateful-set-2-replicas' called by 'ScheduleNext - setHOL'
E0716 07:09:58.039005       1 queuejob_controller_ex.go:1210] [ScheduleNext] Failed to updated status in etcd for app wrapper 'default/stateful-set-2-replicas', status = {Pending:0 Running:0 Succeeded:0 Failed:0 MinAvailable:0 CanRun:false IsDispatched:false State:Pending Message: SystemPriority:0 QueueJobState:HeadOfLine ControllerFirstTimestamp:2023-07-16 07:09:51.526811452 +0000 UTC m=+268.676976380 ControllerFirstDispatchTimestamp:0001-01-01 00:00:00 +0000 UTC FilterIgnore:true Sender:before ScheduleNext - setHOL Local:false Conditions:[{Type:Init Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526811934 +0000 UTC m=+268.676976863 LastTransitionMicroTime:2023-07-16 07:09:51.526812006 +0000 UTC m=+268.676976927 Reason: Message:} {Type:Queueing Status:True LastUpdateMicroTime:2023-07-16 07:09:51.526865651 +0000 UTC m=+268.677030584 LastTransitionMicroTime:2023-07-16 07:09:51.52686572 +0000 UTC m=+268.677030647 Reason:AwaitingHeadOfLine Message:} {Type:HeadOfLine Status:True LastUpdateMicroTime:2023-07-16 07:09:51.529266315 +0000 UTC m=+268.679431241 LastTransitionMicroTime:2023-07-16 07:09:51.529266375 +0000 UTC m=+268.679431302 Reason:FrontOfQueue. Message:}] PendingPodConditions:[] TotalCPU:0 TotalMemory:0 TotalGPU:0 RequeueingTimeInSeconds:0 NumberOfRequeueings:0}, err=appwrappers.mcad.ibm.com "stateful-set-2-replicas" not found

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions