Closed
Description
Bug Report
Currently, retry attempts in the controller are tracked and counted per resource and resource version. This works well in most cases. However, in operator implementations that change the status or spec of the managed resource manually during reconciliation (through the fabric8 client), every such update produces a new resource version, so the attempt count starts over. This can cause the controller to loop infinitely with practically zero delay, because the first retry has no delay at all, regardless of the retry configuration.
I believe we should track the attempts across versions, for example by the resource ID itself, and expire the attempts after some time or after a successful controller run.
ResourceID{name='my-resource', namespace='flink-test'}, version: 15427577412} failed
ResourceID{name='my-resource'', namespace='flink-test'}, version: 15427577640} failed
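To make the proposal concrete, here is a minimal sketch of counting attempts per resource ID rather than per resource version. It is not the SDK's actual implementation: `RetryAttemptTracker`, its methods, and the expiry and backoff parameters are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of the SDK): counts failures per resource, ignoring resourceVersion. */
class RetryAttemptTracker {

    /** Immutable snapshot of the attempt count and the time of the last failure. */
    private static final class Entry {
        final int attempts;
        final long lastFailureMillis;

        Entry(int attempts, long lastFailureMillis) {
            this.attempts = attempts;
            this.lastFailureMillis = lastFailureMillis;
        }
    }

    private final Map<String, Entry> attemptsByResource = new ConcurrentHashMap<>();
    private final long expiryMillis;
    private final LongSupplier clock; // injected so expiry can be tested without sleeping

    RetryAttemptTracker(long expiryMillis, LongSupplier clock) {
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    /** Records a failure and returns the attempt number, no matter which version failed. */
    int recordFailure(String namespace, String name) {
        long now = clock.getAsLong();
        Entry updated = attemptsByResource.compute(key(namespace, name), (k, old) -> {
            // Start over if there is no entry or the previous failure is older than the expiry window.
            boolean expired = old == null || now - old.lastFailureMillis > expiryMillis;
            return new Entry(expired ? 1 : old.attempts + 1, now);
        });
        return updated.attempts;
    }

    /** Clears the counter after a successful controller run. */
    void recordSuccess(String namespace, String name) {
        attemptsByResource.remove(key(namespace, name));
    }

    /** Exponential backoff for a 1-based attempt number; non-zero even for the first retry. */
    static long backoffMillis(int attempt, long initialMillis, double multiplier) {
        return (long) (initialMillis * Math.pow(multiplier, attempt - 1));
    }

    // The key deliberately omits the resource version.
    private static String key(String namespace, String name) {
        return namespace + "/" + name;
    }
}
```

With this keying, the two failures logged above for `my-resource` in `flink-test` would count as attempts 1 and 2 of the same resource, so the second one would already be subject to a longer delay.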
What did you expect to see?
Retry strategy applied correctly
What did you see instead? Under which circumstances?
Infinite retry loop with 0 timeout.