Track restart count across resource versions #2046

Closed
@gyfora

Description

Bug Report

Currently, restart attempts in the controller are tracked/counted per resource and resource version. This works well in most cases. However, in operator implementations that change the status/spec of the managed resource manually during reconciliation (through the fabric8 client), every such change produces a new resource version, so the attempt count starts over. This can cause the controller to loop infinitely with practically zero delay, because the first retry has no delay at all, regardless of the retry config.

I believe we should somehow track the attempts across versions, for example by the resource ID itself (and expire attempts after some time or after a successful controller run).

ResourceID{name='my-resource', namespace='flink-test'}, version: 15427577412} failed
ResourceID{name='my-resource', namespace='flink-test'}, version: 15427577640} failed
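
To make the proposal concrete, here is a minimal sketch of counting attempts by resource identity instead of by resource version. It is not the SDK's implementation: `RetryAttemptTracker`, `ResourceKey`, `recordFailure`, `recordSuccess`, and `delayFor` are all hypothetical names made up for illustration, and the expiry and backoff parameters are assumptions. The key deliberately holds only namespace and name, so the two failures logged above (versions 15427577412 and 15427577640) would count as attempts 1 and 2 of the same resource.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: counts failed attempts per resource identity, ignoring resourceVersion. */
class RetryAttemptTracker {

    /** Hypothetical stand-in for a resource ID: namespace and name only, no version. */
    record ResourceKey(String namespace, String name) {}

    /** Attempt count plus the time of the most recent failure. */
    private record Entry(int attempts, long lastFailureNanos) {}

    private final Map<ResourceKey, Entry> entries = new ConcurrentHashMap<>();
    private final long expiryNanos;

    RetryAttemptTracker(Duration expiry) {
        this.expiryNanos = expiry.toNanos();
    }

    /** Records a failure and returns the 1-based attempt number for this resource. */
    int recordFailure(ResourceKey key, long nowNanos) {
        return entries.merge(key, new Entry(1, nowNanos), (old, fresh) ->
                // Start over if the previous failure is older than the expiry window.
                nowNanos - old.lastFailureNanos() > expiryNanos
                        ? fresh
                        : new Entry(old.attempts() + 1, nowNanos))
                .attempts();
    }

    /** Forgets the attempts for a resource after a successful controller run. */
    void recordSuccess(ResourceKey key) {
        entries.remove(key);
    }

    /** Exponential backoff, capped at max; attempt 1 already waits for the initial interval. */
    static Duration delayFor(int attempt, Duration initial, double multiplier, Duration max) {
        double millis = initial.toMillis() * Math.pow(multiplier, attempt - 1);
        return millis >= max.toMillis() ? max : Duration.ofMillis((long) millis);
    }
}
```

In this sketch the caller passes in the current time (for example from `System.nanoTime()`), which keeps the expiry logic easy to test. Whether the first retry should already be delayed, as `delayFor` does here, is a design choice and not something the retry config is known to guarantee.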

What did you expect to see?

Retry strategy applied correctly.

What did you see instead? Under which circumstances?

An infinite retry loop with zero delay between attempts.
