MaxReconciliationInterval is not triggered if the resource was never reconciled in the first place #1481

Closed
@jasmdk

Bug Report

My resource fails to reconcile because the external service only became available after the reconciler had exhausted all retries. I would have expected maxReconciliationInterval to ensure that, no matter what, the reconcile loop is eventually repeated.

What did you do?

I used the sample-operators/mysql-schema as an example, but removed dependent resources and actual reconcile logic for simplicity.
I then forced the reconcile() method to log a statement and throw a runtime exception.
Furthermore, I set maxReconciliationInterval to 1 minute:

@ControllerConfiguration(
    maxReconciliationInterval = @MaxReconciliationInterval(
        interval = 1,
        timeUnit = TimeUnit.MINUTES))
public class MySQLSchemaReconciler
    implements Reconciler<MySQLSchema>, ErrorStatusHandler<MySQLSchema> {

  @Override
  public UpdateControl<MySQLSchema> reconcile(MySQLSchema schema, Context<MySQLSchema> context) {
    log.info("Reconciling " + context.getRetryInfo());
    throw new RuntimeException("Something happened during reconcile");
  }

  // updateErrorStatus(...) from ErrorStatusHandler omitted for brevity
}

What did you expect to see?

When inspecting the log and grepping for my log statement, I would expect to see 6 invocations (the initial attempt plus the default of 5 retries), and then, after 1 minute, I would expect those 6 invocations to be repeated.

If I do not throw the runtime exception and only log, I see a reconcile loop every minute.

What did you see instead? Under which circumstances?

I just saw the reconcile events due to the initial event and the 5 retries and then never again.
Even if I edit the resource and apply the changes, the reconcile loop is never reset. So even if my "external service" eventually comes up, the failed resource will not be able to reconcile except if I restart my pod.

Environment

Kubernetes cluster type:

K3D

I reproduced using latest code, so pom.xml version is: 4.0.1-SNAPSHOT

$ java -version
openjdk version "14.0.2" 2020-07-14
OpenJDK Runtime Environment Zulu14.29+23-CA (build 14.0.2+12)
OpenJDK 64-Bit Server VM Zulu14.29+23-CA (build 14.0.2+12, mixed mode, sharing)

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:16:20Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.4+k3s1", GitCommit:"c3f830e9b9ed8a4d9d0e2aa663b4591b923a296e", GitTreeState:"clean", BuildDate:"2022-08-25T03:45:26Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution

I may be missing some context, but it seems that event processing clears some state and optionally reschedules the resource after a successful execution, but not when an exception occurred. This should probably also happen when an exception has occurred AND it was the last retry attempt:

https://github.com/java-operator-sdk/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/EventProcessor.java#L247
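To make the suggestion concrete, here is a minimal, self-contained sketch of the scheduling decision I have in mind. It is not the SDK's EventProcessor code: the class `RescheduleSketch`, the method `nextDelay` and the flag `rescheduleWhenExhausted` are hypothetical names invented for illustration, and a fixed retry delay stands in for whatever backoff the real retry uses.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical model of the scheduling decision; not the SDK's EventProcessor. */
public class RescheduleSketch {

  /**
   * Decides when the resource should be reconciled next.
   *
   * @param succeeded whether the reconcile attempt that just finished succeeded
   * @param retriesDone retries already performed (0 right after the initial attempt)
   * @param maxRetries maximum number of retries (5 in the report above)
   * @param retryDelay delay before the next retry (simplified to a fixed value)
   * @param maxInterval the configured max reconciliation interval
   * @param rescheduleWhenExhausted false models the reported behavior,
   *     true models the proposed behavior
   * @return the delay until the next reconcile, or empty if nothing is scheduled
   */
  public static Optional<Duration> nextDelay(
      boolean succeeded,
      int retriesDone,
      int maxRetries,
      Duration retryDelay,
      Duration maxInterval,
      boolean rescheduleWhenExhausted) {
    if (succeeded) {
      // Successful execution: the periodic reconcile is scheduled.
      return Optional.of(maxInterval);
    }
    if (retriesDone < maxRetries) {
      // Failure with retries left: schedule the next retry.
      return Optional.of(retryDelay);
    }
    // Failure on the last retry: this is the case in question.
    return rescheduleWhenExhausted ? Optional.of(maxInterval) : Optional.empty();
  }

  public static void main(String[] args) {
    Duration retryDelay = Duration.ofSeconds(2);
    Duration maxInterval = Duration.ofMinutes(1);
    // Walk through the initial attempt and the 5 retries, all failing.
    for (int retriesDone = 0; retriesDone <= 5; retriesDone++) {
      System.out.println(
          retriesDone + " -> "
              + nextDelay(false, retriesDone, 5, retryDelay, maxInterval, true));
    }
  }
}
```

With `rescheduleWhenExhausted` set to false, the last failing attempt schedules nothing, which matches what I observe. With it set to true, the resource falls back to the max reconciliation interval and gets another round of attempts later.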
