Ready postcondition doesn't receive status updates for managed dependent resources in managed workflow #1565

Closed
@grossws

Bug Report

A reconciler with a managed workflow and managed dependent resources, one of which has a ready postcondition, doesn't progress after reconciling the first dependent resource.

The resource with readyPostcondition is reconciled successfully, but the condition based on the secondary resource is never met because it keeps receiving the same secondary resource that was cached at the time of its reconciliation.

Tried it with both WATCH_ALL_NAMESPACES and WATCH_CURRENT_NAMESPACE.

What did you do?

Full reproducer: https://github.com/grossws/operatorsdk-es-issue

@ControllerConfiguration(
        name = "project-operator",
        dependents = {
                @Dependent(name = "first-svc", type = FirstService.class),
                @Dependent(name = "second-svc", type = SecondService.class),
                @Dependent(name = "first", type = FirstStatefulSet.class,
                        dependsOn = {"first-svc"},
                        readyPostcondition = MyReconciler.FirstReadyCondition.class),
                @Dependent(name = "second", type = SecondStatefulSet.class,
                        dependsOn = {"second-svc", "first"}),
        }
)
public class MyReconciler implements Reconciler<Project>, ContextInitializer<Project> {
    static final Logger log = LoggerFactory.getLogger(MyReconciler.class);

    @Inject 
    KubernetesClient client;

    @Override
    public void initContext(Project primary, Context<Project> context) {
        context.managedDependentResourceContext().put("client", client);
    }

    @Override
    public UpdateControl<Project> reconcile(Project resource, Context<Project> context) throws Exception {
        var ready = context.managedDependentResourceContext().getWorkflowReconcileResult().orElseThrow().allDependentResourcesReady();

        var status = Objects.requireNonNullElseGet(resource.getStatus(), ProjectStatus::new);
        status.setStatus(ready ? "ready" : "not-ready");
        resource.setStatus(status);

        // manually reschedule to force a call to `FirstReadyCondition#isMet`
        // even when no new events are received from the informer
        return UpdateControl.updateStatus(resource)
                .rescheduleAfter(Duration.ofSeconds(10));
    }

    public static class FirstReadyCondition implements Condition<StatefulSet, Project> {
        @Override
        public boolean isMet(Project primary, StatefulSet secondary, Context<Project> context) {
            var client = context.managedDependentResourceContext().getMandatory("client", KubernetesClient.class);

            var options = new ListOptionsBuilder().withLabelSelector("app.kubernetes.io/name=" + secondary.getMetadata().getName()).build();
            var statefulSets = client.resources(StatefulSet.class).list(options);
            if (!statefulSets.getItems().isEmpty()) {
                log.info("secondary status: {}", secondary.getStatus());
                log.info("fetched status: {}", statefulSets.getItems().get(0).getStatus());
            }

            var readyReplicas = secondary.getStatus().getReadyReplicas();
            return readyReplicas != null && readyReplicas > 0;
        }
    }
}

Managed dependent resources of the same type are discriminated by labelSelector:

@KubernetesDependent(labelSelector = FirstStatefulSet.SELECTOR)
public class FirstStatefulSet extends BaseStatefulSet {
    public static final String SELECTOR = "app.kubernetes.io/managed-by=project-operator," +
                                          "app.kubernetes.io/component=first";
    // ...
}

What did you expect to see?

  1. The ready postcondition's isMet eventually returns true once the StatefulSet's readyReplicas becomes 1.
  2. Both StatefulSets are reconciled and the CR status is updated based on WorkflowReconcileResult.

What did you see instead? Under which circumstances?

  1. The ready postcondition's isMet, which is based on the secondary resource's status, always returns false because it receives the same cached secondary resource from the moment it was reconciled.
  2. The workflow hangs after reconciling the first StatefulSet.

The logs show that the actual StatefulSet status is updated, but the secondary passed to isMet stays the same:

secondary status: StatefulSetStatus(availableReplicas=0, collisionCount=null, conditions=[], currentReplicas=null, currentRevision=null, observedGeneration=null, readyReplicas=null, replicas=0, updateRevision=null, updatedReplicas=null, additionalProperties={})
fetched status: StatefulSetStatus(availableReplicas=0, collisionCount=0, conditions=[], currentReplicas=1, currentRevision=first-p1-6dc67d5df7, observedGeneration=1, readyReplicas=null, replicas=1, updateRevision=first-p1-6dc67d5df7, updatedReplicas=1, additionalProperties={})
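To make the symptom concrete, here is a toy model in plain Java, not Java Operator SDK code. It assumes the condition is handed a snapshot taken at reconcile time, which is what the logs suggest; the `Status` record and the map standing in for the cluster are hypothetical. A check against the snapshot never changes, while a check against a fresh lookup does.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class StaleSnapshotSketch {

    /** Hypothetical stand-in for a StatefulSet status; null means "not reported yet". */
    record Status(Integer readyReplicas) {}

    /** Same readiness rule as the condition in the issue, with a null guard on the status. */
    static boolean isReady(Status status) {
        return status != null
                && status.readyReplicas() != null
                && status.readyReplicas() > 0;
    }

    public static void main(String[] args) {
        // The map plays the role of the live cluster state (assumption for illustration).
        Map<String, Status> cluster = new ConcurrentHashMap<>();
        cluster.put("first", new Status(null));

        // Snapshot captured once, at "reconcile time".
        Status snapshot = cluster.get("first");
        // Fresh lookup performed every time the condition is evaluated.
        Supplier<Status> live = () -> cluster.get("first");

        // The StatefulSet becomes ready later.
        cluster.put("first", new Status(1));

        System.out.println(isReady(snapshot));   // false: the snapshot never changes
        System.out.println(isReady(live.get())); // true: the fresh read sees the update
    }
}
```

In the reproducer, the `client.resources(StatefulSet.class).list(options)` call plays the role of the fresh lookup, which is why the "fetched status" log line differs from the "secondary status" one.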

Environment

  • k3d version v5.4.6 / k3s version v1.24.4-k3s1
  • OpenJDK 17 (Temurin-17.0.4.1+1)
  • Quarkiverse Java Operator SDK 4.0.3
  • Quarkus 2.13.3.Final
  • Java Operator SDK 3.2.3
  • kubectl: client 1.25.3, server 1.24.4+k3s1

Additional context

I'm implementing an operator for a legacy system that consists of a number of stateful and stateless microservices, some of which require a strict startup order, so I tried the workflow feature.

dependsOn alone is not enough, since the reconciler starts reconciling the second dependent service right after the first one is reconciled (but not yet ready). Hence the readyPostcondition.

For several managed dependent resources of the same type, I discriminate between them by label selector, e.g. app.kubernetes.io/managed-by=...,app.kubernetes.io/component=..., where component is unique for the resource type among the resources managed by this operator.
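The difference between ordering and readiness gating can be sketched with a small plain-Java model. This is not how Java Operator SDK implements workflows; `Node`, `projectWorkflow`, and `reconcilePass` are hypothetical names, and the model only assumes that a dependent is reconciled once all of its dependencies are reconciled and report ready.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

public class WorkflowGateSketch {

    /** A dependent with its dependsOn names and a readiness check (hypothetical model type). */
    record Node(String name, Set<String> dependsOn, BooleanSupplier ready) {}

    /** Mirrors the four dependents declared in the issue; only "first" has a readiness gate. */
    static List<Node> projectWorkflow(BooleanSupplier firstReady) {
        return List.of(
                new Node("first-svc", Set.of(), () -> true),
                new Node("second-svc", Set.of(), () -> true),
                new Node("first", Set.of("first-svc"), firstReady),
                new Node("second", Set.of("second-svc", "first"), () -> true));
    }

    /** Returns the names reconciled in one pass, in the order they were reconciled. */
    static List<String> reconcilePass(List<Node> nodes) {
        Map<String, Node> reconciled = new LinkedHashMap<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Node node : nodes) {
                if (reconciled.containsKey(node.name())) {
                    continue;
                }
                // A node may start only when every dependency is reconciled AND ready.
                boolean dependenciesReady = node.dependsOn().stream().allMatch(dep ->
                        reconciled.containsKey(dep) && reconciled.get(dep).ready().getAsBoolean());
                if (dependenciesReady) {
                    reconciled.put(node.name(), node);
                    progress = true;
                }
            }
        }
        return new ArrayList<>(reconciled.keySet());
    }

    public static void main(String[] args) {
        // "first" not ready yet: "second" is held back.
        System.out.println(reconcilePass(projectWorkflow(() -> false)));
        // prints [first-svc, second-svc, first]

        // "first" ready: the whole workflow completes.
        System.out.println(reconcilePass(projectWorkflow(() -> true)));
        // prints [first-svc, second-svc, first, second]
    }
}
```

In this model, a readiness check that never flips to true leaves "second" unreconciled forever, which matches the hang described in this report.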
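As an illustration of how such a selector separates resources of the same type, here is a minimal plain-Java sketch of equality-based matching, where comma-separated `key=value` requirements must all hold. It is not the Kubernetes or fabric8 implementation; `LabelSelectorSketch` is a hypothetical helper, and the `component=second` label is assumed by analogy with the `first` selector shown earlier.

```java
import java.util.HashMap;
import java.util.Map;

public class LabelSelectorSketch {

    /** Parses "k1=v1,k2=v2" into a map; only equality requirements are supported here. */
    static Map<String, String> parse(String selector) {
        Map<String, String> requirements = new HashMap<>();
        for (String part : selector.split(",")) {
            String[] keyValue = part.trim().split("=", 2);
            if (keyValue.length != 2) {
                throw new IllegalArgumentException("unsupported requirement: " + part);
            }
            requirements.put(keyValue[0].trim(), keyValue[1].trim());
        }
        return requirements;
    }

    /** True when every required key is present on the resource with the required value. */
    static boolean matches(String selector, Map<String, String> labels) {
        for (Map.Entry<String, String> requirement : parse(selector).entrySet()) {
            if (!requirement.getValue().equals(labels.get(requirement.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String firstSelector = "app.kubernetes.io/managed-by=project-operator,"
                + "app.kubernetes.io/component=first";

        Map<String, String> firstLabels = Map.of(
                "app.kubernetes.io/managed-by", "project-operator",
                "app.kubernetes.io/component", "first");
        // Assumed labels for the second StatefulSet (not shown in the issue).
        Map<String, String> secondLabels = Map.of(
                "app.kubernetes.io/managed-by", "project-operator",
                "app.kubernetes.io/component", "second");

        System.out.println(matches(firstSelector, firstLabels));  // true
        System.out.println(matches(firstSelector, secondLabels)); // false
    }
}
```

Because both StatefulSets share the managed-by label, the component label is the part that makes each selector pick out exactly one resource.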

See also: https://discord.com/channels/723455000604573736/780769121305493544/1032712459200503829
