Description
Bug Report
A reconciler with a managed workflow and managed dependent resources that use a ready postcondition doesn't progress after reconciling the first dependent resource.
The resource with a readyPostcondition is reconciled successfully. However, the condition based on the secondary resource is never met, because it always receives the same secondary resource that was cached when the dependent was reconciled.
I tried it with both WATCH_ALL_NAMESPACES and WATCH_CURRENT_NAMESPACE.
What did you do?
Full reproducer: https://github.com/grossws/operatorsdk-es-issue
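```java
import io.fabric8.kubernetes.api.model.ListOptionsBuilder;
import io.fabric8.kubernetes.api.model.apps.StatefulSet;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ContextInitializer;
import io.javaoperatorsdk.operator.api.reconciler.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
import io.javaoperatorsdk.operator.api.reconciler.dependent.Dependent;
import io.javaoperatorsdk.operator.processing.dependent.workflow.Condition;
import java.time.Duration;
import java.util.Objects;
import javax.inject.Inject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Project, ProjectStatus, FirstService, SecondService, FirstStatefulSet and
// SecondStatefulSet are defined in the reproducer repository.
@ControllerConfiguration(
    name = "project-operator",
    dependents = {
        @Dependent(name = "first-svc", type = FirstService.class),
        @Dependent(name = "second-svc", type = SecondService.class),
        @Dependent(name = "first", type = FirstStatefulSet.class,
            dependsOn = {"first-svc"},
            readyPostcondition = MyReconciler.FirstReadyCondition.class),
        @Dependent(name = "second", type = SecondStatefulSet.class,
            dependsOn = {"second-svc", "first"}),
    }
)
public class MyReconciler implements Reconciler<Project>, ContextInitializer<Project> {
  static final Logger log = LoggerFactory.getLogger(MyReconciler.class);

  @Inject
  KubernetesClient client;

  @Override
  public void initContext(Project primary, Context<Project> context) {
    // make the client available to FirstReadyCondition
    context.managedDependentResourceContext().put("client", client);
  }

  @Override
  public UpdateControl<Project> reconcile(Project resource, Context<Project> context) throws Exception {
    var ready = context.managedDependentResourceContext().getWorkflowReconcileResult().orElseThrow().allDependentResourcesReady();
    var status = Objects.requireNonNullElseGet(resource.getStatus(), ProjectStatus::new);
    status.setStatus(ready ? "ready" : "not-ready");
    resource.setStatus(status);
    // manually reschedule to force a call to `FirstReadyCondition#isMet`
    // even if no new events are received from the informer
    return UpdateControl.updateStatus(resource)
        .rescheduleAfter(Duration.ofSeconds(10));
  }

  public static class FirstReadyCondition implements Condition<StatefulSet, Project> {
    @Override
    public boolean isMet(Project primary, StatefulSet secondary, Context<Project> context) {
      var client = context.managedDependentResourceContext().getMandatory("client", KubernetesClient.class);
      // fetch the StatefulSet directly from the API server for comparison with the cached secondary
      var options = new ListOptionsBuilder().withLabelSelector("app.kubernetes.io/name=" + secondary.getMetadata().getName()).build();
      var statefulSets = client.resources(StatefulSet.class).list(options);
      if (!statefulSets.getItems().isEmpty()) {
        log.info("secondary status: {}", secondary.getStatus());
        log.info("fetched status: {}", statefulSets.getItems().get(0).getStatus());
      }
      // status may be null on a freshly created StatefulSet
      var secondaryStatus = secondary.getStatus();
      var readyReplicas = secondaryStatus == null ? null : secondaryStatus.getReadyReplicas();
      return readyReplicas != null && readyReplicas > 0;
    }
  }
}
```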
Managed dependent resources are discriminated based on labelSelector:
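```java
import io.javaoperatorsdk.operator.processing.dependent.kubernetes.KubernetesDependent;

// BaseStatefulSet is defined in the reproducer repository.
@KubernetesDependent(labelSelector = FirstStatefulSet.SELECTOR)
public class FirstStatefulSet extends BaseStatefulSet {
  public static final String SELECTOR = "app.kubernetes.io/managed-by=project-operator," +
      "app.kubernetes.io/component=first";
  // ...
}
```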
What did you expect to see?
- The ready postcondition isMet eventually returns true once the StatefulSet's readyReplicas becomes 1.
- Both StatefulSets are reconciled and the CR status is updated based on WorkflowReconcileResult.
What did you see instead? Under which circumstances?
- The ready postcondition isMet, which is based on the secondary resource's status, always returns false. It keeps receiving the same cached secondary resource from the moment that resource was reconciled.
- The workflow hangs after reconciling the first StatefulSet.
The logs show that the actual StatefulSet status is updated, but the secondary passed to isMet stays the same:
```
secondary status: StatefulSetStatus(availableReplicas=0, collisionCount=null, conditions=[], currentReplicas=null, currentRevision=null, observedGeneration=null, readyReplicas=null, replicas=0, updateRevision=null, updatedReplicas=null, additionalProperties={})
fetched status: StatefulSetStatus(availableReplicas=0, collisionCount=0, conditions=[], currentReplicas=1, currentRevision=first-p1-6dc67d5df7, observedGeneration=1, readyReplicas=null, replicas=1, updateRevision=first-p1-6dc67d5df7, updatedReplicas=1, additionalProperties={}
```
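To make the symptom concrete, here is a minimal, hypothetical model. It is not JOSDK code: a plain HashMap stands in for the informer cache, and `onStatusEvent`/`current` are made-up names. The point is that a condition evaluated against a value captured once never changes its answer. A condition that reads the current cached value would flip to true after the informer delivers readyReplicas=1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the reported behaviour; not JOSDK or fabric8 code.
final class StaleSnapshotSketch {

    // Stand-in for the informer cache: readyReplicas by StatefulSet name (null = not set yet).
    private final Map<String, Integer> cache = new HashMap<>();

    // Simulates an informer event carrying a new status for a StatefulSet.
    void onStatusEvent(String name, Integer readyReplicas) {
        cache.put(name, readyReplicas);
    }

    // Reads whatever the cache holds right now.
    Integer current(String name) {
        return cache.get(name);
    }

    // Same readiness rule as FirstReadyCondition#isMet.
    static boolean ready(Integer readyReplicas) {
        return readyReplicas != null && readyReplicas > 0;
    }
}
```

In the scenario from the logs, a value captured right after reconciliation (readyReplicas=null) keeps failing `ready(...)`. After a later `onStatusEvent("first-p1", 1)`, `ready(current("first-p1"))` returns true.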
Environment
- k3d version v5.4.6 / k3s version v1.24.4-k3s1
- OpenJDK 17 (Temurin-17.0.4.1+1)
- Quarkiverse Java Operator SDK 4.0.3
- Quarkus 2.13.3.Final
- Java Operator SDK 3.2.3
- kubectl: client 1.25.3, server 1.24.4+k3s1
Additional context
I'm implementing an operator for a legacy system consisting of a number of stateful and stateless microservices. Some of them require a strict startup order, so I tried the workflow feature.
dependsOn alone is not enough, since the reconciler starts reconciling the second dependent right after the first one is reconciled, before it is ready. Hence readyPostcondition.
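The ordering I expect can be sketched as a simplified, hypothetical model. It is not the JOSDK workflow implementation. A dependent is reconciled only after all of its dependsOn dependents have been reconciled in the same pass, and only if any readyPostcondition on those dependencies currently holds. `Node` and `reconcilePass` are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of dependsOn + readyPostcondition ordering; not JOSDK code.
final class WorkflowOrderSketch {

    // A dependent, the names it depends on, and an optional ready postcondition (null = none).
    record Node(String name, List<String> dependsOn, BooleanSupplier readyPostcondition) {}

    // Runs one reconciliation pass and returns the reconciled dependents in order.
    static List<String> reconcilePass(List<Node> nodes) {
        Map<String, Node> byName = new LinkedHashMap<>();
        for (Node n : nodes) {
            byName.put(n.name(), n);
        }
        List<String> reconciled = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Node n : nodes) {
                if (reconciled.contains(n.name())) {
                    continue;
                }
                // every dependency must be reconciled and, if it has a postcondition, ready
                boolean unblocked = n.dependsOn().stream()
                        .allMatch(dep -> reconciled.contains(dep) && isReady(byName.get(dep)));
                if (unblocked) {
                    reconciled.add(n.name());
                    progress = true;
                }
            }
        }
        return reconciled;
    }

    private static boolean isReady(Node n) {
        return n.readyPostcondition() == null || n.readyPostcondition().getAsBoolean();
    }
}
```

With the dependents from the reproducer, a postcondition on "first" that returns false yields [first-svc, second-svc, first], so "second" waits. Once the postcondition returns true, all four dependents are reconciled. This is why a condition that never sees the updated status blocks the workflow forever.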
For several managed dependent resources of the same type, I discriminate them by a label selector such as app.kubernetes.io/managed-by=...,app.kubernetes.io/component=..., where component is unique for the resource type among the resources managed by this operator.
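The discrimination scheme can be illustrated with a simplified sketch. It only handles equality-based terms of the form `key=value` joined by commas; set-based operators, `!=`, `==` and key-existence terms are deliberately not supported. `selectorFor` and `matches` are hypothetical helpers, not part of JOSDK or fabric8. They show that each component value selects exactly one of the same-typed dependents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical sketch: equality-based label selectors only ("k=v,k2=v2").
final class ComponentSelectorSketch {

    static final String MANAGED_BY = "app.kubernetes.io/managed-by";
    static final String COMPONENT = "app.kubernetes.io/component";

    // Builds a selector with the same shape as FirstStatefulSet.SELECTOR.
    static String selectorFor(String operatorName, String component) {
        return MANAGED_BY + "=" + operatorName + "," + COMPONENT + "=" + component;
    }

    // Parses "k=v,k2=v2" into the labels a resource must carry.
    static Map<String, String> parse(String selector) {
        Map<String, String> required = new LinkedHashMap<>();
        for (String term : selector.split(",")) {
            String[] kv = term.trim().split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("unsupported selector term: " + term);
            }
            required.put(kv[0], kv[1]);
        }
        return required;
    }

    // A resource matches when every required label is present with the same value.
    static boolean matches(Map<String, String> labels, String selector) {
        return parse(selector).entrySet().stream()
                .allMatch(e -> e.getValue().equals(labels.get(e.getKey())));
    }
}
```

For example, `selectorFor("project-operator", "first")` produces the same string as FirstStatefulSet.SELECTOR. A StatefulSet labelled with component=first matches it but not the selector for component=second.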
See also: https://discord.com/channels/723455000604573736/780769121305493544/1032712459200503829