diff --git a/docs/documentation/patterns-best-practices.md b/docs/documentation/patterns-best-practices.md
index b23e141dc5..3230674ed8 100644
--- a/docs/documentation/patterns-best-practices.md
+++ b/docs/documentation/patterns-best-practices.md
@@ -84,7 +84,7 @@ possible to completely deactivate the feature, though we advise against it. The
 configure automatic retries for your `Reconciler` is due to the fact that errors occur quite
 often due to the distributed nature of Kubernetes: transient network errors can be easily dealt
 with by automatic retries. Similarly, resources can be modified by different actors at the same
-time so it's not unheard of to get conflicts when working with Kubernetes resources. Such
+time, so it's not unheard of to get conflicts when working with Kubernetes resources. Such
 conflicts can usually be quite naturally resolved by reconciling the resource again. If it's
 done automatically, the whole process can be completely transparent.
 
@@ -94,7 +94,7 @@ Thanks to the declarative nature of Kubernetes resources, operators that deal on
 Kubernetes resources can operator in a stateless fashion, i.e. they do not need to maintain
 information about the state of these resources, as it should be possible to completely rebuild
 the resource state from its representation (that's what declarative means, after all).
-However, this usually doesn't hold true anymore when dealing with external resources and it
+However, this usually doesn't hold true anymore when dealing with external resources, and it
 might be necessary for the operator to keep track of this external state so that it is available
 when another reconciliation occurs.
 While such state could be put in the primary resource's status sub-resource, this could become quickly difficult to manage if a lot of state needs to be
@@ -105,3 +105,19 @@ advised to put such state into a separate resource meant for this purpose such a
 Kubernetes Secret or ConfigMap or even a dedicated Custom Resource, which structure can be more
 easily validated.
 
+## Stopping (or not) Operator in case of Informer Errors
+
+It can
+be [configured](https://github.com/java-operator-sdk/java-operator-sdk/blob/2cb616c4c4fd0094ee6e3a0ef2a0ea82173372bf/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/config/ConfigurationService.java#L168-L168)
+whether the operator should stop if any informer error happens on startup. By default, if there is an error on
+startup, for example the informer has no permission to list the target resources (either the primary resource or
+secondary resources), the operator will stop instantly. This behavior can be altered by setting the mentioned flag
+to `false`, so the operator will start even if some informers are not started. In this case, same as when an informer
+is started at first but experiences problems later, the connection will be retried indefinitely with an
+exponential backoff. The operator will only stop if there is a fatal
+error, [currently](https://github.com/java-operator-sdk/java-operator-sdk/blob/0e55c640bf8be418bc004e51a6ae2dcf7134c688/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerWrapper.java#L64-L66)
+that is when a resource cannot be deserialized. The typical use case for changing this flag is when a list of namespaces
+is watched by a controller. It is better to start up the operator, so it can handle other namespaces while there
+might be a permission issue for some resources in another namespace.
+
+
diff --git a/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java b/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java
index ca52b0ecaf..86d2d01b67 100644
--- a/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java
+++ b/operator-framework/src/test/java/io/javaoperatorsdk/operator/InformerRelatedBehaviorITS.java
@@ -22,8 +22,8 @@ import static org.junit.jupiter.api.Assertions.assertThrows;
 
 /**
- * The test relies on a special minikube configuration: "min-request-timeout" to have a very low
- * value, see: "minikube start --extra-config=apiserver.min-request-timeout=3"
+ * The test relies on a special api server configuration: "min-request-timeout" to have a very low
+ * value, use: "minikube start --extra-config=apiserver.min-request-timeout=3"
  *
  *

* This is important when tests are affected by permission changes, since the watch permissions are