@@ -7,79 +7,101 @@ permalink: /docs/patterns-best-practices
# Patterns and Best Practices

- This document describes patterns and best practices, to build and run operators, and how to implement them in terms of
- Java Operator SDK.
+ This document describes patterns and best practices for building and running operators, and how to
+ implement them in terms of the Java Operator SDK (JOSDK).

- See also best practices in [Operator SDK](https://sdk.operatorframework.io/docs/best-practices/best-practices/).
+ See also the best practices
+ in [Operator SDK](https://sdk.operatorframework.io/docs/best-practices/best-practices/).

## Implementing a Reconciler

### Reconcile All The Resources All the Time

- The reconciliation can be triggered by events from multiple sources. It could be tempting to check the events and
- reconcile just the related resource or subset of resources that the controller manages. However, this is **considered as
- an anti-pattern** in operators. If triggered, all resources should be reconciled. Usually this means only comparing the
- target state with the current state in the cache for most of the resource. The reason behind this is events not reliable
- In general, this means events can be lost. In addition to that the operator can crash and while down will miss events.
+ The reconciliation can be triggered by events from multiple sources. It could be tempting to check
+ the events and reconcile just the related resource or a subset of the resources that the controller
+ manages. However, this is **considered an anti-pattern** for operators because the distributed
+ nature of Kubernetes makes it difficult to ensure that all events are always received. If your
+ operator misses some events and you do not reconcile the whole state, you might be operating with
+ incorrect assumptions about the state of the cluster. This is why it is important to always
+ reconcile all the resources, no matter how tempting it might be to only consider a subset.
+ Luckily, JOSDK tries to make this as easy and efficient as possible by providing smart caches to
+ avoid unduly accessing the Kubernetes API server and by making sure your reconciler is only
+ triggered when needed.

- In addition to that such approach might even complicate implementation logic in the `Reconciler`, since parallel
- execution of the reconciler is not allowed for the same custom resource, there can be multiple events received for the
- same resource or dependent resource during an ongoing execution, ordering those events could be also challenging.
-
- Since there is a consensus regarding this in the industry, from v2 the events are not even accessible for
- the `Reconciler`.
+ Since there is an industry-wide consensus on this topic, JOSDK no longer provides event access
+ from `Reconciler` implementations starting with version 2 of the framework.
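+
+ For illustration, here is a minimal sketch of a `Reconciler` following this pattern. Note that
+ `WebPage` stands in for a hypothetical custom resource (its class definition is omitted) and that
+ exact API details may differ between framework versions:
+
+ ```java
+ import java.util.Map;
+ import java.util.Optional;
+
+ import io.fabric8.kubernetes.api.model.ConfigMap;
+ import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
+ import io.javaoperatorsdk.operator.api.reconciler.Context;
+ import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
+ import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
+
+ public class WebPageReconciler implements Reconciler<WebPage> {
+
+   @Override
+   public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {
+     // Note: no access to the triggering event. The desired state of every
+     // dependent resource is recomputed on each invocation.
+     ConfigMap desired = desiredConfigMap(webPage);
+
+     // Compare the desired state with the cached actual state (see the next
+     // section) and converge the cluster towards it only when they differ.
+     Optional<ConfigMap> actual = context.getSecondaryResource(ConfigMap.class);
+     if (actual.isEmpty()) {
+       // create the ConfigMap on the cluster
+     } else if (!desired.getData().equals(actual.get().getData())) {
+       // update the ConfigMap on the cluster
+     }
+     return UpdateControl.noUpdate();
+   }
+
+   // The desired state is derived solely from the primary resource's spec.
+   private ConfigMap desiredConfigMap(WebPage webPage) {
+     return new ConfigMapBuilder()
+         .withNewMetadata()
+           .withName(webPage.getMetadata().getName())
+           .withNamespace(webPage.getMetadata().getNamespace())
+         .endMetadata()
+         .withData(Map.of("index.html", webPage.getSpec().getHtml()))
+         .build();
+   }
+ }
+ ```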

### EventSources and Caching

- As mentioned above during a reconciliation best practice is to reconcile all the dependent resources managed by the
- controller. This means that we want to compare a target state with the actual state of the cluster. Reading the actual
- state of a resource from the Kubernetes API Server directly all the time would mean a significant load. Therefore, it's
- a common practice to instead create a watch for the dependent resources and cache their latest state. This is done
- following the Informer pattern. In Java Operator SDK, informer is wrapped into an EventSource, to integrate it with the
- eventing system of the framework, resulting in `InformerEventSource`.
-
- A new event that triggers the reconciliation is propagated when the actual resource is already in cache. So in
- reconciler what should be just done is to compare the target calculated state of a dependent resource of the actual
- state from the cache of the event source. If it is changed or not in the cache it needs to be created, respectively
- updated.
+ As mentioned above, the best practice during a reconciliation is to reconcile all the dependent
+ resources managed by the controller. This means that we want to compare a desired state with the
+ actual state of the cluster. Reading the actual state of a resource from the Kubernetes API
+ Server directly all the time would put a significant load on it. Therefore, it's a common
+ practice to instead create a watch for the dependent resources and cache their latest state.
+ This is done following the Informer pattern. In the Java Operator SDK, informers are wrapped
+ into an `EventSource` to integrate them with the framework's eventing system. This is
+ implemented by the `InformerEventSource` class.
+
+ A new event that triggers the reconciliation is only propagated to the `Reconciler` when the
+ actual resource is already in the cache. `Reconciler` implementations therefore only need to
+ compare the desired state with the observed one provided by the cached resource. If the resource
+ cannot be found in the cache, it needs to be created. If the actual state doesn't match the
+ desired state, the resource needs to be updated.
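+
+ For example, an `InformerEventSource` for dependent `ConfigMap`s could be registered as sketched
+ below (`WebPage` is again a hypothetical primary resource, and the exact signatures vary between
+ framework versions; this sketch follows the v4-era `EventSourceInitializer` contract):
+
+ ```java
+ import java.util.Map;
+
+ import io.fabric8.kubernetes.api.model.ConfigMap;
+ import io.javaoperatorsdk.operator.api.config.informer.InformerConfiguration;
+ import io.javaoperatorsdk.operator.api.reconciler.Context;
+ import io.javaoperatorsdk.operator.api.reconciler.EventSourceContext;
+ import io.javaoperatorsdk.operator.api.reconciler.EventSourceInitializer;
+ import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
+ import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
+ import io.javaoperatorsdk.operator.processing.event.source.EventSource;
+ import io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource;
+
+ public class WebPageReconciler implements Reconciler<WebPage>, EventSourceInitializer<WebPage> {
+
+   @Override
+   public Map<String, EventSource> prepareEventSources(EventSourceContext<WebPage> context) {
+     // Watch and cache the ConfigMaps managed by this controller; changes to
+     // them will trigger the reconciliation of the owning WebPage.
+     InformerEventSource<ConfigMap, WebPage> configMapEventSource =
+         new InformerEventSource<>(
+             InformerConfiguration.from(ConfigMap.class, context).build(), context);
+     return EventSourceInitializer.nameEventSources(configMapEventSource);
+   }
+
+   @Override
+   public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {
+     // The dependent resource is read from the event source's cache instead
+     // of querying the Kubernetes API server directly.
+     var cachedConfigMap = context.getSecondaryResource(ConfigMap.class);
+     // ... compare the desired state against the cached one as shown above ...
+     return UpdateControl.noUpdate();
+   }
+ }
+ ```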

### Idempotency

- Since all the resources are reconciled during an execution and an execution can be triggered quite often, also retries
- of a reconciliation can happen naturally in operators, the implementation of a `Reconciler`
- needs to be idempotent. Luckily, since operators are usually managing already declarative resources, this is trivial to
- do in most cases.
+ Since all resources should be reconciled when your `Reconciler` is triggered and reconciliations
+ can be triggered multiple times for any given resource, especially when retry policies are in
+ place, it is important that `Reconciler` implementations be idempotent, meaning that the same
+ observed state should result in exactly the same outcome. This also means that operators should
+ generally operate in a stateless fashion. Luckily, since operators usually manage declarative
+ resources, ensuring idempotency is usually not difficult.
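+
+ As a small illustration, deriving the desired state deterministically and applying it with a
+ create-or-replace style operation keeps resource handling idempotent: re-running the same logic
+ converges to the same cluster state instead of failing on the second attempt. This sketch
+ assumes the fabric8 `KubernetesClient`; the resource name and data are made up:
+
+ ```java
+ import java.util.Map;
+
+ import io.fabric8.kubernetes.api.model.ConfigMap;
+ import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
+ import io.fabric8.kubernetes.client.KubernetesClient;
+ import io.fabric8.kubernetes.client.KubernetesClientBuilder;
+
+ public class IdempotentApplyExample {
+   public static void main(String[] args) {
+     try (KubernetesClient client = new KubernetesClientBuilder().build()) {
+       // The same input always produces the same desired ConfigMap.
+       ConfigMap desired = new ConfigMapBuilder()
+           .withNewMetadata().withName("web-page").withNamespace("default").endMetadata()
+           .withData(Map.of("index.html", "<html>Hello!</html>"))
+           .build();
+
+       // A bare create() would fail once the resource exists; createOrReplace()
+       // converges, so executing this code repeatedly is harmless.
+       client.configMaps().inNamespace("default").resource(desired).createOrReplace();
+     }
+   }
+ }
+ ```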

### Sync or Async Way of Resource Handling

- In an implementation of reconciliation there can be a point when reconciler needs to wait a non-insignificant amount of
- time while a resource gets up and running. For example, reconciler would do some additional step only if a Pod is ready
- to receive requests. This problem can be approached in two ways synchronously or asynchronously.
-
- The async way is just return from the reconciler, if there are informers properly in place for the target resource,
- reconciliation will be triggered on change. During the reconciliation the pod can be read from the cache of the informer
- and a check on it's state can be conducted again. The benefit of this approach is that it will free up the thread, so it
- can be used to reconcile other resources.
-
- The sync way would be to periodically poll the cache of the informer for the pod's state, until the target state is
- reached. This would block the thread until the state is reached, which in some cases could take quite long.
-
- ## Why to Have Automated Retries?
-
- Automatic retries are in place by default, it can be fine-tuned, but in general it's not advised to turn of automatic
- retries. One of the reasons is that issues like network error naturally happen and are usually solved by a retry.
- Another typical situation is for example when a dependent resource or the custom resource is updated, during the update
- usually there is optimistic version control in place. So if someone updated the resource during reconciliation, maybe
- using `kubectl` or another process, the update would fail on a conflict. A retry solves this problem simply by executing
- the reconciliation again.
+ Depending on your use case, it's possible that your reconciliation logic needs to wait a
+ non-insignificant amount of time for resources to reach their desired state. For example, your
+ `Reconciler` might need to wait for a `Pod` to get ready before performing additional actions.
+ This problem can be approached either synchronously or asynchronously.
+
+ The asynchronous way is to just exit the reconciliation logic as soon as the `Reconciler`
+ determines that it cannot complete its full logic at this point in time. This frees resources to
+ process other primary resource events. However, this requires that adequate event sources are
+ put in place to monitor the state changes of all the resources the operator waits for. When this
+ is done properly, any state change will trigger the `Reconciler` again, and it will get the
+ opportunity to finish its processing.
+
+ The synchronous way would be to periodically poll the resources' state until they reach their
+ desired state. If this is done in the context of the `reconcile` method of your `Reconciler`
+ implementation, it would block the current thread, possibly for a long time. It's therefore
+ usually recommended to use the asynchronous approach.
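+
+ A sketch of the asynchronous approach (again with a hypothetical `WebPage` primary resource),
+ assuming an event source watching the `Pod` has been registered as shown earlier:
+
+ ```java
+ import java.util.Optional;
+
+ import io.fabric8.kubernetes.api.model.Pod;
+ import io.javaoperatorsdk.operator.api.reconciler.Context;
+ import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
+ import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
+
+ public class WebPageReconciler implements Reconciler<WebPage> {
+
+   @Override
+   public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {
+     Optional<Pod> pod = context.getSecondaryResource(Pod.class);
+     if (pod.isEmpty() || !isReady(pod.get())) {
+       // Exit early: the informer watching the Pod will re-trigger the
+       // reconciliation on any state change, and this thread is freed to
+       // process other primary resources in the meantime.
+       return UpdateControl.noUpdate();
+     }
+     // The Pod is ready: perform the remaining reconciliation steps here.
+     return UpdateControl.noUpdate();
+   }
+
+   // Readiness check based on the standard Pod "Ready" condition.
+   private boolean isReady(Pod pod) {
+     return pod.getStatus() != null && pod.getStatus().getConditions().stream()
+         .anyMatch(c -> "Ready".equals(c.getType()) && "True".equals(c.getStatus()));
+   }
+ }
+ ```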
+
+ ## Why Have Automatic Retries?
+
+ Automatic retries are in place by default and can be configured to your needs. It is also
+ possible to completely deactivate the feature, though we advise against it. The main reason to
+ configure automatic retries for your `Reconciler` is that errors occur quite often due to the
+ distributed nature of Kubernetes: transient network errors can easily be dealt with by an
+ automatic retry. Similarly, resources can be modified by different actors at the same time, so
+ conflicts are not unheard of when working with Kubernetes resources. Such conflicts can usually
+ be quite naturally resolved by reconciling the resource again. If that happens automatically,
+ the whole process can be completely transparent.
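+
+ For instance, retries could be tuned when registering the `Reconciler` (a sketch using the
+ `GenericRetry` class; names and signatures may differ between framework versions):
+
+ ```java
+ import io.javaoperatorsdk.operator.Operator;
+ import io.javaoperatorsdk.operator.processing.retry.GenericRetry;
+
+ public class OperatorMain {
+   public static void main(String[] args) {
+     Operator operator = new Operator();
+
+     // Tune (rather than deactivate) the automatic retries: up to 5 attempts,
+     // starting at 2 seconds and doubling the interval after each failure.
+     GenericRetry retry = new GenericRetry()
+         .setMaxAttempts(5)
+         .setInitialInterval(2000)
+         .setIntervalMultiplier(2);
+
+     operator.register(new WebPageReconciler(), overrider -> overrider.withRetry(retry));
+     operator.start();
+   }
+ }
+ ```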

## Managing State

- When managing only kubernetes resources an explicit state is not necessary about the resources. The state can be
- read/watched, also filtered using labels. Or just following some naming convention. However, when managing external
- resources, there can be a situation for example when the created resource can only be addressed by an ID generated when
- the resource was created. This ID needs to be stored, so on next reconciliation it could be used to addressing the
- resource. One place where it could go is the status sub-resource. On the other hand by definition status should be just
- the result of a reconciliation. Therefore, it's advised in general, to put such state into a separate resource usually a
- Kubernetes Secret or ConfigMap or a dedicated CustomResource, where the structure can be also validated.
+ Thanks to the declarative nature of Kubernetes resources, operators that deal only with
+ Kubernetes resources can operate in a stateless fashion, i.e. they do not need to maintain
+ information about the state of these resources, as it should be possible to completely rebuild
+ the resource state from its representation (that's what declarative means, after all).
+ However, this usually doesn't hold true anymore when dealing with external resources: for
+ example, an external resource might only be addressable by an ID generated when that resource
+ was created. The operator then needs to keep track of this external state so that it is
+ available when the next reconciliation occurs. While such state could be put in the primary
+ resource's status sub-resource, this could quickly become difficult to manage if a lot of state
+ needs to be tracked. It also goes against the best practice that a resource's status should
+ represent the actual resource state, while its spec represents the desired state. Putting state
+ there that doesn't strictly represent the resource's actual state is therefore discouraged.
+ Instead, it's advised to put such state into a separate resource meant for this purpose, such as
+ a Kubernetes Secret or ConfigMap, or even a dedicated Custom Resource, whose structure can be
+ more easily validated.
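+
+ A minimal sketch of this approach, assuming the fabric8 `KubernetesClient` (the key, names and
+ exact client calls are illustrative and may vary between client versions): the ID of an
+ externally created resource is persisted in a dedicated ConfigMap so that the next
+ reconciliation can address that resource again.
+
+ ```java
+ import java.util.Map;
+
+ import io.fabric8.kubernetes.api.model.ConfigMap;
+ import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
+ import io.fabric8.kubernetes.client.KubernetesClient;
+
+ // Stores the external resource ID in a ConfigMap named after the primary resource.
+ public class ExternalStateStore {
+
+   private static final String ID_KEY = "external-resource-id";
+   private final KubernetesClient client;
+
+   public ExternalStateStore(KubernetesClient client) {
+     this.client = client;
+   }
+
+   public void storeExternalId(String namespace, String name, String externalId) {
+     ConfigMap state = new ConfigMapBuilder()
+         .withNewMetadata().withName(name).withNamespace(namespace).endMetadata()
+         .withData(Map.of(ID_KEY, externalId))
+         .build();
+     client.configMaps().inNamespace(namespace).resource(state).createOrReplace();
+   }
+
+   public String loadExternalId(String namespace, String name) {
+     ConfigMap state = client.configMaps().inNamespace(namespace).withName(name).get();
+     return state != null ? state.getData().get(ID_KEY) : null;
+   }
+ }
+ ```
+
+ Setting an owner reference on the state-holding ConfigMap (pointing to the primary resource) is
+ also a good idea, so that the state is cleaned up together with the primary resource.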