-To describe the problem, let's say if we create a resource from our controller that has a generated ID
-we have to store this ID, usually in the `.status` of the custom resource.
-(In other words, we cannot address the resource only by using the values from the `.spec`.)
-However, operator frameworks cache resources
-using informers, so the update that you made to the status of the custom resource will just eventually get into
-the cache of the informer. If meanwhile some other event triggers the reconciliation, it can happen that we will
-see the stale custom resource in the cache (in another word, the cache is eventually consistent) without the generated ID in the status.
-This is a problem since we might not know at that point that the desired resources were already created, so it might happen that you try to
-create them again.
+To describe the problem, let's say that our controller needs to create a resource that has a generated identifier, i.e.
+a resource whose identifier cannot be directly derived from the custom resource's desired state as specified in its
+`spec` field. In order to record the fact that the resource was successfully created, and to avoid attempting to
+recreate the resource again in subsequent reconciliations, it is typical for this type of controller to store the
+generated identifier in the custom resource's `status` field.

-Java Operator SDK now out of the box provides a utility class [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java)
-if you use it, the framework guarantees that the next reconciliation will always receive the updated resource:
+The Java Operator SDK relies on the informers' cache to retrieve resources. These caches, however, are only guaranteed
+to be eventually consistent. It could happen, then, that, if some other event occurs, that would result in a new
+reconciliation, **before** the update that's been made to our resource status has the chance to be propagated first to
+the cluster and then back to the informer cache, that the resource in the informer cache does **not** contain the latest
+version as modified by the reconciler. This would result in a new reconciliation where the generated identifier would be
+missing from the resource status and, therefore, another attempt to create the resource by the reconciler, which is not
This utility class will do the magic for you. But how does it work?
-There are multiple ways to solve this problem,
-but ultimately, we only provided the solution below. (If you want to dig deep in alternatives, see this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)).
+How does `PrimaryUpdateAndCacheUtils` work?
+There are multiple ways to solve this problem, but ultimately, we only provide the solution described below. (If you
+want to dig deeper into the alternatives, see
+this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files)).

-The trick is to cache the resource from the response of our update in an additional cache on top of the informer's cache.
-If we read the resource, we first check if it is in the overlay cache and read it from there if present, otherwise read it from the cache of the informer.
-If the informer receives an event with a fresh resource, we always remove the resource from the overlay
+The trick is to cache the resource from the response of our update in an additional cache on top of the informer's
+cache. If we read the resource, we first check if it is in the overlay cache and read it from there if present,
+otherwise read it from the cache of the informer. If the informer receives an event with a fresh resource, we always
+remove the resource from the overlay
 cache, since that is a more recent resource. But this **works only** if the update is done **with optimistic locking**.
 So if the update fails on conflict, we simply wait and poll the informer cache until there is a new resource version,
 and try to update again using the new resource (with our changes applied again) with optimistic locking.
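
To make the overlay idea concrete, here is a minimal sketch in plain Java. It is not the SDK's implementation: `OverlayCacheSketch`, `Resource`, and all method names are hypothetical, and the two caches are modeled as simple maps keyed by resource name.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical, simplified model of the overlay idea; not the SDK's actual implementation. */
final class OverlayCacheSketch {

  /** Minimal stand-in for a custom resource: name, opaque resourceVersion, generated ID. */
  static final class Resource {
    final String name;
    final String resourceVersion;
    final String generatedId; // null until the controller has stored the ID in the status

    Resource(String name, String resourceVersion, String generatedId) {
      this.name = name;
      this.resourceVersion = resourceVersion;
      this.generatedId = generatedId;
    }
  }

  private final Map<String, Resource> informerCache = new ConcurrentHashMap<>();
  private final Map<String, Resource> overlayCache = new ConcurrentHashMap<>();

  /** Stores the response of our own successful update on top of the informer cache. */
  void cacheUpdateResponse(Resource updated) {
    overlayCache.put(updated.name, updated);
  }

  /** An informer event carries a fresh resource, so the overlay entry is dropped. */
  void onInformerEvent(Resource fresh) {
    informerCache.put(fresh.name, fresh);
    overlayCache.remove(fresh.name);
  }

  /** Reads prefer the overlay cache and fall back to the informer cache. */
  Optional<Resource> get(String name) {
    Resource overlaid = overlayCache.get(name);
    return Optional.ofNullable(overlaid != null ? overlaid : informerCache.get(name));
  }

  /** Reports whether reads for this name are currently served by the overlay cache. */
  boolean isOverlaid(String name) {
    return overlayCache.containsKey(name);
  }

  public static void main(String[] args) {
    OverlayCacheSketch cache = new OverlayCacheSketch();
    cache.onInformerEvent(new Resource("my-resource", "1", null));
    // Our status update succeeded, but the informer has not seen it yet.
    cache.cacheUpdateResponse(new Resource("my-resource", "2", "generated-id"));
    System.out.println(cache.get("my-resource").get().generatedId); // served by the overlay
    cache.onInformerEvent(new Resource("my-resource", "2", "generated-id"));
    System.out.println(cache.isOverlaid("my-resource")); // the event evicted the overlay entry
  }
}
```

The sketch deliberately ignores when it is safe to put something into the overlay; that is where optimistic locking comes in.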

-So why optimistic locking? (A bit simplified explanation) Note that if we do not update the resource with optimistic locking, it can happen that
-another party does an update on the resource just before we do. The informer receives the event from another party's update,
-if we would compare resource versions with this resource and the previously cached resource (response from our update),
-that would be different, and in general there is no elegant way to determine if this new version that
-informer receives an event from an update that happened before or after our update.
+So why optimistic locking? (A somewhat simplified explanation.) Note that if we do not update the resource with optimistic
+locking, it can happen that
+another party does an update on the resource just before we do. The informer receives the event from the other party's
+update,
+and if we compare the resource version of this resource with that of the previously cached resource (the response from our update),
+they would be different, and in general there is no elegant way to determine whether the new version that the
+informer receives comes from an update that happened before or after our update.
 (Note that the informer's watch can lose its connection, among other edge cases.)
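
The conflict-and-retry behavior can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: `FakeApiServer` is an in-memory stand-in that owns a numeric version counter (real Kubernetes resource versions are opaque strings), and on conflict the sketch simply re-reads from the fake server, whereas the text above describes waiting for the informer cache to receive a newer version.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of an update with optimistic locking and retry on conflict. */
final class OptimisticLockingSketch {

  /** Minimal stand-in for a resource; the label field represents another party's data. */
  static final class Resource {
    final long resourceVersion;
    final String generatedId;
    final String label;

    Resource(long resourceVersion, String generatedId, String label) {
      this.resourceVersion = resourceVersion;
      this.generatedId = generatedId;
      this.label = label;
    }
  }

  /** Thrown when an update was based on a version that is no longer current. */
  static final class ConflictException extends RuntimeException {}

  /** In-memory stand-in for the API server, holding a single resource. */
  static final class FakeApiServer {
    private Resource stored;

    FakeApiServer(Resource initial) {
      this.stored = initial;
    }

    synchronized Resource get() {
      return stored;
    }

    /** Accepts the update only if it was based on the currently stored version. */
    synchronized Resource update(Resource desired) {
      if (desired.resourceVersion != stored.resourceVersion) {
        throw new ConflictException();
      }
      stored = new Resource(stored.resourceVersion + 1, desired.generatedId, desired.label);
      return stored;
    }
  }

  /** Re-applies the change on top of the latest version until the server accepts it. */
  static Resource updateWithRetry(
      FakeApiServer server, Resource base, UnaryOperator<Resource> change, int maxAttempts) {
    Resource current = base;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return server.update(change.apply(current));
      } catch (ConflictException e) {
        current = server.get(); // simplification: re-read instead of polling an informer cache
      }
    }
    throw new IllegalStateException("still conflicting after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    FakeApiServer server = new FakeApiServer(new Resource(1, null, "a"));
    Resource stale = server.get();
    server.update(new Resource(1, null, "b")); // another party updates just before we do
    Resource result = updateWithRetry(
        server, stale, r -> new Resource(r.resourceVersion, "generated-id", r.label), 3);
    System.out.println(result.resourceVersion + " " + result.generatedId + " " + result.label);
  }
}
```

Because the first attempt is rejected rather than silently applied, the response that is eventually accepted is known to be newer than the other party's update, which is the ordering guarantee the overlay cache relies on.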

-
 ```mermaid
 flowchart TD
 A["Update Resource with Lock"] --> B{"Is Successful"}
@@ -74,10 +85,11 @@ flowchart TD
 ```

 If we do our update with optimistic locking, it simplifies the situation, and we can have strong guarantees.
-Since we know if the update with optimistic locking is successful, we have the up-to-date resource in our caches.
-Thus, the next event we receive will be the one that is the result of our update
-(or a newer one if somebody did an update after, but that is fine since it will contain our allocated values).
-So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it from there.
+We know that if the update with optimistic locking is successful, we have the up-to-date resource in our caches.
+Thus, the next event we receive will be the one that is the result of our update
+(or a newer one if somebody did an update afterwards, but that is fine since it will contain our allocated values).
+So if we cache the resource in the overlay cache from the response, we know that with the next event, we can remove it
+from there.
 Note that we store the result in the overlay cache only if at that time we still have the original resource in the informer cache.
-If the cache already updated, that means that we already received a new event after our update,
+If the cache was already updated, that means that we already received a new event after our update,
 so we have a fresh resource in the informer cache already.
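
The last condition, storing the update response only while the informer cache still holds the original resource, could look roughly like the following sketch. The class and method names are hypothetical, resource versions are compared for equality only, and the methods are synchronized so that the check and the write cannot interleave with an incoming event in this toy model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the "cache only if the informer is still stale" rule. */
final class ConditionalOverlaySketch {

  /** Minimal stand-in for a resource: name plus opaque resourceVersion. */
  static final class Resource {
    final String name;
    final String resourceVersion;

    Resource(String name, String resourceVersion) {
      this.name = name;
      this.resourceVersion = resourceVersion;
    }
  }

  private final Map<String, Resource> informerCache = new HashMap<>();
  private final Map<String, Resource> overlayCache = new HashMap<>();

  /** An informer event refreshes the informer cache and evicts any overlay entry. */
  synchronized void onInformerEvent(Resource fresh) {
    informerCache.put(fresh.name, fresh);
    overlayCache.remove(fresh.name);
  }

  /**
   * Stores the update response in the overlay only if the informer cache still holds the
   * resource our update was based on. Returns whether the response was stored.
   */
  synchronized boolean cacheIfInformerNotYetUpdated(Resource original, Resource updated) {
    Resource cached = informerCache.get(original.name);
    boolean stillOriginal =
        cached != null && cached.resourceVersion.equals(original.resourceVersion);
    if (stillOriginal) {
      overlayCache.put(updated.name, updated);
    }
    return stillOriginal;
  }

  /** Reads prefer the overlay cache and fall back to the informer cache; null if unknown. */
  synchronized Resource get(String name) {
    Resource overlaid = overlayCache.get(name);
    return overlaid != null ? overlaid : informerCache.get(name);
  }

  public static void main(String[] args) {
    ConditionalOverlaySketch cache = new ConditionalOverlaySketch();
    Resource original = new Resource("my-resource", "1");
    Resource updated = new Resource("my-resource", "2");
    cache.onInformerEvent(original);
    cache.onInformerEvent(updated); // the event for our update arrived before we cached it
    System.out.println(cache.cacheIfInformerNotYetUpdated(original, updated)); // not stored
  }
}
```

When the method returns `false`, nothing is lost: the informer cache already contains a resource at least as recent as our update, so reads fall through to it.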