
Commit f4de051

docs: start improving wording
Signed-off-by: Chris Laprun <claprun@redhat.com>
1 parent f785fa3 commit f4de051

1 file changed: 53 additions, 41 deletions

---
title: How to guarantee allocated values for the next reconciliation
date: 2025-05-22
author: >-
  [Attila Mészáros](https://github.com/csviri)
---

We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release addresses the topic of
so-called
[allocated values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values).

To describe the problem, let's say that our controller needs to create a resource that has a generated identifier, i.e.
a resource whose identifier cannot be directly derived from the custom resource's desired state as specified in its
`spec` field. To record the fact that the resource was successfully created, and to avoid attempting to create it again
in subsequent reconciliations, this type of controller typically stores the generated identifier in the custom
resource's `status` field.
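The pattern can be illustrated with a minimal toy model in plain Java. `ToyResource`, `ExternalSystem` and `NaiveReconciler` are hypothetical names for illustration only; they are not JOSDK API. The reconciler creates the secondary resource only when the status does not record an identifier yet. That guard only works if the reconciler reads a copy of the primary resource that already includes the previous status update. Given a stale copy, it creates a second resource.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical primary resource: generatedId stands in for the identifier stored in `status`
// (null until the secondary resource has been created).
record ToyResource(String name, String generatedId) {
  ToyResource withGeneratedId(String id) {
    return new ToyResource(name, id);
  }
}

// Hypothetical external system that assigns an identifier on each creation.
final class ExternalSystem {
  final List<String> createdIds = new ArrayList<>();
  private int nextId = 0;

  String create(String owner) {
    String id = owner + "-" + (++nextId);
    createdIds.add(id);
    return id;
  }
}

final class NaiveReconciler {
  private final ExternalSystem external;

  NaiveReconciler(ExternalSystem external) {
    this.external = external;
  }

  // Creates the secondary resource only if the (possibly cached) primary has no identifier
  // in its status, and returns the primary with the status that would be written back.
  ToyResource reconcile(ToyResource fromCache) {
    if (fromCache.generatedId() != null) {
      return fromCache; // already created: nothing to do
    }
    String id = external.create(fromCache.name());
    return fromCache.withGeneratedId(id);
  }
}
```

Reconciling the returned copy again is a no-op. Reconciling the original copy, which is what a stale cache would hand back, calls `create` a second time.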

Java Operator SDK relies on informer caches to retrieve resources. These caches, however, are only guaranteed to be
eventually consistent. Suppose some other event triggers a new reconciliation **before** our status update has had the
chance to propagate to the cluster and then back to the informer cache. In that case, the resource in the informer cache
does **not** contain the latest version as modified by the reconciler. In the new reconciliation, the generated
identifier is therefore missing from the resource status, and the reconciler will attempt to create the resource again,
which is not what we want.

Java Operator SDK now provides the utility class
[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java)
to handle this particular use case. It keeps the result of your update in an additional overlay cache, so your
reconciler is guaranteed to see the most up-to-date version of the resource on the next reconciliation:

2532
```java
26-
@Override
27-
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
28-
StatusPatchCacheCustomResource resource,
29-
Context<StatusPatchCacheCustomResource> context) {
30-
33+
34+
@Override
35+
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
36+
StatusPatchCacheCustomResource resource,
37+
Context<StatusPatchCacheCustomResource> context) {
38+
3139
// omitted code
32-
40+
3341
var freshCopy = createFreshCopy(resource); // need fresh copy just because we use the SSA version of update
3442
freshCopy
35-
.getStatus()
36-
.setValue(statusWithAllocatedValue());
43+
.getStatus()
44+
.setValue(statusWithAllocatedValue());
3745

3846
// using the utility instead of update control
3947
var updated =
40-
PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context);
48+
PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context);
4149
return UpdateControl.noUpdate();
42-
}
50+
}
4351
```
4452

How does `PrimaryUpdateAndCacheUtils` work?
There are multiple ways to solve this problem, but we ultimately chose to provide only the solution described below.
(If you want to dig deeper into the alternatives, see
this [PR](https://github.com/operator-framework/java-operator-sdk/pull/2800/files).)

The trick is to take the resource returned in the response to our update and store it in an additional overlay cache on
top of the informer's cache. When we read the resource, we first check whether it is in the overlay cache and read it
from there if present. Otherwise, we read it from the informer's cache. When the informer receives an event with a fresh
resource, we always remove the resource from the overlay cache, since the event carries a more recent version. But this
**only works** if the update is done **with optimistic locking**. So if the update fails because of a conflict, we
simply wait and poll the informer cache until a new resource version shows up. We then apply our changes again to that
new resource and retry the update, still with optimistic locking.
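A minimal sketch of such an overlay cache follows. It assumes both caches can be modeled as plain maps keyed by resource. `Res`, `OverlayCache` and their methods are hypothetical names, not the JOSDK implementation. The sketch also includes the rule, discussed further below, that the response is cached only while the informer still holds the version we updated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical resource snapshot: a key plus the opaque resourceVersion set by the API server.
record Res(String key, String resourceVersion, String allocatedId) {}

// Sketch of an overlay cache on top of an informer cache, both modeled as maps.
// Methods are synchronized so the check-then-put in cacheUpdateResponse cannot
// interleave with an incoming informer event.
final class OverlayCache {
  private final Map<String, Res> informerCache = new HashMap<>();
  private final Map<String, Res> overlay = new HashMap<>();

  // Reads prefer the overlay entry and fall back to the informer cache.
  synchronized Optional<Res> get(String key) {
    Res fromOverlay = overlay.get(key);
    return Optional.ofNullable(fromOverlay != null ? fromOverlay : informerCache.get(key));
  }

  // Informer event: assuming our update used optimistic locking, this version is our own
  // update or a newer one, so the overlay entry is no longer needed.
  synchronized void onInformerEvent(Res fresh) {
    informerCache.put(fresh.key(), fresh);
    overlay.remove(fresh.key());
  }

  // Stores the API server's response to our successful locked update, but only while the
  // informer cache still holds the version we updated; otherwise a newer event already arrived.
  synchronized void cacheUpdateResponse(Res updatedFrom, Res response) {
    Res current = informerCache.get(updatedFrom.key());
    if (current != null && current.resourceVersion().equals(updatedFrom.resourceVersion())) {
      overlay.put(response.key(), response);
    }
  }
}
```

Reads therefore see our update immediately, and the overlay entry disappears on the next informer event for that resource. Versions are compared only for equality, never for ordering.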

So why optimistic locking? Here is a slightly simplified explanation. If we do not update the resource with optimistic
locking, another party might update the resource just before we do. The informer then receives the event for that other
party's update. The resource version in this event differs from the version of the resource we cached (the response to
our own update). In general, there is no elegant way to determine whether the event the informer received comes from an
update that happened before or after ours, since Kubernetes resource versions are opaque and should not be compared to
determine ordering. (Note also that the informer's watch can lose its connection, among other edge cases.)

```mermaid
flowchart TD
A["Update Resource with Lock"] --> B{"Is Successful"}
%% remaining flowchart lines are not shown in the source diff view
```
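The update-with-lock and retry-on-conflict loop can be sketched against a toy in-memory API server. `ToyApiServer`, `ConflictException` and `OptimisticUpdater` are hypothetical and not part of JOSDK. The informer cache is modeled as a `Supplier`, and the polling interval and limits are arbitrary assumptions. On conflict, the sketch waits until the cache exposes a different resource version, reapplies the changes to it and retries.

```java
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

// Hypothetical resource; resourceVersion is treated as an opaque token.
record Res(String name, String resourceVersion, String allocatedId) {}

final class ConflictException extends RuntimeException {
  ConflictException(String message) { super(message); }
}

// Toy API server holding a single resource; updates use optimistic locking.
final class ToyApiServer {
  private Res stored;
  private long counter = 1;

  ToyApiServer(Res initial) { stored = new Res(initial.name(), "1", initial.allocatedId()); }

  synchronized Res get() { return stored; }

  // Succeeds only if the caller's copy carries the current resourceVersion.
  synchronized Res update(Res desired) {
    if (!stored.resourceVersion().equals(desired.resourceVersion())) {
      throw new ConflictException("stale resourceVersion " + desired.resourceVersion());
    }
    stored = new Res(desired.name(), Long.toString(++counter), desired.allocatedId());
    return stored;
  }
}

final class OptimisticUpdater {
  static Res updateWithRetry(ToyApiServer server, Supplier<Res> informerCache,
      UnaryOperator<Res> applyChanges, int maxAttempts) {
    Res base = informerCache.get();
    for (int attempt = 1; ; attempt++) {
      try {
        // The copy sent carries base's resourceVersion, so a stale base is rejected.
        return server.update(applyChanges.apply(base));
      } catch (ConflictException e) {
        if (attempt >= maxAttempts) {
          throw e;
        }
        base = awaitNewVersion(informerCache, base.resourceVersion());
      }
    }
  }

  // Polls the cache until it exposes a resourceVersion different from the conflicting one.
  private static Res awaitNewVersion(Supplier<Res> informerCache, String conflictingVersion) {
    for (int poll = 0; poll < 100; poll++) {
      Res candidate = informerCache.get();
      if (!candidate.resourceVersion().equals(conflictingVersion)) {
        return candidate;
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for the informer cache", ie);
      }
    }
    throw new IllegalStateException("informer cache did not catch up");
  }
}
```

If the first attempt conflicts because another party updated the resource, the second attempt starts from the newer version. The allocated value is then written on top of that party's changes instead of overwriting them.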

Doing our update with optimistic locking simplifies the situation and lets us provide strong guarantees.
If the update with optimistic locking succeeds, we know that the response contains the up-to-date resource, which we
put in our overlay cache. Thus, the next event we receive will be the one resulting from our update
(or a newer one, if somebody updated the resource after us, which is fine since that version will also contain our
allocated values). So once we have cached the resource from the response in the overlay cache, we know that we can
remove it from there as soon as the next event arrives.
Note that we store the response in the overlay cache only if the informer cache still holds the original resource at
that time. If the informer cache has already been updated, we have already received a new event following our update,
so the informer cache already holds a fresh resource.
