Commit 5520c14

improve: status cache for next reconciliation - only the lock version (#2800)
1 parent 937a9a9 commit 5520c14

16 files changed: +306 -620 lines changed

docs/content/en/docs/documentation/reconciler.md

Lines changed: 17 additions & 85 deletions
@@ -175,23 +175,23 @@ From v5, by default, the finalizer is added using Server Side Apply. See also `U
 It is typical to want to update the status subresource with the information that is available during the reconciliation.
 This is sometimes referred to as the last observed state. When the primary resource is updated, though, the framework
 does not cache the resource directly, relying instead on the propagation of the update to the underlying informer's
-cache. It can, therefore, happen that, if other events trigger other reconciliations before the informer cache gets
+cache. It can, therefore, happen that, if other events trigger other reconciliations, before the informer cache gets
 updated, your reconciler does not see the latest version of the primary resource. While this might not typically be a
 problem in most cases, as caches eventually become consistent, depending on your reconciliation logic, you might still
-require the latest status version possible, for example if the status subresource is used as a communication mechanism,
-see [Representing Allocated Values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values)
+require the latest status version possible, for example, if the status subresource is used to store allocated values.
+See [Representing Allocated Values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values)
 from the Kubernetes docs for more details.

-The framework provides utilities to help with these use cases with
-[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java).
-These utility methods come in two flavors:
+The framework provides the
+[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) utility class
+to help with these use cases.

-#### Using internal cache
-
-In almost all cases for this purpose, you can use internal caches:
+This class' methods use internal caches in combination with update methods that leverage
+optimistic locking. If the update method fails on optimistic locking, it will retry
+using a fresh resource from the server as base for modification.

 ```java
-@Override
+@Override
 public UpdateControl<StatusPatchCacheCustomResource> reconcile(
     StatusPatchCacheCustomResource resource, Context<StatusPatchCacheCustomResource> context) {

@@ -201,85 +201,17 @@ public UpdateControl<StatusPatchCacheCustomResource> reconcile(
   var freshCopy = createFreshCopy(primary);
   freshCopy.getStatus().setValue(statusWithState());

-  var updatedResource = PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(resource, freshCopy, context);
-
-  return UpdateControl.noUpdate();
-}
-```
-
-In the background `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus` puts the result of the update into an internal
-cache and will make sure that the next reconciliation will contain the most recent version of the resource. Note that it
-is not necessarily the version of the resource you got as response from the update, it can be newer since other parties
-can do additional updates meanwhile, but if not explicitly modified, it will contain the up-to-date status.
-
-See related integration test [here](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal).
-
-This approach works with the default configuration of the framework and should be good to go in most of the cases.
-Without going further into the details, this won't work if `ConfigurationService.parseResourceVersionsForEventFilteringAndCaching`
-is set to `false` (more precisely there are some edge cases when it won't work). For that case framework provides the following solution:
-
-#### Fallback approach: using `PrimaryResourceCache` cache
-
-As an alternative, for very rare cases when `ConfigurationService.parseResourceVersionsForEventFilteringAndCaching`
-needs to be set to `false` you can use an explicit caching approach:
-
-```java
-
-// We on purpose don't use the provided predicate to show what a custom one could look like.
-private final PrimaryResourceCache<StatusPatchPrimaryCacheCustomResource> cache =
-    new PrimaryResourceCache<>(
-        (statusPatchCacheCustomResourcePair, statusPatchCacheCustomResource) ->
-            statusPatchCacheCustomResource.getStatus().getValue()
-                >= statusPatchCacheCustomResourcePair.afterUpdate().getStatus().getValue());
-
-@Override
-public UpdateControl<StatusPatchPrimaryCacheCustomResource> reconcile(
-    StatusPatchPrimaryCacheCustomResource primary,
-    Context<StatusPatchPrimaryCacheCustomResource> context) {
-
-  // cache will compare the current and the cached resource and return the more recent. (And evict the old)
-  primary = cache.getFreshResource(primary);
-
-  // omitted logic
-
-  var freshCopy = createFreshCopy(primary);
+  var updatedResource = PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context);

-  freshCopy.getStatus().setValue(statusWithState());
-
-  var updated =
-      PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(primary, freshCopy, context, cache);
-
   return UpdateControl.noUpdate();
 }
-
-@Override
-public DeleteControl cleanup(
-    StatusPatchPrimaryCacheCustomResource resource,
-    Context<StatusPatchPrimaryCacheCustomResource> context)
-    throws Exception {
-  // cleanup the cache on resource deletion
-  cache.cleanup(resource);
-  return DeleteControl.defaultDelete();
-}
-
 ```

-[`PrimaryResourceCache`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/support/PrimaryResourceCache.java)
-is designed for this purpose. As shown in the example above, it is up to you to provide a predicate to determine if the
-resource is more recent than the one available. In other words, when to evict the resource from the cache. Typically, as
-shown in
-the [integration test](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/primarycache)
-you can have a counter in status to check on that.
-
-Since all of this happens explicitly, you cannot use this approach for managed dependent resources and workflows and
-will need to use the unmanaged approach instead. This is due to the fact that managed dependent resources always get
-their associated primary resource from the underlying informer event source cache.
-
-#### Additional remarks
+After the update `PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource` puts the result of the update into an internal
+cache and the framework will make sure that the next reconciliation contains the most recent version of the resource.
+Note that it is not necessarily the same version returned as response from the update, it can be a newer version since other parties
+can do additional updates meanwhile. However, unless it has been explicitly modified, that
+resource will contain the up-to-date status.

-As shown in the integration tests, there is no optimistic locking used when updating the
-[resource](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal/StatusPatchCacheReconciler.java#L41)
-(in other words `metadata.resourceVersion` is set to `null`). This is desired since you don't want the patch to fail on
-update.

-In addition, you can configure the [Fabric8 client retry](https://github.com/fabric8io/kubernetes-client?tab=readme-ov-file#configuring-the-client).
+See related integration test [here](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache).
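
Reviewer note: the added documentation says the update retries with a fresh resource from the server when optimistic locking fails. The sketch below illustrates that general pattern only. It is not the SDK's implementation: `Versioned`, `Server`, and `updateWithRetry` are hypothetical names, and an in-memory object stands in for the Kubernetes API server.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class OptimisticRetrySketch {

  /** Hypothetical stand-in for a resource: a version used for locking plus a status value. */
  record Versioned(long version, String status) {}

  /** Hypothetical in-memory "server" that rejects writes based on a stale version. */
  static final class Server {
    private final AtomicReference<Versioned> stored =
        new AtomicReference<>(new Versioned(1, "initial"));

    Versioned get() {
      return stored.get();
    }

    /** Returns the stored result, or null when the base version is stale (a conflict). */
    Versioned update(Versioned base, String newStatus) {
      Versioned current = stored.get();
      if (current.version() != base.version()) {
        return null;
      }
      Versioned next = new Versioned(current.version() + 1, newStatus);
      return stored.compareAndSet(current, next) ? next : null;
    }
  }

  /** Tries the update; on conflict, re-reads the resource and reapplies the modification. */
  static Versioned updateWithRetry(
      Server server, Versioned base, UnaryOperator<String> modify, int maxAttempts) {
    Versioned candidate = base;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      Versioned result = server.update(candidate, modify.apply(candidate.status()));
      if (result != null) {
        return result;
      }
      // Conflict: use a fresh copy from the server as the base for the next attempt.
      candidate = server.get();
    }
    throw new IllegalStateException("Update still conflicting after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    Server server = new Server();
    Versioned stale = server.get();
    // Another party updates the resource, so our copy is now stale.
    server.update(stale, "changed-by-other");
    Versioned updated = updateWithRetry(server, stale, s -> s + "+mine", 3);
    System.out.println(updated);
  }
}
```

In this sketch the first attempt fails because the base version is stale, and the second attempt succeeds on top of the other party's change, which mirrors the behavior the added paragraph describes.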
