Commit 3a863f6

docs: more
skip-ci
1 parent 74ea5bd commit 3a863f6

4 files changed, +111 -77 lines changed

docs/documentation/getting-started.md

Lines changed: 26 additions & 17 deletions
@@ -9,35 +9,44 @@ permalink: /docs/getting-started
 
 ## Introduction & Resources on Operators
 
-Operators are easy and simple way to manage resource on Kubernetes clusters but
-also outside the cluster. The goal of this SDK is to allow writing operators in Java by
-providing a nice API and handling common issues regarding the operators on framework level.
-
-For an introduction, what is an operator see this [blog post](https://blog.container-solutions.com/kubernetes-operators-explained).
+Operators manage both cluster and non-cluster resources on behalf of Kubernetes. This Java
+Operator SDK (JOSDK) aims at making it as easy as possible to write Kubernetes operators in Java
+using an API that should feel natural to Java developers and without having to worry about many
+low-level details that the SDK handles automatically.
 
-You can read about the common problems what is this operator framework is solving for you [here](https://blog.container-solutions.com/a-deep-dive-into-the-java-operator-sdk).
+For an introduction to operators, please see this
+[blog post](https://blog.container-solutions.com/kubernetes-operators-explained).
+
+You can read about the common problems JOSDK is solving for you
+[here](https://blog.container-solutions.com/a-deep-dive-into-the-java-operator-sdk).
+
+You can also refer to the
+[Writing Kubernetes operators using JOSDK blog series](https://developers.redhat.com/articles/2022/02/15/write-kubernetes-java-java-operator-sdk).
 
 ## Getting Started
 
-The easiest way to get started with SDK is start [minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/) and
-execute one of our [examples](https://github.com/java-operator-sdk/samples/tree/main/mysql-schema).
-There is a dedicated page to describe how to [use samples](/docs/using-samples).
+The easiest way to get started with the SDK is to start
+[minikube](https://kubernetes.io/docs/tasks/tools/install-minikube/) and
+execute one of our [examples](https://github.com/java-operator-sdk/samples/tree/main/mysql-schema).
+There is a dedicated page to describe how to [use the samples](/docs/use-samples).
 
-Here are the main steps to develop the code and deploy the operator to a Kubernetes cluster. A more detailed and specific
-version can be found under `samples/mysql-schema/README.md`.
+Here are the main steps to develop the code and deploy the operator to a Kubernetes cluster.
+A more detailed and specific version can be found under `samples/mysql-schema/README.md`.
 
-1. Setup kubectl to work with your Kubernetes cluster of choice.
+1. Set up `kubectl` to work with your Kubernetes cluster of choice.
 1. Apply Custom Resource Definition
 1. Compile the whole project (framework + samples) using `mvn install` in the root directory
-1. Run the main class of the sample you picked and check out the sample's README to see what it does.
-   When run locally the framework will use your Kubernetes client configuration (in ~/.kube/config) to make the connection
-   to the cluster. This is why it was important to set up kubectl up front.
+1. Run the main class of the sample you picked and check out the sample's README to see what it
+   does. When run locally the framework will use your Kubernetes client configuration (in
   `~/.kube/config`) to establish a connection to the cluster. This is why it was important to
+   set up `kubectl` up front.
 1. You can work in this local development mode to play with the code.
 1. Build the Docker image and push it to the registry
 1. Apply RBAC configuration
 1. Apply deployment configuration
-1. Verify if the operator is up and running. Don't run it locally anymore to avoid conflicts in processing events from
-   the cluster's API server.
+1. Verify that the operator is up and running. Don't run it locally anymore, to avoid conflicts
+   in processing events from the cluster's API server.
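For step 4 above, a sample's main class typically just wires a `Reconciler` into an `Operator` instance and starts it. Below is a minimal, illustrative sketch rather than the samples' actual code: it assumes a recent JOSDK where the no-argument `Operator` constructor picks up your local kubeconfig through the default Fabric8 client, and the trivial `ConfigMap` reconciler is a hypothetical stand-in for the sample's real one.

```java
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.javaoperatorsdk.operator.Operator;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

@ControllerConfiguration
public class SampleRunner implements Reconciler<ConfigMap> {

  public static void main(String[] args) {
    // Running locally, the default client is configured from ~/.kube/config,
    // which is why kubectl needs to be set up first.
    Operator operator = new Operator();
    operator.register(new SampleRunner());
    operator.start();
  }

  @Override
  public UpdateControl<ConfigMap> reconcile(ConfigMap resource, Context<ConfigMap> context) {
    // A no-op reconciliation, just to show the wiring.
    return UpdateControl.noUpdate();
  }
}
```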

docs/documentation/glossary.md

Lines changed: 5 additions & 3 deletions
@@ -7,9 +7,11 @@ permalink: /docs/glossary
 
 # Glossary
 
-- **Primary Resource** - the resource that represents the desired state that the controller is working
-  to achieve. While this is often a Custom Resource, it can be also be a Kubernetes native resource (Deployment,
-  ConfigMape,...).
+- **Primary Resource** - the resource that represents the desired state that the controller is
+  working to achieve. While this is often a Custom Resource, it can also be a Kubernetes native
+  resource (Deployment, ConfigMap, ...).
 - **Secondary Resource** - any resource that the controller needs to manage to reach the desired state
   represented by the primary resource. These resources can be created, updated, deleted or simply
   read depending on the use case. For example, the `Deployment` controller manages `ReplicaSet`

docs/documentation/intro-operators.md

Lines changed: 2 additions & 1 deletion
@@ -13,6 +13,7 @@ This page provides a selection of articles that gives an introduction to Kuberne
 
 - [Introduction of the concept of Kubernetes Operators](https://blog.container-solutions.com/kubernetes-operators-explained)
 - [Operator pattern explained in Kubernetes documentation](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
-- [An explanation why Java Operators makes sense](https://blog.container-solutions.com/cloud-native-java-infrastructure-automation-with-kubernetes-operators)
+- [An explanation why Java Operators makes sense](https://blog.container-solutions.com/cloud-native-java-infrastructure-automation-with-kubernetes-operators)
 - [What are the problems an operator framework is solving](https://csviri.medium.com/deep-dive-building-a-kubernetes-operator-sdk-for-java-developers-5008218822cb)
+- [Writing Kubernetes operators using JOSDK blog series](https://developers.redhat.com/articles/2022/02/15/write-kubernetes-java-java-operator-sdk)

docs/documentation/patterns-best-practices.md

Lines changed: 78 additions & 56 deletions
@@ -7,79 +7,101 @@ permalink: /docs/patterns-best-practices
 
 # Patterns and Best Practices
 
-This document describes patterns and best practices, to build and run operators, and how to implement them in terms of
-Java Operator SDK.
+This document describes patterns and best practices for building and running operators, and how
+to implement them in terms of the Java Operator SDK (JOSDK).
 
-See also best practices in [Operator SDK](https://sdk.operatorframework.io/docs/best-practices/best-practices/).
+See also the best practices
+in [Operator SDK](https://sdk.operatorframework.io/docs/best-practices/best-practices/).

1516
## Implementing a Reconciler
1617

1718
### Reconcile All The Resources All the Time
1819

19-
The reconciliation can be triggered by events from multiple sources. It could be tempting to check the events and
20-
reconcile just the related resource or subset of resources that the controller manages. However, this is **considered as
21-
an anti-pattern** in operators. If triggered, all resources should be reconciled. Usually this means only comparing the
22-
target state with the current state in the cache for most of the resource. The reason behind this is events not reliable
23-
In general, this means events can be lost. In addition to that the operator can crash and while down will miss events.
20+
The reconciliation can be triggered by events from multiple sources. It could be tempting to check
21+
the events and reconcile just the related resource or subset of resources that the controller
22+
manages. However, this is **considered an anti-pattern** for operators because the distributed
23+
nature of Kubernetes makes it difficult to ensure that all events are always received. If, for
24+
some reason, your operator doesn't receive some events, if you do not reconcile the whole state,
25+
you might be operating with improper assumptions about the state of the cluster. This is why it
26+
is important to always reconcile all the resources, no matter how tempting it might be to only
27+
consider a subset. Luckily, JOSDK tries to make it as easy and efficient as possible by
28+
providing smart caches to avoid unduly accessing the Kubernetes API server and by making sure
29+
your reconciler is only triggered when needed.
2430

25-
In addition to that such approach might even complicate implementation logic in the `Reconciler`, since parallel
26-
execution of the reconciler is not allowed for the same custom resource, there can be multiple events received for the
27-
same resource or dependent resource during an ongoing execution, ordering those events could be also challenging.
28-
29-
Since there is a consensus regarding this in the industry, from v2 the events are not even accessible for
30-
the `Reconciler`.
31+
Since there is a consensus regarding this topic in the industry, JOSDK does not provide
32+
event access from `Reconciler` implementations anymore starting with version 2 of the framework.
 
 ### EventSources and Caching
 
-As mentioned above during a reconciliation best practice is to reconcile all the dependent resources managed by the
-controller. This means that we want to compare a target state with the actual state of the cluster. Reading the actual
-state of a resource from the Kubernetes API Server directly all the time would mean a significant load. Therefore, it's
-a common practice to instead create a watch for the dependent resources and cache their latest state. This is done
-following the Informer pattern. In Java Operator SDK, informer is wrapped into an EventSource, to integrate it with the
-eventing system of the framework, resulting in `InformerEventSource`.
-
-A new event that triggers the reconciliation is propagated when the actual resource is already in cache. So in
-reconciler what should be just done is to compare the target calculated state of a dependent resource of the actual
-state from the cache of the event source. If it is changed or not in the cache it needs to be created, respectively
-updated.
+As mentioned above, the best practice during a reconciliation is to reconcile all the dependent
+resources managed by the controller. This means that we want to compare a desired state with the
+actual state of the cluster. Reading the actual state of a resource from the Kubernetes API
+Server directly all the time would mean a significant load. Therefore, it's a common practice to
+instead create a watch for the dependent resources and cache their latest state. This is done
+following the Informer pattern. In Java Operator SDK, informers are wrapped into an
+`EventSource` to integrate them with the eventing system of the framework. This is implemented
+by the `InformerEventSource` class.
+
+A new event that triggers the reconciliation is only propagated to the `Reconciler` when the
+actual resource is already in the cache. `Reconciler` implementations therefore only need to
+compare the desired state with the observed one provided by the cached resource. If the resource
+cannot be found in the cache, it needs to be created. If the actual state doesn't match the
+desired state, the resource needs to be updated.
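As an illustration of the cache-based comparison described above, here is a hedged sketch of a `reconcile` implementation. `MyResource` and `desiredConfigMap` are hypothetical names, and the sketch assumes that an `InformerEventSource` for `ConfigMap` has been registered for this controller and a JOSDK version where `Context#getSecondaryResource` and `Context#getClient` are available.

```java
import java.util.Objects;
import java.util.Optional;

import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;

public class MyResourceReconciler implements Reconciler<MyResource> {

  @Override
  public UpdateControl<MyResource> reconcile(MyResource primary, Context<MyResource> context) {
    ConfigMap desired = desiredConfigMap(primary);

    // Served from the InformerEventSource cache, not fetched from the API server.
    Optional<ConfigMap> actual = context.getSecondaryResource(ConfigMap.class);

    if (actual.isEmpty()) {
      // Not in the cache: the dependent resource doesn't exist yet, so create it.
      context.getClient().configMaps().resource(desired).create();
    } else if (!Objects.equals(actual.get().getData(), desired.getData())) {
      // Cached state differs from the desired state: update it.
      context.getClient().configMaps().resource(desired).update();
    }
    return UpdateControl.noUpdate();
  }

  private ConfigMap desiredConfigMap(MyResource primary) {
    // Hypothetical helper deriving the desired dependent state from the primary's spec.
    return new ConfigMapBuilder()
        .withNewMetadata()
        .withName(primary.getMetadata().getName())
        .withNamespace(primary.getMetadata().getNamespace())
        .endMetadata()
        .addToData("config", primary.getSpec().getConfig()) // hypothetical spec field
        .build();
  }
}
```

Note that this logic is also naturally idempotent, which ties into the next section: re-running it against the same cached state produces the same outcome.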
 
 ### Idempotency
 
-Since all the resources are reconciled during an execution and an execution can be triggered quite often, also retries
-of a reconciliation can happen naturally in operators, the implementation of a `Reconciler`
-needs to be idempotent. Luckily, since operators are usually managing already declarative resources, this is trivial to
-do in most cases.
+Since all resources should be reconciled when your `Reconciler` is triggered and reconciliations
+can be triggered multiple times for any given resource, especially when retry policies are in
+place, it is especially important that `Reconciler` implementations be idempotent, meaning that
+the same observed state should result in exactly the same outcome. This also means that
+operators should generally operate in a stateless fashion. Luckily, since operators are usually
+managing declarative resources, ensuring idempotency is usually not difficult.
 
 ### Sync or Async Way of Resource Handling
 
-In an implementation of reconciliation there can be a point when reconciler needs to wait a non-insignificant amount of
-time while a resource gets up and running. For example, reconciler would do some additional step only if a Pod is ready
-to receive requests. This problem can be approached in two ways synchronously or asynchronously.
-
-The async way is just return from the reconciler, if there are informers properly in place for the target resource,
-reconciliation will be triggered on change. During the reconciliation the pod can be read from the cache of the informer
-and a check on it's state can be conducted again. The benefit of this approach is that it will free up the thread, so it
-can be used to reconcile other resources.
-
-The sync way would be to periodically poll the cache of the informer for the pod's state, until the target state is
-reached. This would block the thread until the state is reached, which in some cases could take quite long.
+Depending on your use case, it's possible that your reconciliation logic needs to wait a
+non-insignificant amount of time while the operator waits for resources to reach their desired
+state. For example, your `Reconciler` might need to wait for a `Pod` to get ready before
+performing additional actions. This problem can be approached either synchronously or
+asynchronously.
+
+The asynchronous way is to just exit the reconciliation logic as soon as the `Reconciler`
+determines that it cannot complete its full logic at this point in time. This frees resources to
+process other primary resource events. However, this requires that adequate event sources are
+put in place to monitor state changes of all the resources the operator waits for. When this is
+done properly, any state change will trigger the `Reconciler` again and it will get the
+opportunity to finish its processing.
+
+The synchronous way would be to periodically poll the resources' state until they reach their
+desired state. If this is done in the context of the `reconcile` method of your `Reconciler`
+implementation, this would block the current thread for possibly a long time. It's therefore
+usually recommended to use the asynchronous processing fashion.
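Here is a sketch of the asynchronous style just described, continuing the hypothetical reconciler from the earlier example: the method exits as soon as it sees the `Pod` isn't ready, counting on the informer to retrigger reconciliation, with a time-based reschedule as a safety net. `UpdateControl#rescheduleAfter` exists in current JOSDK versions, but treat the exact API as version-dependent.

```java
// Additional imports assumed: java.time.Duration, io.fabric8.kubernetes.api.model.Pod

@Override
public UpdateControl<MyResource> reconcile(MyResource primary, Context<MyResource> context) {
  Optional<Pod> pod = context.getSecondaryResource(Pod.class);

  if (pod.isEmpty() || !isReady(pod.get())) {
    // Exit early and free the thread; the InformerEventSource watching Pods will
    // trigger a new reconciliation on any state change. The reschedule is a safety net.
    return UpdateControl.<MyResource>noUpdate().rescheduleAfter(Duration.ofMinutes(1));
  }

  // The Pod is ready: perform the remaining reconciliation steps here.
  return UpdateControl.noUpdate();
}

private boolean isReady(Pod pod) {
  // Readiness derived from the Pod's status conditions.
  return pod.getStatus() != null && pod.getStatus().getConditions().stream()
      .anyMatch(c -> "Ready".equals(c.getType()) && "True".equals(c.getStatus()));
}
```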
 
-## Why to Have Automated Retries?
-
-Automatic retries are in place by default, it can be fine-tuned, but in general it's not advised to turn of automatic
-retries. One of the reasons is that issues like network error naturally happen and are usually solved by a retry.
-Another typical situation is for example when a dependent resource or the custom resource is updated, during the update
-usually there is optimistic version control in place. So if someone updated the resource during reconciliation, maybe
-using `kubectl` or another process, the update would fail on a conflict. A retry solves this problem simply by executing
-the reconciliation again.
+## Why have Automatic Retries?
+
+Automatic retries are in place by default and can be configured to your needs. It is also
+possible to completely deactivate the feature, though we advise against it. The main reason to
+configure automatic retries for your `Reconciler` is that errors occur quite often due to the
+distributed nature of Kubernetes: transient network errors can be easily dealt with by automatic
+retries. Similarly, resources can be modified by different actors at the same time, so it's not
+unheard of to get conflicts when working with Kubernetes resources. Such conflicts can usually
+be quite naturally resolved by reconciling the resource again. If it's done automatically, the
+whole process can be completely transparent.
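As a sketch of tuning retries rather than turning them off, assuming the `GenericRetry` helper and the per-controller `withRetry` override present in recent JOSDK versions (treat the exact builder methods as version-dependent):

```java
import io.javaoperatorsdk.operator.Operator;
import io.javaoperatorsdk.operator.processing.retry.GenericRetry;

// Inside your main method, when registering the reconciler:
Operator operator = new Operator();
operator.register(new MyResourceReconciler(), override -> override.withRetry(
    new GenericRetry()
        .setInitialInterval(2_000)   // wait 2s before the first retry
        .setIntervalMultiplier(1.5)  // back off exponentially
        .setMaxAttempts(5)));        // give up after 5 attempts
operator.start();
```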
 
 ## Managing State
 
-When managing only kubernetes resources an explicit state is not necessary about the resources. The state can be
-read/watched, also filtered using labels. Or just following some naming convention. However, when managing external
-resources, there can be a situation for example when the created resource can only be addressed by an ID generated when
-the resource was created. This ID needs to be stored, so on next reconciliation it could be used to addressing the
-resource. One place where it could go is the status sub-resource. On the other hand by definition status should be just
-the result of a reconciliation. Therefore, it's advised in general, to put such state into a separate resource usually a
-Kubernetes Secret or ConfigMap or a dedicated CustomResource, where the structure can be also validated.
+Thanks to the declarative nature of Kubernetes resources, operators that deal only with
+Kubernetes resources can operate in a stateless fashion, i.e. they do not need to maintain
+information about the state of these resources, as it should be possible to completely rebuild
+the resource state from its representation (that's what declarative means, after all).
+However, this usually doesn't hold true anymore when dealing with external resources, and it
+might be necessary for the operator to keep track of this external state so that it is available
+when another reconciliation occurs. While such state could be put in the primary resource's
+status sub-resource, this could quickly become difficult to manage if a lot of state needs to be
+tracked. It also goes against the best practice that a resource's status should represent the
+actual resource state, while its spec represents the desired state. Putting state that doesn't
+strictly represent the resource's actual state is therefore discouraged. Instead, it's advised
+to put such state into a separate resource meant for this purpose, such as a Kubernetes Secret
+or ConfigMap, or even a dedicated Custom Resource, whose structure can be more easily validated.
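To make the suggested approach concrete, here is a hedged sketch of persisting an externally generated ID in a dedicated `ConfigMap` so that the next reconciliation can address the external resource again. The `-state` naming scheme and `externalId` key are illustrative only, and `serverSideApply` assumes a reasonably recent Fabric8 client.

```java
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;

// Inside the reconciler, after creating the external resource: store the
// generated ID so a later reconciliation can find the same resource again.
ConfigMap state = new ConfigMapBuilder()
    .withNewMetadata()
    .withName(primary.getMetadata().getName() + "-state") // illustrative naming scheme
    .withNamespace(primary.getMetadata().getNamespace())
    .endMetadata()
    .addToData("externalId", externalId)
    .build();
context.getClient().configMaps().resource(state).serverSideApply();
```

On subsequent reconciliations the ConfigMap can be read back, ideally through an `InformerEventSource` cache, to recover the stored ID.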
