
Commit b054a70

sbernauer and adwk67 authored
Add concepts page on Pod disruptions (#454)
* Add concepts page on Pod disruptions
* Adopt to new structure
* rename file
* move file
* add page alias
* typo
* add warnings
* Apply suggestions from code review
* review
* review
* Add overview page for operations
* typo
* avoid we
* review
* Rename to "Allowed Pod disruption"
* Apply suggestions from code review

Co-authored-by: Andrew Kenworthy <andrew.kenworthy@stackable.de>
1 parent f42a2bf commit b054a70

File tree

8 files changed: +157 −7 lines changed

modules/ROOT/pages/release_notes.adoc

Lines changed: 3 additions & 3 deletions
@@ -448,11 +448,11 @@ The following new major platform features were added:
 
 Cluster Operation::
 
-The first part of xref:concepts:cluster_operations.adoc[Cluster operations] was rolled out in every applicable Stackable
+The first part of xref:concepts:operations/cluster_operations.adoc[Cluster operations] was rolled out in every applicable Stackable
 Operator. This supports pausing the cluster reconciliation and stopping the cluster completely. Pausing reconciliation
 will not apply any changes to the Kubernetes resources (e.g. when changing the custom resource). Stopping the cluster
-will set all replicas of StatefulSets, Deployments or DaemonSets to zero and therefore deleting all Pods belonging to
-that cluster (not the PVCs).
+will set all replicas of StatefulSets, Deployments or DaemonSets to zero and therefore result in the deletion of all Pods
+belonging to that cluster (not the PVCs)
 
 Status Field::

modules/concepts/nav.adoc

Lines changed: 4 additions & 1 deletion
@@ -11,5 +11,8 @@
 ** xref:s3.adoc[]
 ** xref:tls_server_verification.adoc[]
 ** xref:pod_placement.adoc[]
-** xref:cluster_operations.adoc[]
 ** xref:overrides.adoc[]
+** xref:operations/index.adoc[]
+*** xref:operations/cluster_operations.adoc[]
+*** xref:operations/pod_placement.adoc[]
+*** xref:operations/pod_disruptions.adoc[]

modules/concepts/pages/cluster_operations.adoc renamed to modules/concepts/pages/operations/cluster_operations.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
-
 = Cluster operations
+:page-aliases: ../cluster_operations.adoc
 
 Stackable operators offer different cluster operations to control the reconciliation process. This is useful when updating operators, debugging or testing of new settings:

modules/concepts/pages/operations/index.adoc

Lines changed: 49 additions & 0 deletions
1+
= Operations
2+
3+
This section of the documentation is intended for the operations teams that maintain a Stackable Data Platform installation.
4+
It provides you with the necessary details to operate it in a production environment.
5+
6+
== Service availability
7+
8+
Make sure to go through the following checklist to achieve the maximum level of availability for your services.
9+
10+
1. Make setup highly available (HA): In case the product supports running in an HA fashion, our operators will automatically
11+
configure it for you. You only need to make sure that you deploy a sufficient number of replicas. Please note that
12+
some products don't support HA.
13+
2. Reduce the number of simultaneous pod disruptions (unavailable replicas). The Stackable operators write defaults
14+
based upon knowledge about the fault tolerance of the product, which should cover most of the use-cases. For details
15+
have a look at xref:operations/pod_disruptions.adoc[].
16+
3. Reduce impact of pod disruption: Many HA capable products offer a way to gracefully shut down the service running
17+
within the Pod. The flow is as follows: Kubernetes wants to shut down the Pod and calls a hook into the Pod, which in turn
18+
interacts with the product, telling it to gracefully shut down. The final deletion of the Pod is then blocked until
19+
the product has successfully migrated running workloads away from the Pod that is to be shut down. Details covering the graceful shutdown mechanism are described in the actual operator documentation.
20+
+
21+
WARNING: Graceful shutdown is not implemented for all products yet. Please check the documentation specific to the product operator to see if it is supported (such as e.g. xref:trino:usage_guide/operations/graceful-shutdown.adoc[the documentation for Trino].
22+
23+
4. Spread workload across multiple Kubernetes nodes, racks, datacenter rooms or datacenters to guarantee availability
24+
in the case of e.g. power outages or fire in parts of the datacenter. All of this is supported by
25+
configuring an https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/[antiAffinity] as documented in
26+
xref:operations/pod_placement.adoc[]
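The spreading described in point 4 can be sketched as an affinity on a role. This is illustrative only: the role name (`workers`), the product labels (`trino`) and the topology key are assumptions for the example, not operator defaults.

[source,yaml]
----
spec:
  workers:  # hypothetical role; use the role names of your product
    config:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                # spread replicas across availability zones
                topologyKey: topology.kubernetes.io/zone
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: trino
                    app.kubernetes.io/component: worker
----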
== Maintenance actions

Sometimes you want to quickly shut down a product or update the Stackable operators without all the managed products
restarting at the same time. You can achieve this using the following methods:

1. Quickly stop and start a whole product using `stopped` as described in xref:operations/cluster_operations.adoc[].
2. Prevent any changes to your deployed product using `reconciliationPaused` as described in xref:operations/cluster_operations.adoc[].
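Both switches live in the cluster definition. A minimal sketch, assuming the fields are named `stopped` and `reconciliationPaused` under `spec.clusterOperation` as on the cluster operations page:

[source,yaml]
----
spec:
  clusterOperation:
    stopped: true                # scale all replicas to zero, keeping PVCs
    reconciliationPaused: false  # when true, the operator ignores changes
----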
== Performance

1. You can configure the resources available to every product using xref:concepts:resources.adoc[]. The defaults are
very restrained, as you should be able to spin up multiple products running on your laptop.
2. You can not only use xref:operations/pod_placement.adoc[] to achieve more resilience, but also to co-locate products
that communicate frequently with each other. One example is placing HBase regionservers on the same Kubernetes node
as the HDFS datanodes. Our operators already take this into account and co-locate connected services. However, if
you are not satisfied with the automatically created affinities you can use xref:operations/pod_placement.adoc[] to
configure your own.
3. If you want to have certain services running on dedicated nodes you can also use xref:operations/pod_placement.adoc[]
to force the Pods to be scheduled on certain nodes. This is especially helpful if you e.g. have Kubernetes nodes with
16 cores and 64 GB, as you could allocate nearly 100% of these node resources to your Spark executors or Trino workers.
In this case it is important that you https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/[taint]
your Kubernetes nodes and use xref:overrides.adoc#pod-overrides[podOverrides] to add a `toleration` for the taint.
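As a sketch of the taint-and-tolerate approach: after tainting a node with e.g. `kubectl taint nodes node-1 dedicated=trino-worker:NoSchedule`, a matching toleration can be added via podOverrides. The taint key/value and the role name here are illustrative assumptions, not a convention of the operators.

[source,yaml]
----
spec:
  workers:  # hypothetical role name
    podOverrides:
      spec:
        tolerations:
          # allows (but does not force) scheduling onto the tainted node
          - key: dedicated
            operator: Equal
            value: trino-worker
            effect: NoSchedule
----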
modules/concepts/pages/operations/pod_disruptions.adoc

Lines changed: 94 additions & 0 deletions
= Allowed Pod disruptions

Any downtime of our products is generally considered to be bad.
Although downtime can't be prevented 100% of the time - especially if the product does not support High Availability - we try our best to reduce it to an absolute minimum.

Kubernetes has mechanisms to ensure minimal *planned* downtime.
Please keep in mind that this only affects planned (voluntary) downtime of Pods - unplanned Kubernetes node crashes can always occur.

Our product operators will always deploy so-called https://kubernetes.io/docs/tasks/run-application/configure-pdb/[PodDisruptionBudget (PDB)] resources alongside the products.
For every role that you specify (e.g. HDFS namenodes or Trino workers) a PDB is created.
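Such a generated PDB might look roughly like the following. The name, labels and `maxUnavailable` value here are assumptions for illustration; the actual resources vary by product and by the computed defaults.

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hdfs-namenode  # hypothetical name
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      # selects all Pods of one role of one product instance
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: hdfs
      app.kubernetes.io/component: namenode
----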
== Default values

The defaults depend on the individual product and can be found under the "Operations" usage guide of each product.

They are based on our knowledge of each product's fault tolerance.
In some cases they may be a little pessimistic, but they can be adjusted as documented in the following sections.

== Influencing and disabling PDBs

You can configure

1. Whether PDBs are written at all
2. The `maxUnavailable` replicas for this role's PDB

The following example

1. Sets `maxUnavailable` for NameNodes to `1`
2. Sets `maxUnavailable` for DataNodes to `10`, which allows downtime of 10% of the total DataNodes.
3. Disables PDBs for JournalNodes
[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: hdfs
spec:
  nameNodes:
    roleConfig: # optional, only supported on role level, *not* on rolegroup
      podDisruptionBudget: # optional
        enabled: true # optional, defaults to true
        maxUnavailable: 1 # optional, defaults to our "smart" calculation
    roleGroups:
      default:
        replicas: 3
  dataNodes:
    roleConfig:
      podDisruptionBudget:
        maxUnavailable: 10
    roleGroups:
      default:
        replicas: 100
  journalNodes:
    roleConfig:
      podDisruptionBudget:
        enabled: false
    roleGroups:
      default:
        replicas: 3
----
== Using your own custom PDBs

In case you are not satisfied with the PDBs that are written by the operators, you can deploy your own.

WARNING: In case you write custom PDBs, it is your responsibility to take care of the availability of the products.

IMPORTANT: You must disable the PDBs created by the Stackable operators as described above before creating your own PDBs, as this is a https://github.com/kubernetes/kubernetes/issues/75957[limitation of Kubernetes].

*After disabling the Stackable PDBs*, you can deploy your own PDB such as

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hdfs-journalnode-and-namenode
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: hdfs
    matchExpressions:
      - key: app.kubernetes.io/component
        operator: In
        values:
          - journalnode
          - namenode
----

This PDB allows only one Pod out of all the NameNodes and JournalNodes to be down at any one time.

== Details

Have a look at <<< TODO: link ADR on Pod Disruptions once merged >>> for the implementation details.

modules/concepts/pages/pod_placement.adoc renamed to modules/concepts/pages/operations/pod_placement.adoc

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
-= Pod Placement
+= Pod placement
+:page-aliases: ../pod_placement.adoc
 
 Several operators of the Stackable Data Platform permit the configuration of pod affinity as described in the Kubernetes https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/[documentation]. If no affinity is defined in the product's custom resource, the operators apply reasonable defaults that make use of the `preferred_during_scheduling_ignored_during_execution` property. Refer to the operator documentation for details.

modules/concepts/pages/overrides.adoc

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,7 @@ WARNING: Overriding certain configuration properties can lead to faulty clusters
 
 The cluster definitions also supports overriding configuration aspects, either per xref:roles-and-role-groups.adoc[role or per role group], where the more specific override (role group) has precedence over the less specific one (role).
 
+[#config-overrides]
 == Config overrides
 
 For a xref:roles-and-role-groups.adoc[role or role group], at the same level of `config`, you can specify `configOverrides` for any of the configuration files the product uses.

@@ -44,6 +45,7 @@ The properties will be formatted and escaped correctly into the file format used
 You can also set the property to an empty string (`my.property: ""`), which effectively disables the property the operator would write out normally.
 In case of a `.properties` file, this will show up as `my.property=` in the `.properties` file.
 
+[#env-overrides]
 == Environment variable overrides
 
 For a xref:roles-and-role-groups.adoc[role or role group], at the same level of `config`, you can specify `envOverrides` for any env variable

@@ -75,6 +77,7 @@ spec:
 You can set any environment variable, but every specific product does support a different set of environment variables.
 All override property values must be strings.
 
+[#pod-overrides]
 == Pod overrides
 
 For a xref:roles-and-role-groups.adoc[role or role group], at the same level of `config`, you can specify `podOverrides` for any of the attributes you can configure on a Pod.
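The three override mechanisms quoted in the context above can be sketched together on one role. The role name, config file, property and values are illustrative assumptions, not defaults of any operator:

[source,yaml]
----
spec:
  nameNodes:  # hypothetical role
    configOverrides:
      hdfs-site.xml:  # a configuration file the product uses
        my.property: "value"
    envOverrides:
      MY_ENV_VAR: "value"  # all override values must be strings
    podOverrides:
      spec:
        terminationGracePeriodSeconds: 120
----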

modules/concepts/pages/product_image_selection.adoc

Lines changed: 1 addition & 1 deletion
@@ -129,7 +129,7 @@ When deriving images from official Stackable images this will mean updating the
 * It is not possible to update the Stackable Platform to a new version without changing the deployed cluster definitions when using custom images.
 The recommended process here is:
 
-** Tag clusters as "do not reconcile" (see xref:cluster_operations.adoc[])
+** Tag clusters as "do not reconcile" (see xref:operations/cluster_operations.adoc[])
 ** Update Stackable plattform
 ** Change custom images in cluster specifications
 ** Remove "do not reconcile flag"
