diff --git a/modules/concepts/pages/operations/cluster_operations.adoc b/modules/concepts/pages/operations/cluster_operations.adoc
index 027f0b19d..43d8ad904 100644
--- a/modules/concepts/pages/operations/cluster_operations.adoc
+++ b/modules/concepts/pages/operations/cluster_operations.adoc
@@ -6,6 +6,8 @@ Stackable operators offer different cluster operations to control the reconcilia
 * `reconciliationPaused` - Stop the operator from reconciling the cluster spec. The status will still be updated.
 * `stopped` - Stop all running pods but keep updating all deployed resources like `ConfigMaps`, `Services` and the cluster status.
 
+If not specified, `clusterOperation.reconciliationPaused` and `clusterOperation.stopped` default to `false`.
+
 == Example
 
 [source,yaml]
@@ -15,8 +17,54 @@ include::example$cluster-operations.yaml[]
 <1> The `clusterOperation.reconciliationPaused` flag set to `true` stops the operator from reconciling any changes to the cluster spec. The cluster status is still updated.
 <2> The `clusterOperation.stopped` flag set to `true` stops all pods in the cluster. This is done by setting all deployed `StatefulSet` replicas to 0.
 
-== Notes
-
-If not specified, `clusterOperation.reconciliationPaused` and `clusterOperation.stopped` default to `false`.
 
 IMPORTANT: When setting `clusterOperation.reconciliationPaused` and `clusterOperation.stopped` to true in the same step, `clusterOperation.reconciliationPaused` will take precedence. This means the cluster will stop reconciling immediately and the `stopped` field is ignored. To avoid this, the cluster should first be stopped and then paused.
+
+== Service Restarts
+
+=== Manual Restarts
+
+Sometimes it is necessary to restart services deployed in Kubernetes. A service restart should induce as little disruption as possible, ideally none.
+
+Most operators create StatefulSet objects for the products they manage, and Kubernetes offers a rollout mechanism to restart them. You can use `kubectl rollout restart statefulset` to restart a StatefulSet previously created by an operator.
+
+To illustrate how to use the command line to restart one or more Pods, we will assume you used the Stackable HDFS Operator to deploy an HDFS stacklet called `dumbo`.
+
+This stacklet will consist, among other things, of three StatefulSets, one for each HDFS role: `namenode`, `datanode` and `journalnode`. Let's list them:
+
+[source,shell]
+----
+❯ kubectl get statefulset -l app.kubernetes.io/instance=dumbo
+NAME                        READY   AGE
+dumbo-datanode-default      2/2     4m41s
+dumbo-journalnode-default   1/1     4m41s
+dumbo-namenode-default      2/2     4m41s
+----
+
+To restart the HDFS DataNode Pods, run:
+
+[source,shell]
+----
+❯ kubectl rollout restart statefulset dumbo-datanode-default
+statefulset.apps/dumbo-datanode-default restarted
+----
+
+Sometimes you want to restart all Pods of a stacklet and not just individual roles. This can be achieved in a similar manner by using labels instead of StatefulSet names. Continuing with the example above, to restart all HDFS Pods, run:
+
+[source,shell]
+----
+❯ kubectl rollout restart statefulset --selector app.kubernetes.io/instance=dumbo
+----
+
+To wait for all Pods to be running again:
+
+[source,shell]
+----
+❯ kubectl rollout status statefulset --selector app.kubernetes.io/instance=dumbo
+----
+
+Here we used the label `app.kubernetes.io/instance=dumbo` to select all StatefulSets that belong to a specific HDFS stacklet. This label is created by the operator and `dumbo` is the name of the HDFS stacklet as specified in the custom resource. You can add more labels to make finer-grained restarts.
+
+=== Automatic Restarts
+
+The Commons Operator of the Stackable Platform may restart Pods automatically, for purposes such as ensuring that TLS certificates are up-to-date. For details, see the xref:commons-operator:index.adoc[Commons Operator documentation].