Add concepts guide on graceful shutdown (#468)

sbernauer · adwk67 · web-flow · commit e7be753aff9a · 2023-10-11T10:27:09.000Z
* Add concepts guide on graceful shutdown

* Apply suggestions from code review

Co-authored-by: Andrew Kenworthy <andrew.kenworthy@stackable.de>

* Add k8s requirements

* Apply suggestions from code review

Co-authored-by: Andrew Kenworthy <andrew.kenworthy@stackable.de>

* fix nav

---------

Co-authored-by: Andrew Kenworthy <andrew.kenworthy@stackable.de>
diff --git a/modules/concepts/nav.adoc b/modules/concepts/nav.adoc
@@ -10,10 +10,10 @@
 ** xref:resources.adoc[]
 ** xref:s3.adoc[]
 ** xref:tls_server_verification.adoc[]
-** xref:pod_placement.adoc[]
 ** xref:overrides.adoc[]
 ** xref:duration.adoc[]
 ** xref:operations/index.adoc[]
 *** xref:operations/cluster_operations.adoc[]
-*** xref:operations/pod_placement.adoc[]
 *** xref:operations/pod_disruptions.adoc[]
+*** xref:operations/pod_placement.adoc[]
+*** xref:operations/graceful_shutdown.adoc[]
diff --git a/modules/concepts/pages/operations/graceful_shutdown.adoc b/modules/concepts/pages/operations/graceful_shutdown.adoc
@@ -0,0 +1,35 @@
+= Graceful shutdown
+
+The article https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace[Kubernetes best practices: terminating with grace] describes how a graceful shutdown works in Kubernetes.
+
+Our operators add the needed shutdown mechanism for their products that support graceful shutdown.
+
+They also configure a sensible amount of time Pods are granted to properly shut down without disrupting the availability of the product.
+If you are not satisfied with the default values, you can set the graceful shutdown timeout as follows:
+
+[source,yaml]
+----
+spec:
+  workers:
+    config:
+      gracefulShutdownTimeout: 1h # Set it for all worker roleGroups
+    roleGroups:
+      normal: # Will use 1h from the worker role config
+        replicas: 1
+      long: # Will use 6h from the roleGroup config below
+        replicas: 1
+        config:
+          gracefulShutdownTimeout: 6h # Set it only for this specific roleGroup
+----
+
+The individual default timeouts are documented in the specific operators at the `Operations -> Graceful shutdown` usage-guide.
+
+== Kubernetes cluster requirements
+Pods need to have the ability to take as long as they need to gracefully shut down without getting killed.
+
+Imagine the situation that you set the graceful shutdown period to 24 hours.
+In the case of e.g. an on-premise Kubernetes cluster the Kubernetes infrastructure team may want to drain the Kubernetes node so that they can do regular maintenance, such as rebooting the node.
+They will have some upper limit on how long they will wait for Pods on the Node to terminate before they reboot the Kubernetes node, regardless of any Pods that are still running.
+
+When setting up a production cluster, you need to check with your Kubernetes administrator (or cloud provider) what time period your Pods have to terminate gracefully.
+It is not sufficient to have a look at the `spec.terminationGracePeriodSeconds` and come to the conclusion that the Pods have e.g. 24 hours to gracefully shut down, as e.g. an administrator can reboot the Kubernetes node before the time period is reached.
diff --git a/modules/concepts/pages/operations/index.adoc b/modules/concepts/pages/operations/index.adoc
@@ -17,7 +17,7 @@ Make sure to go through the following checklist to achieve the maximum level of
    Many HA capable products offer a way to gracefully shut down the service running within the Pod.
    The flow is as follows: Kubernetes wants to shut down the Pod and calls a hook into the Pod, which in turn interacts with the product, signaling it to gracefully shut down.
    The final deletion of the Pod is then blocked until the product has successfully migrated running workloads away from the Pod that is to be shut down.
-   Details covering the graceful shutdown mechanism are described in the actual operator documentation.
+   Details covering the graceful shutdown mechanism are described in xref:operations/graceful_shutdown.adoc[] as well as the actual operator documentation.
 +
 WARNING: Graceful shutdown is not implemented for all products yet. Please check the documentation specific to the product operator to see if it is supported (such as e.g. xref:trino:usage-guide/operations/graceful-shutdown.adoc[the documentation for Trino].
 
