Updated text (still WIP)

fhennig · fhennig · commit 4f7013946bde · 2023-08-31T12:04:37.000+02:00
diff --git a/modules/concepts/pages/service-exposition.adoc b/modules/concepts/pages/service-exposition.adoc
@@ -1,16 +1,41 @@
 = Service exposition
 
-For some deployed products it is sufficient to be only accessible within the Kubernetes cluster, while others need to be accessible from outside the Kubernetes cluster.
-This can e.g. be from your internal corporate network when running on bare metal, your internal network in your cloud provider or the Internet.
+Most data products expose a service. Some services are used by other tools running in the same cluster; others are APIs or web interfaces used by people outside of the cluster. For example, Apache ZooKeeper is a dependency of other tools and only needs to expose its service inside the cluster. Apache Superset, on the other hand, is an analysis tool whose end users need to access it from outside the cluster. Depending on the setup, that means from your company's local network or, if the Kubernetes cluster runs in the cloud, from the internet.
+This page gives an overview of the different options for service exposition, when to choose which option, and how to configure them.
 
-As of the release 23.4, the Stackable Operators create Kubernetes Service objects to expose the deployed product.
-For security reasons, the Services default to the `ClusterIP` type in order to avoid exposing anything to the public.
-You can specify the type within the custom resource field `spec.clusterConfig.listenerClass` by setting it to either:
+== Service exposition options
 
-* `cluster-internal` => Use `ClusterIP` (default)
-* `external-unstable` => Use `NodePort`
-* `external-stable` => Use `LoadBalancer`
+The word "service" can mean two things: the utility a data product offers, and the Kubernetes https://kubernetes.io/docs/concepts/services-networking/service/[Service] resource. The Stackable Data Platform supports three types of Kubernetes Service:
 
-Please note that as of the release 23.4 not all operators support all the mentioned `Service` types.
+* ClusterIP
+* NodePort
+* LoadBalancer
 
-In a future release, the `ListenerClass` provided by the xref:listener-operator:index.adoc[listener-operator] will be supported to make things more flexible.
+For every data product cluster, the Service type is configured through the custom resource field `spec.clusterConfig.listenerClass`. There are three listener classes, each named after the goal it serves (more on this in the <<when-to-choose-which-option, next section>>):
+
+* `cluster-internal` => Use ClusterIP (default)
+* `external-unstable` => Use NodePort
+* `external-stable` => Use LoadBalancer
+
+The `cluster-internal` class exposes the product only inside the cluster, using a ClusterIP Service. This class is the default for security reasons: unless you configure otherwise, no Service is exposed to the public.
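+
+The mapping from listener class to Service type can be sketched in Python. This is a hypothetical illustration, not the Operators' actual code: `SERVICE_TYPE_BY_LISTENER_CLASS` and `service_type_for` are names made up for this example. Given a custom resource as a dictionary, it reads `spec.clusterConfig.listenerClass`, falls back to `cluster-internal` when the field is not set, and returns the Service type that class maps to.
+
+```python
+# Hypothetical sketch of the listener class -> Service type mapping.
+# Not taken from the Stackable Operators' source code.
+SERVICE_TYPE_BY_LISTENER_CLASS = {
+    "cluster-internal": "ClusterIP",
+    "external-unstable": "NodePort",
+    "external-stable": "LoadBalancer",
+}
+
+DEFAULT_LISTENER_CLASS = "cluster-internal"
+
+
+def service_type_for(custom_resource: dict) -> str:
+    """Return the Service type selected by spec.clusterConfig.listenerClass."""
+    cluster_config = custom_resource.get("spec", {}).get("clusterConfig", {})
+    # An unset field falls back to the secure default: cluster-internal.
+    listener_class = cluster_config.get("listenerClass", DEFAULT_LISTENER_CLASS)
+    if listener_class not in SERVICE_TYPE_BY_LISTENER_CLASS:
+        raise ValueError(f"unknown listener class: {listener_class!r}")
+    return SERVICE_TYPE_BY_LISTENER_CLASS[listener_class]
+
+
+# A custom resource, trimmed to the relevant field, asking for a stable
+# external address.
+superset = {"spec": {"clusterConfig": {"listenerClass": "external-stable"}}}
+print(service_type_for(superset))  # LoadBalancer
+print(service_type_for({"spec": {}}))  # ClusterIP
+```
+
+Raising an error for unknown values is a choice made for this sketch; in a real cluster, invalid values are rejected when the custom resource is validated.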
+
+NOTE: Not all Operators support all listener classes. Consult the Operator-specific documentation to find out which classes are supported.
+
+== [[when-to-choose-which-option]]When to choose which option
+
+There are three options: one for internal traffic and two for external access. Internal and external are relative to the Kubernetes cluster: internal means traffic from inside the cluster, and external means access from outside of it.
+
+=== Internal
+
+The `cluster-internal` setting is the default class; the Service is only exposed inside the Kubernetes cluster. This is useful for internal dependencies such as xref:zookeeper:index.adoc[Apache ZooKeeper], the xref:hive:index.adoc[Apache Hive metastore] or an xref:kafka:index.adoc[Apache Kafka] cluster used for internal data flow, where all requests come from inside the Kubernetes cluster.
+
+=== External
+
+External access is needed when a tool must be reachable from _outside_ of the Kubernetes cluster. This is the case for all tools that users interact with directly, such as the data visualization tool xref:superset:index.adoc[Apache Superset]. Some tools, like xref:kafka:index.adoc[Apache Kafka] or xref:nifi:index.adoc[Apache NiFi], also expose APIs for data ingestion. If data needs to be ingested from outside of the cluster, choose one of the external listener classes.
+
+When should you use `external-stable` and when `external-unstable`? The `external-unstable` setting uses a NodePort Service: the Service is exposed at a port on the node that the Pod is running on. (TODO Advantages?) This has the advantage that ...., but the port is not known in advance and is not stable either: after a Pod restart, the port might change. The `external-stable` setting uses a LoadBalancer Service. The load balancer runs at a fixed address and is therefore stable. But ....
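+
+The difference between the two external classes shows up in the address a client has to connect to. The following Python sketch is hypothetical: the function `external_address` and all addresses and ports are made up for illustration. It only reads standard fields of the Kubernetes Service API (`spec.type`, `spec.ports[].port`, `spec.ports[].nodePort` and `status.loadBalancer.ingress`). For a NodePort Service, the client needs the address of a node and the port Kubernetes assigned, so the address is only as stable as those two values. For a LoadBalancer Service, the client uses the address published in the Service status together with the Service port.
+
+```python
+# Hypothetical sketch: the address an external client connects to, given a
+# Service manifest as a dict. Field names follow the Kubernetes Service API.
+from typing import Optional
+
+
+def external_address(service: dict, node_address: Optional[str] = None) -> str:
+    service_type = service["spec"]["type"]
+    port = service["spec"]["ports"][0]  # assumes a single port, for brevity
+    if service_type == "NodePort":
+        # The node port is assigned by Kubernetes (by default from the range
+        # 30000-32767); the client also needs the address of a node.
+        if node_address is None:
+            raise ValueError("a NodePort Service needs a node address")
+        return f"{node_address}:{port['nodePort']}"
+    if service_type == "LoadBalancer":
+        # The load balancer's address is published in the Service status.
+        lb_status = service.get("status", {}).get("loadBalancer", {})
+        ingress = lb_status.get("ingress", [])
+        if not ingress:
+            raise ValueError("load balancer address not assigned yet")
+        host = ingress[0].get("ip") or ingress[0].get("hostname")
+        return f"{host}:{port['port']}"
+    raise ValueError(f"{service_type} Services are not exposed outside the cluster")
+
+
+node_port_svc = {"spec": {"type": "NodePort", "ports": [{"port": 8088, "nodePort": 31234}]}}
+print(external_address(node_port_svc, "10.0.0.12"))  # 10.0.0.12:31234
+
+lb_svc = {
+    "spec": {"type": "LoadBalancer", "ports": [{"port": 8088, "nodePort": 31234}]},
+    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.7"}]}},
+}
+print(external_address(lb_svc))  # 203.0.113.7:8088
+```
+
+In practice, you would read the Service with `kubectl get service <name> -o yaml` or a Kubernetes client library instead of writing the dictionaries by hand.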
+
+== Outlook
+
+These listener classes are hardcoded to expose certain Service types and do not offer any additional configuration.
+In a future release, the `ListenerClass` provided by the xref:listener-operator:index.adoc[listener-operator] will allow you to create your own listener class variants, with more granular configuration options.