
Commit 4d5de77

WIP
1 parent 713b5ea commit 4d5de77

File tree

1 file changed: +296 -10 lines changed

modules/contributor/pages/adr/ADR030-reduce-pod-disruptions.adoc

Lines changed: 296 additions & 10 deletions
@@ -14,7 +14,23 @@ Downtime of products is always bad.
Kubernetes has a concept called https://kubernetes.io/docs/tasks/run-application/configure-pdb/[PodDisruptionBudget] (PDB) to prevent this.
We want to use this functionality to reduce the downtime to an absolute minimum.
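
For readers who have not used PDBs before, a minimal manifest looks roughly like the following sketch (names and values purely illustrative):

[source,yaml]
----
# A PDB that allows at most one of the selected pods to be voluntarily evicted at any time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: example
----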

-*Requirements:*
== Decision Drivers

* Ease of use and comprehensibility for the user
** Principle of least surprise
* Easy implementation (far less important)
* Extensible design, so that we can later add new functionality in a non-breaking way, such as also allowing PDBs to be configured at roleGroup level.

== Example use-cases

1. As a user I want an HDFS cluster, and it (or parts of it) should not be disturbed by planned pod evictions.
2. As a user I want to configure maxUnavailable on the role (e.g. datanode) across all rolegroups (e.g. with dfs replication 3 only a single datanode is allowed to go down - regardless of the number of rolegroups), so that no datanode is a single point of failure.
3. As a user I want to configure maxUnavailable on the rolegroups individually, as I e.g. have some fast datanodes using SSDs and some slow datanodes using HDDs. For performance reasons I always want a certain number of fast datanodes online.
4. As a user I want Superset/NiFi/Kafka, and they (or parts of them) should not be disturbed by planned pod evictions.

Most users probably either don't know what PDBs are or are fine with the default values our operators deploy, based on our knowledge of the products.

== Requirements

1. We must deploy a PDB alongside all the product StatefulSets (and Deployments in the future) to restrict pod disruptions.
2. Users also need the ability to override the numbers we default to, as they need to make a tradeoff between availability and rollout times, e.g. during a rolling redeployment. Context: I have operated Trino clusters that could take more than 6 hours to redeploy in a rolling fashion, as the graceful shutdown of Trino workers takes a considerable amount of time - depending on the queries being executed.
@@ -34,8 +50,19 @@ Because of the mentioned constraints we have the following implications:
4. Users must be able to disable our PDB creation in case they want to define their own, as otherwise the Pods would have multiple PDBs, which is not supported (see the sketch after this list).
5. We try to have a PDB per role, as this makes things much easier than e.g. saying "out of the namenodes and journalnodes only one can be down". Otherwise we cannot make it "simply" configurable on the role.
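
To illustrate implication 4, a hypothetical sketch of what opting out and bringing your own PDB could look like (the `podDisruptionBudget.enabled` flag refers to the CRD options discussed below; the name and values are illustrative only):

[source,yaml]
----
# Hypothetical: switch off the operator-managed PDB for the datanode role ...
spec:
  dataNodes:
    config:
      podDisruptionBudget:
        enabled: false
----

[source,yaml]
----
# ... and define your own PDB for the same pods instead.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-own-datanode-pdb
spec:
  minAvailable: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
----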

-Taking the implications into account we end up with the following CRD structure:
== Question 1: Do we want to support configuring PDBs on role level only, or on both role and rolegroup level?

=== Option 1: Configurable on role level
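
For illustration, a purely role-level configuration could look like the following sketch (the exact field placement is assumed here; the concrete CRD shape is the subject of Question 2 below):

[source,yaml]
----
spec:
  nameNodes:
    # PDB settings exist on the role only; rolegroups cannot override them.
    podDisruptionBudget:
      enabled: true
      maxUnavailable: 1
    roleGroups:
      default:
        replicas: 3
----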

=== Option 2: Configurable on role + rolegroup level

Cons:

* It is really complicated, both for the user and for the implementation.

.Explanation
[%collapsible]
====
[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
@@ -48,8 +75,80 @@ spec:
  clusterConfig:
    zookeeperConfigMapName: simple-hdfs-znode
  nameNodes:
-    # optional, only supported on role, *not* on rolegroup
-    pdb:
    config:
      podDisruptionBudget:
        enabled: true
        maxUnavailable: 2
    roleGroups:
      hdd:
        replicas: 16
        config:
          podDisruptionBudget:
            maxUnavailable: 4
      ssd:
        replicas: 8
        config:
          podDisruptionBudget:
            enabled: false
      in-memory:
        replicas: 4
----

would end up with something like

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-hdds
spec:
  maxUnavailable: 4
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
      app.kubernetes.io/rolegroup: hdd
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-not-hdds
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
    matchExpressions:
      - key: app.kubernetes.io/rolegroup
        operator: NotIn
        values:
          - hdd
      - key: app.kubernetes.io/rolegroup
        operator: NotIn
        values:
          - ssd
----
====

Chosen option: *Option 1: Configurable on role level*

== Question 2: What does the CRD structure look like?

=== Option 1

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  nameNodes:
    podDisruptionBudget: # optional
      enabled: true # optional, defaults to true
      maxUnavailable: 1 # optional, defaults to our "smart" calculation
    roleGroups:
@@ -59,15 +158,202 @@ spec:
    # use pdb defaults
    roleGroups:
      default:
-        replicas: 1
-  journalNodes:
-    # use pdb defaults
        replicas: 2
----

==== Pros

* Everything below `config` can be merged; everything below `clusterConfig` applies to the whole cluster (no exceptions)

==== Cons

* Bloats `spec.namenodes`

=== Option 2

[source,yaml]
----
spec:
  nameNodes:
    config: # <<<
      podDisruptionBudget:
        enabled: true
        maxUnavailable: 1
    roleGroups:
      default:
-        replicas: 10
        replicas: 2
        config: {}
        # no such field as podDisruptionBudget
----

-and end up with the following PDBs when the default values are used:
==== Pros

* Everything configurable is below `config` (some of its attributes can be merged) or `clusterConfig`.

==== Cons

* `spec.nameNodes.config` is *not* structured like `spec.nameNodes.roleGroups.default.config` => confusing to the user

=== Option 3

[source,yaml]
----
spec:
  nameNodes:
    roleConfig: # <<<
      podDisruptionBudget:
        enabled: true
        maxUnavailable: 1
    roleGroups:
      default:
        replicas: 2
----

==== Pros

* Does not bloat `spec.namenodes`

==== Cons

* Yet another "config" (config, clusterConfig and now roleConfig as well)
** That is kind of how the real world is: there are some things you can configure at cluster level (e.g. LDAP), at role level (PDBs) and at role group level (resources). This option models that the closest, as sketched below.
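
For illustration, a hypothetical sketch of how the three levels could line up (the `authentication` and `resources` fields are only placeholders for "something configured at that level", not the actual CRD fields):

[source,yaml]
----
spec:
  clusterConfig:
    # cluster level (e.g. LDAP) - applies to the whole cluster
    authentication: []
  nameNodes:
    roleConfig:
      # role level - e.g. one PDB per role
      podDisruptionBudget:
        maxUnavailable: 1
    roleGroups:
      default:
        replicas: 2
        config:
          # role group level - e.g. resources
          resources:
            memory:
              limit: 2Gi
----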
=== Option 4

[source,yaml]
----
spec:
  dataNodes:
    config:
      podDisruptionBudget:
        maxUnavailable: 2
    roleGroups:
      hdd:
        replicas: 16
      ssd:
        replicas: 8
      in-memory:
        replicas: 4
----

would end up with

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-hdds
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
      app.kubernetes.io/rolegroup: hdd
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-ssds
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
      app.kubernetes.io/rolegroup: ssd
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-in-memory
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
      app.kubernetes.io/rolegroup: in-memory
----

[source,yaml]
----
spec:
  dataNodes:
    config:
      podDisruptionBudget:
        enabled: true
        maxUnavailable: 2
    roleGroups:
      hdd:
        replicas: 16
        config:
          podDisruptionBudget:
            maxUnavailable: 4
      ssd:
        replicas: 8
        config:
          podDisruptionBudget:
            enabled: false
      in-memory:
        replicas: 4
----

would end up with

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-hdds
spec:
  maxUnavailable: 4
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
      app.kubernetes.io/rolegroup: hdd
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes-in-memory
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
      app.kubernetes.io/instance: simple-hdfs
      app.kubernetes.io/component: datanode
      app.kubernetes.io/rolegroup: in-memory
----

==== Pros

*

==== Cons

*

We end up with the following PDBs when the default values are used:

[source,yaml]
----
@@ -100,7 +386,7 @@ kind: PodDisruptionBudget
metadata:
  name: simple-hdfs-datanodes
spec:
-  maxUnavailable: 2
  maxUnavailable: 2 # assuming dfs replication 3
  selector:
    matchLabels:
      app.kubernetes.io/name: hdfs
