1
- = ADR028: Discovery revision
1
+ = ADR028: Service Discovery (Revision)
2
2
Sebastian Bernauer <sebastian.bernauer@stackable.tech>
3
3
v0.1, 2023-03-30
4
4
:status: draft
@@ -68,9 +68,15 @@ We have some common use-cases that we need to express via the discovery mechanis
68
68
69
69
== Considered Options
70
70
71
- === [1] Discovery Object: Use ConfigMap
71
+ === Discovery Object
72
72
73
- Use a ConfigMap:
73
+ This Discovery Object is written by the Product Operator. It describes how the product can be accessed by other
74
+ connecting entities (clients). These clients can be inside the same Kubernetes cluster, in an external Kubernetes
75
+ cluster or deployed using a different approach like running inside a VM or on bare metal hardware. The discovery object
76
+ is used both internally by the SDP and by other actors (for example users). Each Discovery Object is created in the same
77
+ namespace the product runs in. This means cross-namespace discovery needs special attention.
78
+
79
+ ==== Discovery Object: Option 1 - ConfigMap
74
80
75
81
[source,yaml]
76
82
----
@@ -94,22 +100,20 @@ spec:
94
100
authenticationSecretClass: client-kerberos
95
101
----
96
102
97
- ==== Pros
103
+ ===== Pros
98
104
99
105
* Easier to use for consuming applications outside of Stackable, as they can simply mount the CM or download it to a
100
106
file. On the other hand, when we start to put in YAML objects that need to be parsed, we would lose this benefit.
101
107
* Single API call to retrieve all running products
102
108
103
- ==== Cons
109
+ ===== Cons
104
110
105
111
* No complex structure such as enums (we can mitigate this by putting custom YAML into the CM). Users don't have any
106
112
form of validation when creating their own discovery CM e.g. pointing to their existing HDFS.
107
113
* Cannot have two products with the same name, as the discovery CM name clashes. One solution could be to prefix the
108
114
product name (e.g. trino-simple). This can cause other problems, such as CM names that are too long.
109
115
110
- === [1] Discovery Object: Use dedicated CRD object for every product
111
-
112
- Or use a dedicated HdfsClusterDiscovery crd:
116
+ ==== Discovery Object: Option 2 - Dedicated CRD Object for every Product
113
117
114
118
[source,yaml]
115
119
----
@@ -133,30 +137,28 @@ spec:
133
137
secretClass: client-tls # Use this SecretClass to obtain a keytab
134
138
----
135
139
136
- ==== Pros
140
+ ===== Pros
137
141
138
142
* Validation by using e.g. complex enums
139
143
* Common structures can be shared between all operators, such as `Listener` endpoints or TLS server certificate
140
144
information
141
145
142
- ==== Cons
146
+ ===== Cons
143
147
144
- * Operator A needs to compile against operator b to have access to it's discovery struct. An alternative would be to put
148
+ * Operator A needs to compile against operator B to have access to its discovery struct. An alternative would be to put
145
149
the Discovery CRDs in operator-rs.
146
150
* Operator versioning hell. On the other hand we have the same problem with ConfigMaps, as e.g. a newly introduced key
147
151
is missing because of an older hdfs operator version.
148
152
* Dependent Pods (such as hbase on hdfs) cannot simply mount a CM containing the hdfs-site and core-site. Instead the
149
153
hbase-operator needs to read the HdfsClusterDiscovery, copy the hdfs-site and core-site into a CM and then mount that
150
154
into the hbase Pods. This can be solved by having the HdfsClusterDiscovery point to a CM that contains hdfs-site and
151
- core-site xmls .
155
+ core-site XML files.
152
156
* Multiple API calls are needed to retrieve all running Stackable services (in stackablectl or cockpit). This would be a single
153
157
API call in case of discovery CM or a shared CRD for all product discoveries.
154
158
* Side-Note: `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are
155
159
external systems that we don't know anything about.
156
160
157
- === [1] Discovery Object: Use dedicated CRD object for every product - in combination with ConfigMap
158
-
159
- Or use a dedicated HdfsClusterDiscovery crd:
161
+ ==== Discovery Object: Option 3 - Dedicated CRD Object + ConfigMap for every Product
160
162
161
163
[source,yaml]
162
164
----
@@ -210,15 +212,13 @@ spec:
210
212
# No CM needed
211
213
----
212
214
213
- ==== Pros
215
+ ===== Pros
214
216
215
217
* Fixes the mount problem from Option 2 (dedicated CRD object for every product)
216
218
217
- ==== Cons
218
-
219
- === [1] Discovery Object: Use dedicated CRD object shared between all products
219
+ ===== Cons
220
220
221
- Or use a dedicated ClusterDiscovery crd:
221
+ ==== Discovery Object: Option 4 - Dedicated CRD Object shared between all Products
222
222
223
223
[source,yaml]
224
224
----
@@ -245,13 +245,13 @@ spec:
245
245
# ...
246
246
----
247
247
248
- ==== Pros
248
+ ===== Pros
249
249
250
250
* Only one struct in operator-rs => No cross-operator dependencies.
251
251
* Single API call to retrieve all Stackable products. The question is whether this really helps, as callers are probably
also interested in the status of the product, which needs further API calls (irrelevant - see Cons).
253
253
254
- ==== Cons
254
+ ===== Cons
255
255
256
256
* All product discoveries are versioned together. E.g. a new mandatory field for hdfs requires all operators to bump the
257
257
Discovery CRD to `v2`. We hope that this does not happen too often.
@@ -260,25 +260,30 @@ spec:
260
260
systems that we don't know anything about. If we want to introduce a `Stacklet` object listing anyway,
the `Pro` regarding the API calls is irrelevant.
262
262
263
- === [1] Discovery Object: Write the discovery to Product CR status
263
+ ==== Discovery Object: Option 5 - Write Discovery to Product CR Status
264
264
265
265
Instead of writing discovery information to dedicated objects - such as CM or custom CR - we "simply" write the
266
266
discovery information to the status of the Cluster CR.
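
A minimal sketch of what this could look like, assuming a hypothetical `discovery` field in the status of a `TrinoCluster`. All field names and values under `status` are illustrative assumptions and not part of any existing CRD:

[source,yaml]
----
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: my-trino
status:
  # Hypothetical field, written by the product operator
  discovery:
    endpoint:
      host: my-trino-coordinator.default.svc.cluster.local # illustrative value
      port: 8443 # illustrative value
----

With this layout a client has to read the product CR itself, which is why a user-provided product without such a CR cannot be described this way.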
267
267
268
- ==== Pros
268
+ ===== Pros
269
269
270
- ==== Cons
270
+ * Currently none
271
+
272
+ ===== Cons
271
273
272
274
* It does not enable users to bring their own product and talk to it from Stackable, e.g. a user-provided HDFS.
273
275
* It does not allow things such as a ZNode for Zookeeper as we either use the Zookeeper CR for discovery or we use a
274
276
ZNode but then can't use a Zookeeper CR. Currently we have the freedom of connecting either to a Zookeeper root dir or
275
277
a ZNode transparently.
276
278
277
- === [2] TLS: Discovery config contains SecretClass
279
+ '''
278
280
279
- The discovery includes the SecretClass used to obtain the ca.crt used to validate the *server* certificate
281
+ === TLS
282
+
283
+ ==== TLS: Option 1 - Discovery Config contains SecretClass
284
+
285
+ The discovery includes the SecretClass used to obtain the ca.crt used to validate the *server* certificate.
280
286
281
- Trino discovery:
282
287
[source,yaml]
283
288
----
284
289
apiVersion: trino.stackable.tech/v1alpha1
@@ -305,18 +310,20 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
305
310
discovery: my-trino
306
311
----
307
312
308
- ==== Pros
313
+ ===== Pros
309
314
310
- ==== Cons
315
+ * Currently none
311
316
312
- === [2] TLS: Client needs to specify SecretClass
313
- ---
317
+ ===== Cons
318
+
319
+ * Currently none
320
+
321
+ ==== TLS: Option 2 - Client specifies SecretClass
314
322
315
323
The discovery does *not* include the SecretClass used to obtain the *server* certificate. Instead the client must
316
324
specify which SecretClass should be used to verify the *server* certificate. For usability reasons it can be omitted and
317
325
defaults to the SecretClass the client uses for itself.
318
326
319
- Trino discovery:
320
327
[source,yaml]
321
328
----
322
329
apiVersion: trino.stackable.tech/v1alpha1
@@ -343,22 +350,21 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
343
350
tlsSecretClass: my-second-pki
344
351
----
345
352
346
- ==== Pros
353
+ ===== Pros
347
354
348
355
* Operator does not need to read/look at the DiscoveryConfig (as we can statically set up the secret-op tls secretClass
349
356
volumes rather than retrieving them from the DiscoveryConfig).
350
357
* Some clients only support a single pki, in that case we could not give the ability to overwrite the secretClass coming
351
358
from the product itself.
352
359
353
- ==== Cons
360
+ ===== Cons
354
361
355
362
* The client has to know what pki/secretClass the server is using.
356
363
* A Superset TrinoConnection could not just say "Connect this Superset and this Trino", but would need to add "using this ca.crt
357
364
to validate the Trino server"
358
365
359
- === [2] TLS: Include caCert in Discovery config
366
+ ==== TLS: Option 3 - Include caCert in Discovery Config
360
367
361
- Trino discovery:
362
368
[source,yaml]
363
369
----
364
370
metadata:
@@ -376,23 +382,21 @@ endpoint:
376
382
=== END CERTIFICATE ===
377
383
----
378
384
379
- ==== Pros
385
+ ===== Pros
380
386
381
387
* Easier for external clients to use as they don't need to know the concept of SecretClasses and don't even need to run
382
388
within k8s.
383
389
* The client does *not* have to know what pki/secretClass the server is using.
384
390
385
- ==== Cons
391
+ ===== Cons
386
392
387
- * BIG QUESTION: How should the product operator get the ca cert from the SecretClass it uses to get the *server* cert
393
+ * BIG QUESTION: How should the product operator get the CA cert from the SecretClass it uses to get the *server* cert
388
394
from?
389
395
** The secret-op could e.g. offer an HTTP API to fetch the ca.crt of a given SecretClass or e.g. write the ca.crt into
390
396
the status of a SecretClass
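
As a rough sketch of the second idea, the secret-op could publish the CA certificate in the status of the SecretClass. The `status.caCert` field is a hypothetical assumption for illustration; it is not an existing field, and the SecretClass name is illustrative:

[source,yaml]
----
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: my-second-pki # illustrative name
spec:
  # ... backend configuration omitted
status:
  # Hypothetical field: CA certificate written by the secret-op
  caCert: "<PEM-encoded CA certificate>"
----

The product operator could then copy this value into the discovery config without needing access to the CA private key.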
391
397
392
398
393
- === [2] TLS: Include SecretClass in discovery, user can override it
394
-
395
- Trino discovery:
399
+ ==== TLS: Option 4 - Include SecretClass in Discovery (User can override it)
396
400
397
401
[source,yaml]
398
402
----
@@ -422,17 +426,20 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
422
426
tlsSecretClass: my-second-pki
423
427
----
424
428
425
- ==== Pros
429
+ ===== Pros
426
430
427
431
* Compromise between usability and flexibility
428
432
429
- ==== Cons
433
+ ===== Cons
430
434
431
435
* Less secure by default
432
436
433
- === [3] Authentication: Add AuthenticationClass to Discovery Config
437
+ '''
438
+
439
+ === Authentication
440
+
441
+ ==== Authentication: Option 1 - Add AuthenticationClass to Discovery Config
434
442
435
- Trino discovery:
436
443
[source,yaml]
437
444
----
438
445
metadata:
@@ -441,23 +448,23 @@ authentication:
441
448
authenticationClass: my-class
442
449
----
443
450
444
- ==== Pros
451
+ ===== Pros
452
+
445
453
* IMPORTANT: This is the only thing the server can know (how it verifies client identities). It cannot recommend a
SecretClass used to obtain the client credentials. E.g. if it uses an LDAP AuthenticationClass, there is no way it can
know which SecretClass provides credentials accepted by LDAP. (In most cases it will be a user logging into a WebUI, and
the LDAP credentials of the user are not even stored anywhere but just remembered by the user.)
449
457
450
- ==== Cons
458
+ ===== Cons
451
459
452
460
* Operator has to read the AuthenticationClass to determine its type (pw/tls/keytab) and set up the needed volumes and
453
461
commands.
454
462
// * The AuthenticationClass is meant to describe "how should a server verify connecting clients" and re-purpose it to
455
463
// mean "how a client should authenticate itself". Imagine a user creates a Secret `trino-users` with *only* a ca.crt
456
464
// and a SecretClass `trino-users` on top. The connecting client then has no way of knowing how to get a client cert.
457
465
458
- === [3] Authentication: Add SecretClass to Discovery Config
466
+ ==== Authentication: Option 2 - Add SecretClass to Discovery Config
459
467
460
- Trino discovery:
461
468
[source,yaml]
462
469
----
463
470
metadata:
@@ -466,13 +473,17 @@ authentication:
466
473
secretClass: client-tls # Use this SecretClass to obtain your credentials (regardless of type of SecretClass)
467
474
----
468
475
469
- ==== Cons
476
+ ===== Pros
477
+
478
+ * Currently none
479
+
480
+ ===== Cons
470
481
471
482
* Operator has to read the SecretClass to determine its type (pw/tls/keytab) and set up the needed volumes and commands.
472
483
* Imagine the SecretClass is of type `k8sSearch`. The connecting client (e.g. controlled via superset-operator) then has
no idea whether it should expect a tls.crt or a keytab when mounting the SecretClass.
474
485
475
- === [3] Authentication: Add needed details
486
+ ==== Authentication: Option 3 - Add needed Details
476
487
477
488
Trino discovery:
478
489
[source,yaml]
@@ -488,27 +499,33 @@ authentication:
488
499
secretClass: client-kerberos # Use this SecretClass to obtain a keytab
489
500
----
490
501
491
- ==== Pros
502
+ ===== Pros
492
503
493
504
* Operator does *not* have to read the SecretClass to determine its type (pw/tls/keytab), as the type is already encoded in
494
505
the Discovery config.
495
506
496
- ==== Cons
507
+ ===== Cons
497
508
498
- === [3] Authentication: Don't add information how to authenticate
509
+ * Currently none
510
+
511
+ ==== Authentication: Option 4 - Provide no Authentication Information
499
512
500
513
Trino discovery does not provide any information on how to authenticate.
501
514
502
- ==== Cons
515
+ ===== Pros
516
+
517
+ * Currently none
518
+
519
+ ===== Cons
503
520
504
521
* Not viable, as users need to know how to connect, and are not expected to try 50 different auth methods. We need to
505
522
give them an AuthenticationClass that tells them e.g. which LDAP or PKI is used.
506
523
507
524
== Decision Outcome
508
525
509
- [1] Discovery Object: `Discovery Object: Use dedicated CRD object for every product - in combination with ConfigMap`
510
- [2] Server tls cert: TODO
511
- [3] Authentication: `Authentication: Add AuthenticationClass to Discovery Config`
526
+ * *Discovery Object:* Option 3
527
+ * *Server TLS:* TBD, but leaning towards Option 2
528
+ * *Authentication:* Option 1
512
529
513
530
=== Appendix A
514
531