You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: modules/contributor/pages/adr/ADR028-discovery-revision.adoc
+84-44Lines changed: 84 additions & 44 deletions
Original file line number
Diff line number
Diff line change
@@ -16,28 +16,33 @@ v0.1, 2023-03-30
16
16
17
17
== Context and Problem Statement
18
18
19
-
// Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to articulate the problem in form of a question.
19
+
// Describe the context and problem statement, e.g., in free form using two to three sentences. You may want to
20
+
// articulate the problem in form of a question.
20
21
21
-
This ADR is written with specific problems in mind, but the goal is to reach a generic mechanism for the discovery of products.
22
-
The current discovery mechanism is described https://docs.stackable.tech/home/stable/concepts/service_discovery.html[in our docs] (make sure to pick the 23.1 version).
22
+
This ADR is written with specific problems in mind, but the goal is to reach a generic mechanism for the discovery of
23
+
products. The current discovery mechanism is described
24
+
https://docs.stackable.tech/home/stable/concepts/service_discovery.html[in our docs] (make sure to pick the 23.1
25
+
version).
23
26
24
-
Basically when clients connect to products managed by Stackable, they need to have certain information about how to connect to these products.
25
-
Currently we expose some of these information, but not all (e.g. which ca cert the exposed product uses).
26
-
On the other hand it could be the case that users have external services which Stackable services should use, e.g.
27
-
an non-stackable HDFS where an Stackable Trino should connect to.
27
+
Basically when clients connect to products managed by Stackable, they need to have certain information about how to
28
+
connect to these products. Currently we expose some of these information, but not all (e.g. which ca cert the exposed
29
+
product uses). On the other hand it could be the case that users have external services which Stackable services should
30
+
use, e.g. an non-stackable HDFS where an Stackable Trino should connect to.
28
31
29
-
Question 1: How do we want to make products discoverable?
30
-
Question 2: How should clients verify the identity of the server (e.g. the cert of the server)
31
-
Question 3: How should clients authenticate themselves against the server?
32
+
*Question 1:* How do we want to make products discoverable?
33
+
*Question 2:* How should clients verify the identity of the server (e.g. the cert of the server)
34
+
*Question 3:* How should clients authenticate themselves against the server?
32
35
33
36
== Decision Drivers
37
+
34
38
We have some common use-cases that we need to express via the discovery mechanism:
35
39
36
40
1. Trino cluster
37
41
* Currently we don't write any discovery CM
38
42
* We need at least
39
43
** Coordinator address (currently only single coordinator is supported)
40
-
*** Ideally split it into the following attributes, so that clients can construct the URI. (trino python client e.g. needs all of these attributes separately and I don't want to rely on parsing and extracting the URI)
44
+
*** Ideally split it into the following attributes, so that clients can construct the URI. (trino python client e.g.
45
+
needs all of these attributes separately and I don't want to rely on parsing and extracting the URI)
41
46
**** protocol (http/https)
42
47
**** host
43
48
**** port
@@ -91,15 +96,16 @@ spec:
91
96
92
97
==== Pros
93
98
94
-
* Easier to use for consuming applications outside of Stackable, as they can simply mount the CM or download it to a file.
95
-
On the other hand when we start to put in yaml objects that need to be parsed we would loose the benefit.
99
+
* Easier to use for consuming applications outside of Stackable, as they can simply mount the CM or download it to a
100
+
file. On the other hand when we start to put in yaml objects that need to be parsed we would loose the benefit.
96
101
* Single API call to retrieve all running products
97
102
98
103
==== Cons
99
104
100
-
* No complex structure such as enums (we can mitigate this by sticking in custom yaml into the CM).
101
-
Users don't have any form of validation when creating their own discovery CM e.g. pointing to their existing HDFS.
102
-
* Cannot have two products with the same name, as the discovery CM name clashes. One solution could be to prefix the product name (e.g. trino-simple), This can impose other problems such as too long CM names.
105
+
* No complex structure such as enums (we can mitigate this by sticking in custom yaml into the CM). Users don't have any
106
+
form of validation when creating their own discovery CM e.g. pointing to their existing HDFS.
107
+
* Cannot have two products with the same name, as the discovery CM name clashes. One solution could be to prefix the
108
+
product name (e.g. trino-simple), This can impose other problems such as too long CM names.
103
109
104
110
=== [1] Discovery Object: Use dedicated CRD object for every product
105
111
@@ -130,15 +136,23 @@ spec:
130
136
==== Pros
131
137
132
138
* Validation by using e.g. complex enums
133
-
* Commons structure can be shared between all operators, such as `Listener` endpoints or tls server certificate information
139
+
* Commons structure can be shared between all operators, such as `Listener` endpoints or tls server certificate
140
+
information
134
141
135
142
==== Cons
136
143
137
-
* Operator A needs to compile against operator b to have access to it's discovery struct. An alternative would be to put the Discovery CRDs in operator-rs.
138
-
* Operator versioning hell. On the other hand we have the same problem with ConfigMaps, as e.g. a newly introduced key is missing because of an older hdfs operator version.
139
-
* Dependant Pods (such as hbase on hdfs) can not simply mount a CM containing the hdfs-site and core-site. Instead the hbase-operator needs to read the HdfsClusterDiscovery, copy the hdfs-site and core-site into a CM and than mount that into the hbase Pods. This can be solved by the HdfsClusterDiscovery to point to a CM that contains hdfs-site and core-site xmls.
140
-
* Multiple API calls need to retrieve all running Stackable service (in stackablectl or cockpit). This would be a single API call in case of discovery CM or a shared CRD for all product discoveries.
141
-
* Side-Note: `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are external systems, where we don't know anything about.
144
+
* Operator A needs to compile against operator b to have access to it's discovery struct. An alternative would be to put
145
+
the Discovery CRDs in operator-rs.
146
+
* Operator versioning hell. On the other hand we have the same problem with ConfigMaps, as e.g. a newly introduced key
147
+
is missing because of an older hdfs operator version.
148
+
* Dependant Pods (such as hbase on hdfs) can not simply mount a CM containing the hdfs-site and core-site. Instead the
149
+
hbase-operator needs to read the HdfsClusterDiscovery, copy the hdfs-site and core-site into a CM and than mount that
150
+
into the hbase Pods. This can be solved by the HdfsClusterDiscovery to point to a CM that contains hdfs-site and
151
+
core-site xmls.
152
+
* Multiple API calls need to retrieve all running Stackable service (in stackablectl or cockpit). This would be a single
153
+
API call in case of discovery CM or a shared CRD for all product discoveries.
154
+
* Side-Note: `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are
155
+
external systems, where we don't know anything about.
142
156
143
157
=== [1] Discovery Object: Use dedicated CRD object for every product - in combination with ConfigMap
144
158
@@ -234,27 +248,34 @@ spec:
234
248
==== Pros
235
249
236
250
* Only one struct in operator-rs => No cross-operator dependencies.
237
-
* Single API call to retrieve all stackable products. Question is if this really helps a lot, as callers probably also are interested in the status of the product, which needs further API calls (irrelevant - see Cons).
251
+
* Single API call to retrieve all stackable products. Question is if this really helps a lot, as callers probably also
252
+
are interested in the status of the product, which needs further API calls (irrelevant - see Cons).
238
253
239
254
==== Cons
240
255
241
-
* All product discoveries are versioned together. E.g. a new mandatory field for hdfs requires all operators to bump the Discovery CRD to `v2`. We hope that this does not happen too often.
256
+
* All product discoveries are versioned together. E.g. a new mandatory field for hdfs requires all operators to bump the
257
+
Discovery CRD to `v2`. We hope that this does not happen too often.
242
258
* Names can collide
243
-
* `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are external systems, where we don't know anything about. So in case we want to introduce a `Stacklet` object listing anyway, so the `Pro` regarding the API calls is irrelevant.
259
+
* `stackablectl stacklet list` should *not* look at discovery objects, as they can come from a user and are external
260
+
systems, where we don't know anything about. So in case we want to introduce a `Stacklet` object listing anyway, so
261
+
the `Pro` regarding the API calls is irrelevant.
244
262
245
263
=== [1] Discovery Object: Write the discovery to Product CR status
246
264
247
-
Instead of writing discovery information to dedicated objects - such as CM or custom CR - we "simply" write the discovery information to the status of the Cluster CR.
265
+
Instead of writing discovery information to dedicated objects - such as CM or custom CR - we "simply" write the
266
+
discovery information to the status of the Cluster CR.
248
267
249
268
==== Pros
250
269
251
270
==== Cons
252
271
253
272
* It does not enable users to bring their own product and talk to it from Stackable, e.g. a user-provided HDFS.
254
-
* It does not allow things such as a ZNode for Zookeeper as we either use the Zookeeper CR for discovery or we use a ZNode but than can't use a Zookeeper CR.
255
-
Currently we have the freedom of either connection to a Zookeeper root dir or a ZNode transparently.
273
+
* It does not allow things such as a ZNode for Zookeeper as we either use the Zookeeper CR for discovery or we use a
274
+
ZNode but than can't use a Zookeeper CR. Currently we have the freedom of either connection to a Zookeeper root dir or
The discovery includes the SecretClass used to obtain the ca.crt used to validate the *server* certificate
259
280
260
281
Trino discovery:
@@ -290,9 +311,10 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
290
311
291
312
=== [2] TLS: Client needs to specify SecretClass
292
313
---
293
-
The discovery does *not* include the SecretClass used to obtain the *server* certificate.
294
-
Instead the client must specify which SecretClass should be used to verify the *server* certificate.
295
-
For usability reasons it can be omitted and defaults to the SecretClass the client uses for itself.
314
+
315
+
The discovery does *not* include the SecretClass used to obtain the *server* certificate. Instead the client must
316
+
specify which SecretClass should be used to verify the *server* certificate. For usability reasons it can be omitted and
317
+
defaults to the SecretClass the client uses for itself.
296
318
297
319
Trino discovery:
298
320
[source,yaml]
@@ -323,13 +345,16 @@ backends: # Don't look at the Superset CRD structure, we are only interested in
323
345
324
346
==== Pros
325
347
326
-
* Operator does not need to read/look at the DiscoveryConfig (as we can statically set up the secret-op tls secretClass volumes rather than retrieving them from the DiscoveryConfig).
327
-
* Some clients only support a single pki, in that case we could not give the ability to overwrite the secretClass coming from the product itself.
348
+
* Operator does not need to read/look at the DiscoveryConfig (as we can statically set up the secret-op tls secretClass
349
+
volumes rather than retrieving them from the DiscoveryConfig).
350
+
* Some clients only support a single pki, in that case we could not give the ability to overwrite the secretClass coming
351
+
from the product itself.
328
352
329
353
==== Cons
330
354
331
355
* The client has to know what pki/secretClass the server is using.
332
-
* Superset TrinoConnection could not only say "Connect this Superset and this Trino", but need to say "using this ca.crt to validate the Trino server"
356
+
* Superset TrinoConnection could not only say "Connect this Superset and this Trino", but need to say "using this ca.crt
357
+
to validate the Trino server"
333
358
334
359
=== [2] TLS: Include caCert in Discovery config
335
360
@@ -353,18 +378,22 @@ endpoint:
353
378
354
379
==== Pros
355
380
356
-
* Easier for external clients to use as they don't need to know the concept of SecretClasses and don't even need to run withing k8s.
381
+
* Easier for external clients to use as they don't need to know the concept of SecretClasses and don't even need to run
382
+
withing k8s.
357
383
* The client has to *not* know what pki/secretClass the server is using.
358
384
359
385
==== Cons
360
386
361
-
* BIG QUESTION: How should the product operator get the ca cert from the SecretClass it uses to get the *server* cert from?
362
-
** The secret-op could e.g. offer an HTTP api to fetch the ca.crt of a given SecretClass or e.g. write the ca.crt into the status of a SecretClass
387
+
* BIG QUESTION: How should the product operator get the ca cert from the SecretClass it uses to get the *server* cert
388
+
from?
389
+
** The secret-op could e.g. offer an HTTP api to fetch the ca.crt of a given SecretClass or e.g. write the ca.crt into
390
+
the status of a SecretClass
363
391
364
392
365
393
=== [2] TLS: Include SecretClass in discovery, user can override it
366
394
367
395
Trino discovery:
396
+
368
397
[source,yaml]
369
398
----
370
399
apiVersion: trino.stackable.tech/v1alpha1
@@ -413,12 +442,18 @@ authentication:
413
442
----
414
443
415
444
==== Pros
416
-
* IMPORTANT: This is the only thing the server can know (how he is verifying client identities). He can not recommend an SecretClass used to obtain the client credentials. E.g. he uses an LDAP AuthenticationClass, there is no way it can now what SecretClass provides credentials accepted by LDAP. (Most cases it will be a user logging into a WebUI and the LDAP credentials of the user are not even stored anywhere but just remembered by the user)
445
+
* IMPORTANT: This is the only thing the server can know (how he is verifying client identities). He can not recommend an
446
+
SecretClass used to obtain the client credentials. E.g. he uses an LDAP AuthenticationClass, there is no way it can
447
+
now what SecretClass provides credentials accepted by LDAP. (Most cases it will be a user logging into a WebUI and the
448
+
LDAP credentials of the user are not even stored anywhere but just remembered by the user)
417
449
418
450
==== Cons
419
451
420
-
* Operator has to read the AuthenticationClass to determine its type (pw/tls/keytab) and set up the needed volumes and commands.
421
-
// * The AuthenticationClass is meant to describe "how should a server verify connecting clients" and re-purpose it to mean "how a client should authenticate itself". Image a user creates a Secret `trino-users` with *only* a ca.crt and a SecretClass `trino-users` on top. The connecting client than has no way of knowing how to get a client cert.
452
+
* Operator has to read the AuthenticationClass to determine its type (pw/tls/keytab) and set up the needed volumes and
453
+
commands.
454
+
// * The AuthenticationClass is meant to describe "how should a server verify connecting clients" and re-purpose it to
455
+
// mean "how a client should authenticate itself". Image a user creates a Secret `trino-users` with *only* a ca.crt
456
+
// and a SecretClass `trino-users` on top. The connecting client than has no way of knowing how to get a client cert.
422
457
423
458
=== [3] Authentication: Add SecretClass to Discovery Config
424
459
@@ -434,7 +469,8 @@ authentication:
434
469
==== Cons
435
470
436
471
* Operator has to read the SecretClass to determine its type (pw/tls/keytab) and set up the needed volumes and commands.
437
-
* Image then SecretClass is of type `k8sSearch`. The connection client (e.g. controlled via superset-operator) than has no idea if he should expect a tls.crd or a keytab when mounting the SecretClass.
472
+
* Image then SecretClass is of type `k8sSearch`. The connection client (e.g. controlled via superset-operator) than has
473
+
no idea if he should expect a tls.crd or a keytab when mounting the SecretClass.
438
474
439
475
=== [3] Authentication: Add needed details
440
476
@@ -454,7 +490,8 @@ authentication:
454
490
455
491
==== Pros
456
492
457
-
* Operator has *not* to read the SecretClass to determine its type (pw/tls/keytab), as the type is already encoded in the Discovery config.
493
+
* Operator has *not* to read the SecretClass to determine its type (pw/tls/keytab), as the type is already encoded in
494
+
the Discovery config.
458
495
459
496
==== Cons
460
497
@@ -464,7 +501,8 @@ Trino discovery does not provide any information on how to authenticate
464
501
465
502
==== Cons
466
503
467
-
* Not viable, as users need to know how to connect, and are not expected to try 50 different auth methods. We need to give them a AuthenticationClass, that says them e.g. what LDAP or PKI is used.
504
+
* Not viable, as users need to know how to connect, and are not expected to try 50 different auth methods. We need to
505
+
give them a AuthenticationClass, that says them e.g. what LDAP or PKI is used.
468
506
469
507
== Decision Outcome
470
508
@@ -473,7 +511,9 @@ Trino discovery does not provide any information on how to authenticate
473
511
[3] Authentication: `Authentication: Add AuthenticationClass to Discovery Config`
474
512
475
513
=== Appendix A
476
-
Let's model a kerberos secured HDFS with the Options "TLS: Include caCert in Discovery config" and "Authentication: Add needed details"
514
+
515
+
Let's model a kerberos secured HDFS with the Options "TLS: Include caCert in Discovery config" and "Authentication:
0 commit comments