# Enhancement Proposal-929: Data Plane Configuration

- Issue: https://github.com/nginxinc/nginx-kubernetes-gateway/issues/929
- Status: Implementable

## Summary

This proposal contains the design for configuring global settings for the data plane of the NGINX Gateway
Fabric (NGF) product. Similar to control plane configuration, we can use a custom resource definition (CRD) to
define data plane configuration, starting with fields such as telemetry and upstream zone size.

## Goals

Define a CRD to configure various global settings for the NGF data plane. The initial configurable
options will be for telemetry (tracing) and upstream zone size.

## Non-Goals

1. This proposal does not define every setting that needs to be present in the configuration.
2. This proposal does not cover any configuration related to the control plane.

## Introduction

The NGF data plane will evolve to have various user-configurable options. These could include, but are not
limited to, tracing, logging, or metrics. For the best user experience, these options should be changeable at
runtime, without restarting NGF. The first options we will allow users to configure are tracing and upstream
zone size. The easiest and most intuitive way to implement a Kubernetes-native API is through a CRD.

The purpose of this CRD is to contain "global" configuration options for the data plane, not policy applied
per route or per backend.

NGF will reload NGINX whenever this configuration changes.

In this doc, the term "user" refers to the cluster operator (the person who installs and manages NGF). The
cluster operator owns this custom resource.

## API, Customer Driven Interfaces, and User Experience

The API would be provided in a CRD. An authorized user would interact with the custom resource using `kubectl`
to `get` or `edit` the configuration.

Proposed configuration resource example:

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxProxy
metadata:
  name: nginx-proxy-config
  namespace: nginx-gateway
spec:
  http:
    upstreamZoneSize: 512k # default
    telemetry:
      tracing:
        enabled: true # default false
        endpoint: my-otel-collector.svc:4317 # required
        interval: 5s # default
        batchSize: 512 # default
        batchCount: 4 # default
status:
  conditions:
    ...
```

- The CRD would be Namespace-scoped.
- The NginxProxy resource is created in the `nginx-gateway` Namespace when NGF is deployed.
- The resource would be referenced in the [ParametersReference][ref] of the NGF GatewayClass.
- Conditions include `Accepted`, indicating whether the configuration is valid, and `Programmed`, indicating
  whether the NGINX reload was successful.
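
As a sketch of how the GatewayClass could point at this resource, the manifest below fills in the standard Gateway API `parametersRef` fields (`group`, `kind`, `name`, `namespace`). The GatewayClass name `nginx` and the controller name `gateway.nginx.org/nginx-gateway-controller` are assumptions for illustration and may differ from what NGF actually deploys.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: nginx # assumed GatewayClass name
spec:
  controllerName: gateway.nginx.org/nginx-gateway-controller # assumed controller name
  parametersRef:
    group: gateway.nginx.org
    kind: NginxProxy
    name: nginx-proxy-config
    namespace: nginx-gateway # required by Gateway API for a Namespace-scoped referent
```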

[ref]: https://gateway-api.sigs.k8s.io/reference/spec/#gateway.networking.k8s.io/v1.ParametersReference
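
To illustrate the two conditions, the status below shows what a valid configuration that NGINX reloaded successfully might report. The field layout follows the standard Kubernetes `metav1.Condition` type; the `reason` and `message` values, timestamps, and generation numbers are hypothetical placeholders, not values NGF is confirmed to emit.

```yaml
status:
  conditions:
    - type: Accepted
      status: "True"
      reason: Accepted # hypothetical reason
      message: The NginxProxy configuration is valid # hypothetical message
      observedGeneration: 1
      lastTransitionTime: "2024-01-01T00:00:00Z" # placeholder timestamp
    - type: Programmed
      status: "True"
      reason: Programmed # hypothetical reason
      message: NGINX was reloaded with the new configuration # hypothetical message
      observedGeneration: 1
      lastTransitionTime: "2024-01-01T00:00:00Z" # placeholder timestamp
```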

## Use Cases

The high-level use case is to configure options in the NGF data plane that are not currently configurable. The
CRD also allows these options to change without restarting the NGF Pod.

### Tracing

Users may want to observe how traffic flows through their applications, and tracing is a great way to achieve
this. By taking advantage of the OpenTelemetry standards, a user can deploy any OTLP-compliant tracing collector
to receive and visualize tracing data. Once a user configures a tracing backend for NGF, NGF will forward nginx
tracing data to that backend for visualization.
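
For context, a minimal OpenTelemetry Collector configuration that could receive this data is sketched below. It assumes the user runs the upstream OpenTelemetry Collector: it listens for OTLP over gRPC on port 4317, matching the `endpoint` port in the NginxProxy example, and prints received spans with the collector's `debug` exporter. Any OTLP-compliant backend could take its place, and older collector releases name this exporter `logging` instead of `debug`.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317 # OTLP/gRPC, the port used in the NginxProxy example
exporters:
  debug:
    verbosity: detailed # print each received span
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```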

For future consideration, a user may want to disable tracing for certain routes (or enable it only for certain
routes) to reduce the amount of data collected. We could likely implement a
[per-route Policy](https://gateway-api.sigs.k8s.io/geps/gep-713/#direct-policy-attachment) that includes this
switch. The proposed "global" CRD in this document would remain unchanged, though it could include an additional
field to enable or disable tracing globally.
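
A per-route Policy is not designed in this proposal, but a hypothetical shape following GEP-713 direct policy attachment could look like the sketch below. The `TracingPolicy` kind, its API version, its `enabled` field, and the HTTPRoute name `coffee` are invented for illustration only; just the `targetRef` structure comes from the policy attachment pattern.

```yaml
apiVersion: gateway.nginx.org/v1alpha1 # assumed API group and version
kind: TracingPolicy # hypothetical kind, not part of this proposal
metadata:
  name: disable-tracing-coffee
  namespace: default
spec:
  targetRef: # direct attachment to a single route
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: coffee # hypothetical route name
  enabled: false # hypothetical per-route switch overriding the global setting
```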

### Upstream Zone Size

As the number of servers within an upstream increases (in other words, Pod replicas for a Service), the
shared memory zone size needs to increase to accommodate them. A user can fine-tune this number to fit their
environment.
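
For example, an operator running Services with many replicas might raise the zone size in the existing NginxProxy resource. The value `1m` below is an arbitrary illustration rather than a recommended size, and it assumes the field accepts the same size suffixes as the `512k` default.

```yaml
apiVersion: gateway.nginx.org/v1alpha1
kind: NginxProxy
metadata:
  name: nginx-proxy-config
  namespace: nginx-gateway
spec:
  http:
    upstreamZoneSize: 1m # illustrative value, raised from the 512k default to fit more upstream servers
```

Because NGF acts on changes to this resource at runtime, applying this change would trigger an NGINX reload rather than a restart of the NGF Pod.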

## Testing

Unit tests can be used to verify that NGF properly watches and acts on CRD changes. These tests would be
similar in behavior to the current unit tests that verify control plane CRD resource processing.

We would need system-level tests to ensure that tracing works as expected.

## Security Considerations

We need to ensure that any configurable fields exposed to a user are:

- Properly validated. This means that the fields should be the correct type (integer, string, etc.), have an
  appropriate length, and use regex patterns or enums to prevent any unwanted input. This will initially be done
  through OpenAPI schema validation. If necessary as the CRD evolves, CEL or control plane validation could be used.
- Justified by a valid use case. The more fields we expose, the more attack vectors we create. We should only
  expose fields that are genuinely useful for a user to change dynamically.
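
As a sketch of what OpenAPI schema validation could look like for the fields in the example resource, the fragment below shows part of a CRD `openAPIV3Schema`. The specific patterns, length limit, and numeric bounds are assumptions chosen for illustration, not agreed-upon values.

```yaml
# Illustrative fragment of spec.versions[].schema.openAPIV3Schema in the NginxProxy CRD
properties:
  spec:
    type: object
    properties:
      http:
        type: object
        properties:
          upstreamZoneSize:
            type: string
            pattern: '^\d{1,4}(k|m)?$' # assumed format: number with optional k or m suffix
          telemetry:
            type: object
            properties:
              tracing:
                type: object
                required: [endpoint]
                properties:
                  enabled:
                    type: boolean
                    default: false
                  endpoint:
                    type: string
                    maxLength: 253 # assumed length limit
                  interval:
                    type: string
                    pattern: '^\d{1,4}(ms|s|m|h)?$' # assumed duration format
                  batchSize:
                    type: integer
                    minimum: 1
                    maximum: 1024 # assumed upper bound
                  batchCount:
                    type: integer
                    minimum: 1
```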

RBAC via the Kubernetes API server will ensure that only authorized users can update the resource containing NGF
data plane configuration.
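
For instance, a namespaced Role and RoleBinding like the ones below could let a specific group of operators read and edit the resource, which covers `kubectl get` and `kubectl edit`. The Role, RoleBinding, and group names are hypothetical, and the resource name `nginxproxies` assumes that is the plural name declared by the NginxProxy CRD.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nginx-proxy-editor # hypothetical name
  namespace: nginx-gateway
rules:
  - apiGroups: ["gateway.nginx.org"]
    resources: ["nginxproxies"] # assumed plural name of the NginxProxy CRD
    verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nginx-proxy-editor # hypothetical name
  namespace: nginx-gateway
subjects:
  - kind: Group
    name: cluster-operators # hypothetical group of cluster operators
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: nginx-proxy-editor
  apiGroup: rbac.authorization.k8s.io
```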

## Alternatives

- ConfigMap

  A ConfigMap is another type of resource in which a user can provide configuration options; however, it lacks
  the benefits of a CRD, specifically built-in schema validation, versioning, and conversion webhooks.

- Custom API server

  The NGF control plane could implement its own custom API server. However, the overhead of implementing it,
  which would include auth, validation, endpoints, and so on, would not be worth it, because the Kubernetes API
  server already does all of these things for us.

- Policies CRD for granular control

  Because these are global settings, a user may need more granular control, in other words, changing the
  settings on a per-route or per-backend basis. A new Policy CRD could accomplish this in future work.

## References

- [Kubernetes Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)