Commit 31c2bcf

Author: Kate Osborn
Message: Various updates
Parent: a7f42e3

1 file changed: design/control-data-plane-separation/design.md (38 additions, 16 deletions)

@@ -73,29 +73,28 @@ The following list outlines all of NKG's requirements for an agent and whether t
 
 The nginx agent is missing a few requirements we will need to add for our use case.
 
-Immediate features needed:
+Features needed (in priority order, more or less):
 
-- Add readiness and liveness endpoints
 - Add support for certificate rotation for the agent <-> control plane gRPC channel
-
-Longer-term features needed:
-
+- Deterministically confirm that a nginx reload succeeds (e.g. check that new worker processes are running)
+- Add an option to configure the server's token via a file
+- Add an option to refresh server token from a file
+- Add readiness and liveness endpoints
 - Produce a container image as a release artifact
   - This image should be non-root
   - This image should be as minimal as possible
 - Allow the control plane to access the N+ API to configure upstreams and the key-value store.
 - Add support for metrics enrichment. Metrics can be enriched with Kubernetes meta-information such as namespace, pod
   name, etc.
 
-Features that **may** be in progress, planned, or in some cases, supported:
+Agent features/plugins that we'd like to disable:
 
-- Add an option to configure the server’s token via a file.
-- Add an option to disable the agent’s metrics service client
-- Add an option to disable the data plane status updates
-- Add an option to disable the config upload feature
+- Metrics service client
+- Data plane status updates
+- Config upload feature
   - This is the feature that uploads the config to the control plane
-- Add an option to disable the nginx-counting feature
-- Add an option to disable the activity-events feature
+- The nginx-counting feature
+- The activity-events feature
 
 ### Benefits
 

@@ -399,9 +398,8 @@ this [file](https://github.com/nginx/agent/blob/main/sdk/proto/nginx.proto).
 ### Authentication
 
 The agent and control plane will mutually authenticate each other using mTLS. We will store the server and client
-certificates, key pairs, and CA certificates in Kubernetes Secrets. The user will install the Secrets in the the
-’nginx-gateway`
-namespace under the following names:
+certificates, key pairs, and CA certificates in Kubernetes Secrets. The user will install the Secrets in
+the `nginx-gateway`namespace under the following names:
 
 - `nginx-gateway-cert`: This Secret will contain the TLS certificate and private key that the control plane will use to
   serve gRPC traffic, as well as the CA bundle that validates the agent’s certificate.
@@ -410,7 +408,8 @@ namespace under the following names:
 
 The Secrets will be mounted to the control plane and agent containers, respectively. If desired, we can make the Secret
 names and mount path configurable via flags. For production, we will direct the user to provide their own certificates.
-For development and testing purposes, we will provide a self-signed default certificate.
+For development and testing purposes, we will provide a self-signed default certificate. In order to be secure by
+default, NKG should generate the default keypair during installation using a Kubernetes Job.
 
 #### Certificate Rotation
 
@@ -716,6 +715,16 @@ have a use case for runtime configuration at the moment.
 
 [cli]: https://docs.nginx.com/nginx-management-suite/nginx-agent/install-nginx-agent/#nginx-agent-cli-flags-usage
 
+## Edge Cases
+
+The following edge cases should be considered and tested during implementation:
+
+- The data plane fails to establish a connection with the control plane.
+- Existing connections between data plane and control plane are terminated during a download event.
+
+In these cases, we expect the agent to be resilient. It should not crash or produce invalid config, and it should retry
+when possible.
+
 ## Data Plane Scaling
 
 Since the data plane is deployed in its own Pod, a user can horizontally scale the data plane independently of the
@@ -767,4 +776,17 @@ PASS
 ok command-line-arguments 17.727s
 ```
 
+### Performance goals
+
+- NKG can handle frequent configuration changes (1 change per second)
+- NKG can handle large configurations:
+  - 5000 server blocks
+  - 64 TLS certs/keys
+  - 50 JWT keys
+  - 50 TLS cert/keys for egress
+  - 50 CA certs
+  - 50 basic auth files
+  - 50 OIDC secrets
+- NKG can scale to X number of data plane pods (we need to figure out what X is)
+
 [performance]: https://github.com/nginx/agent/blob/main/test/performance/user_workflow_test.go
