Commit b77d74b

Add results for reconfig test (#1354)
Problem: We need to run the reconfig test against the 1.1 release.

Solution: Record the results of the reconfig test for the 1.1 release.
1 parent 56016e9 commit b77d74b

4 files changed: 114 additions & 16 deletions

tests/reconfig/results/1.0.0/1.0.0.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ NGF deployment:
 ## NumResources -> Total Resources
 
 | NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Total Resources |
-| ------------ | -------- | ------- | --------------- | ---------- | ---------------- | -------------------- | ---------- | --------------- |
+|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|-----------------|
 | x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | <total> |
 | 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 244 |
 | 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 1204 |
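The `x` row of this table implies a closed form: 1 Gateway + 1 Secret + 1 ReferenceGrant + (x+1) Namespaces + 2x application Pods + 2x application Services + 3x HTTPRoutes = 8x + 4, which reproduces both the 244 and 1204 totals. A minimal bash sketch of that arithmetic; the function name `total_resources` is hypothetical and not part of the test scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: total resources created for a given NumResources (x),
# following the table's "x" row:
#   1 Gateway + 1 Secret + 1 ReferenceGrant + (x+1) Namespaces
#   + 2x Pods + 2x Services + 3x HTTPRoutes  (= 8x + 4)
total_resources() {
  local x=$1
  local gateways=1 secrets=1 refgrants=1
  local namespaces=$((x + 1))
  local pods=$((2 * x))
  local services=$((2 * x))
  local httproutes=$((3 * x))
  echo $((gateways + secrets + refgrants + namespaces + pods + services + httproutes))
}

total_resources 30   # prints 244
total_resources 150  # prints 1204
```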

tests/reconfig/results/1.1.0/1.1.0.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+# Reconfiguration testing Results
+
+<!-- TOC -->
+- [Reconfiguration testing Results](#reconfiguration-testing-results)
+- [Summary](#summary)
+- [Test environment](#test-environment)
+- [Results Tables](#results-tables)
+- [NGINX Reloads and Time to Ready](#nginx-reloads-and-time-to-ready)
+- [Event Batch Processing](#event-batch-processing)
+- [NumResources to Total Resources](#numresources-to-total-resources)
+- [Observations](#observations)
+- [Future Improvements](#future-improvements)
+<!-- TOC -->
+
+## Summary
+
+- Better reload times across all tests
+- Similar TimeToReadyTotal and TimeToReadyAvgSingle times
+- Similar event batch totals
+- Slightly better event batch processing average times
+- No new errors or issues
+
+## Test environment
+
+GKE cluster:
+
+- Node count: 4
+- Instance Type: n2d-standard-2
+- k8s version: 1.27.3-gke.100
+- Zone: us-west2-a
+- Total vCPUs: 8
+- Total RAM: 32GB
+- Max pods per node: 110
+
+NGF deployment:
+
+- NGF version: edge - git commit 3cab370a46bccd55c115c16e23a475df2497a3d2
+- NGINX Version: 1.25.3
+
+## Results Tables
+
+### NGINX Reloads and Time to Ready
+
+| Test number | NumResources | TimeToReadyTotal (s) | TimeToReadyAvgSingle (s) | NGINX reloads | NGINX reload avg time (ms) | <= 500ms | <= 1000ms |
+|-------------|--------------|----------------------|--------------------------|---------------|----------------------------|----------|-----------|
+| 1 | 30 | 1.5 | <1 | 2 | 158.5 | 100% | 100% |
+| 1 | 150 | 3.5 | 1 | 2 | 272.5 | 100% | 100% |
+| 2 | 30 | 34 | <1 | 93 | 136 | 100% | 100% |
+| 2 | 150 | 176.5 | <1 | 451 | 203.98 | 100% | 100% |
+| 3 | 30 | <1 | 1 | 93 | 125.7 | 100% | 100% |
+| 3 | 150 | 1 | 1 | 453 | 126.71 | 100% | 100% |
+
+
+### Event Batch Processing
+
+| Test number | NumResources | Event Batch Total | Event Batch Processing avg time (ms) | <= 500ms | <= 1000ms | <= 5000ms | <= 10000ms | <= 30000ms |
+|-------------|--------------|-------------------|--------------------------------------|----------|-----------|-----------|------------|------------|
+| 1 | 30 | 70 | 5.12 | 100% | 100% | 100% | 100% | 100% |
+| 1 | 150 | 309 | 2.14 | 100% | 100% | 100% | 100% | 100% |
+| 2 | 30 | 442 | 35.4 | 100% | 100% | 100% | 100% | 100% |
+| 2 | 150 | 2009 | 54.76 | 100% | 100% | 100% | 100% | 100% |
+| 3 | 30 | 373 | 35.72 | 99.73% | 99.73% | 100% | 100% | 100% |
+| 3 | 150 | 1813 | 39.46 | 99.94% | 99.94% | 99.94% | 99.94% | 100% |
+
+> Note: The outlier for test #3 is the event batch that contains the Gateway. It took ~13s to process.
+
+## NumResources to Total Resources
+
+| NumResources | Gateways | Secrets | ReferenceGrants | Namespaces | application Pods | application Services | HTTPRoutes | Attached HTTPRoutes | Total Resources |
+|--------------|----------|---------|-----------------|------------|------------------|----------------------|------------|---------------------|-----------------|
+| x | 1 | 1 | 1 | x+1 | 2x | 2x | 3x | 2x | <total> |
+| 30 | 1 | 1 | 1 | 31 | 60 | 60 | 90 | 60 | 244 |
+| 150 | 1 | 1 | 1 | 151 | 300 | 300 | 450 | 300 | 1204 |
+
+> Note: Only 2x HTTPRoutes attach to the Gateway because the parentRef name in the `cafe-tls-redirect` HTTPRoute is incorrect. This will be fixed in the next release.
+
+## Observations
+
+1. The following issues still exist:
+
+- https://github.com/nginxinc/nginx-gateway-fabric/issues/1124
+- https://github.com/nginxinc/nginx-gateway-fabric/issues/1123
+
+2. All NGINX reloads were in the <= 500ms bucket. Reload time increased with the number of configured resources that result in NGINX configuration changes.
+
+3. No errors (NGF or NGINX) were observed in any test run.
+
+4. Every event batch in tests 1 and 2 was processed in 500ms or less. In the 3rd test, we create the Gateway resource after all the apps and routes; the batch that contains the Gateway is the only one that takes longer than 500ms. It takes ~13s.
+
+## Future Improvements
+
+1. Fix the parentRef name in the `cafe-tls-redirect` [HTTPRoute](/tests/reconfig/scripts/cafe-routes.yaml), so it matches the deployed Gateway.
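The `<= Nms` columns in the results tables are cumulative shares of all observations. For example, one slow batch out of the 373 batches in test 3 with 30 NumResources gives 372 / 373 ≈ 99.73%, and one out of 1813 gives 1812 / 1813 ≈ 99.94%, which is consistent with the single outlier batch described in the note for test #3. A sketch for computing such a column from a saved `/metrics` scrape follows. It assumes the event batch metric is exposed as a standard Prometheus histogram, with cumulative `_bucket{le="..."}` series next to the `_sum` and `_count` series used in the setup instructions, and that bucket bounds are rendered as plain integers such as `le="500"`. The helper name `bucket_percent` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of histogram observations that fall in the
# cumulative bucket le="<bound>", i.e. 100 * <metric>_bucket{le="<bound>"} / <metric>_count.
# Reads Prometheus text-format metrics on stdin; values are summed across label sets.
# Assumes bucket bounds are rendered exactly as passed (e.g. le="500").
bucket_percent() {
  local metric=$1 bound=$2
  awk -v m="$metric" -v le="le=\"$bound\"" '
    $1 ~ ("^" m "_bucket[{]") && index($1, le) { in_bucket += $NF }
    $1 ~ ("^" m "_count([{]|$)")               { total += $NF }
    END { if (total > 0) printf "%.2f\n", 100 * in_bucket / total }
  '
}
```

With the port-forward from the setup instructions running, something like `curl -s 127.0.0.1:9113/metrics > metrics.txt` followed by `bucket_percent nginx_gateway_fabric_event_batch_processing_milliseconds 500 < metrics.txt` would print the `<= 500ms` value for that run.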

tests/reconfig/scripts/delete-multiple.sh

File mode: 100644 → 100755
Lines changed: 4 additions & 2 deletions
@@ -3,11 +3,13 @@
 num_namespaces=$1
 
 # Delete namespaces
+namespaces=""
 for ((i=1; i<=$num_namespaces; i++)); do
-namespace_name="namespace$i"
-kubectl delete namespace "$namespace_name"
+namespaces+="namespace$i "
 done
 
+kubectl delete namespace $namespaces
+
 # Delete single instance resources
 kubectl delete -f gateway.yaml
 kubectl delete -f reference-grant.yaml
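The change collects every namespace name into one space-separated string and deletes them with a single `kubectl delete namespace` call instead of one call per namespace; the unquoted `$namespaces` expansion relies on word splitting to pass each name as a separate argument. A sketch of the same batching with a bash array, which avoids depending on word splitting, is below. It is an alternative, not what the commit does; the function name `delete_namespaces` and the `KUBECTL` override variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete namespace1..namespaceN with a single kubectl call.
# Names are collected in a bash array, so each one is passed as its own argument
# without relying on word splitting of an unquoted string.
# KUBECTL (hypothetical) can be overridden, e.g. KUBECTL=echo, for a dry run.
delete_namespaces() {
  local num_namespaces=$1 i
  local -a namespaces=()
  for ((i = 1; i <= num_namespaces; i++)); do
    namespaces+=("namespace$i")
  done

  # Nothing to delete (e.g. a count of 0): skip the kubectl call entirely.
  if ((${#namespaces[@]} == 0)); then
    return 0
  fi
  "${KUBECTL:-kubectl}" delete namespace "${namespaces[@]}"
}
```

For example, `delete_namespaces 30` would delete `namespace1` through `namespace30`, while `KUBECTL=echo delete_namespaces 3` only prints the command it would run.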

tests/reconfig/setup.md

Lines changed: 17 additions & 13 deletions
@@ -26,7 +26,7 @@
 
 The following cluster will be sufficient:
 
-- A Kubernetes cluster with 3 nodes on GKE
+- A Kubernetes cluster with 4 nodes on GKE
 - Node: e2-medium (2 vCPU, 4GB memory)
 
 ## Setup
@@ -43,7 +43,7 @@
 
 ```console
 helm install my-release oci://ghcr.io/nginxinc/charts/nginx-gateway-fabric --version 0.0.0-edge \
---create-namespace --wait -n nginx-gateway
+--create-namespace --wait -n nginx-gateway --set nginxGateway.config.logging.level=debug
 ```
 
 4. Run tests:
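The added `--set` flag sets the chart value `nginxGateway.config.logging.level` to `debug`. Because Helm turns each dot in a `--set` path into one level of nesting, the same setting can be kept in a values file instead. A sketch that writes such a file; the file name `values-debug.yaml` is arbitrary:

```shell
#!/usr/bin/env bash
# Write a Helm values file equivalent to
#   --set nginxGateway.config.logging.level=debug
# (each dot in the --set path becomes one level of YAML nesting).
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
cat > values-debug.yaml <<'EOF'
nginxGateway:
  config:
    logging:
      level: debug
EOF
```

The file can then be passed with `-f values-debug.yaml` in place of the `--set` flag in the `helm install` command above.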
@@ -58,13 +58,17 @@
 - Note: Clean up after each test run for isolated results. There's a script provided for removing all the test
 fixtures, `scripts/delete-multiple.sh`, which takes a number (it needs to be the same number as what was used in the
 create script).
-5. After each individual test run, grab logs of both NGF containers and grab metrics.
-Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`:
-
-```console
-GW_POD=$(k get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
-kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
-```
+5. After each individual test:
+- Describe the Gateway resource and make sure the status is correct.
+- Check the logs of both NGF containers for errors.
+- Parse the logs for TimeToReady numbers (see steps 6-7 below).
+- Grab metrics.
+Note: You can expose metrics by running the below snippet and then navigating to `127.0.0.1:9113/metrics`:
+
+```console
+GW_POD=$(kubectl get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p')
+kubectl port-forward $GW_POD -n nginx-gateway 9113:9113 &
+```
 
 6. Measure NGINX Reloads and Time to Ready Results
 1. TimeToReadyTotal as described in each test - NGF logs.
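Step 5 now includes checking the logs of both NGF containers for errors. A minimal sketch of an error-line counter for saved or streamed logs follows. The patterns are assumptions: a JSON `"level":"error"` field for structured control-plane logs, and the NGINX severity tags `[error]`, `[crit]`, `[alert]` and `[emerg]` for NGINX. The helper name `count_error_lines` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines that look like errors in a log read from stdin.
# Assumed formats: structured JSON logs with "level":"error", or NGINX
# error-log lines tagged [error], [crit], [alert], or [emerg].
count_error_lines() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the status 0.
  grep -cE '"level":"error"|\[(error|crit|alert|emerg)\]' || true
}
```

For example, assuming the two containers are named `nginx-gateway` and `nginx`, run `kubectl logs "$GW_POD" -n nginx-gateway -c nginx-gateway | count_error_lines` and the same command with `-c nginx`; any count above 0 deserves a closer look. If the `sed`-based lookup is inconvenient, `kubectl get pods -n nginx-gateway -o jsonpath='{.items[0].metadata.name}'` is another way to get the pod name when only one pod runs in that namespace.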
@@ -75,11 +79,11 @@
 1. The average reload duration can be computed by taking the `nginx_gateway_fabric_nginx_reloads_milliseconds_sum`
 metric value and dividing it by the `nginx_gateway_fabric_nginx_reloads_milliseconds_count` metric value.
 7. Measure Event Batch Processing Results
-1. Event Batch Total - metrics.
+1. Event Batch Total - `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric.
 2. Average Event Batch Processing duration - metrics.
-1. The average event batch processing duraiton can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
+1. The average event batch processing duration can be computed by taking the `nginx_gateway_fabric_event_batch_processing_milliseconds_sum`
 metric value and dividing it by the `nginx_gateway_fabric_event_batch_processing_milliseconds_count` metric value.
-8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomolies or outliers.
+8. For accuracy, repeat the test suite once or twice, take the averages, and look for any anomalies or outliers.
 
 ## Tests
 
@@ -90,7 +94,7 @@
 e.g. `cd scripts && bash create-resources-gw-last.sh 30`. The script will deploy backend apps and services, wait
 60 seconds for them to be ready, and deploy 1 Gateway, 1 RefGrant, 1 Secret, and HTTPRoutes.
 2. Deploy NGF
-3. Measure TimeToReadyTotal as the time it takes from start-up -> config written and
+3. Measure TimeToReadyTotal as the time it takes from start-up -> final config written and
 NGINX reloaded. Measure the other results as described in steps 6-7 of the [Setup](#setup) section.
 
 ### Test 2: Start NGF, deploy Gateway, create many resources attached to GW
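Steps 6 and 7 of the setup compute averages by dividing a histogram's `_sum` metric value by its `_count` metric value. A minimal sketch of that division over Prometheus text-format metrics, for example a saved scrape of `127.0.0.1:9113/metrics`, follows. The helper name `histogram_avg` is hypothetical, and it sums each series across label sets in case the metrics carry labels:

```shell
#!/usr/bin/env bash
# Hypothetical helper: average of a Prometheus histogram, <metric>_sum / <metric>_count,
# read from Prometheus text-format metrics on stdin. Comment lines (# HELP, # TYPE)
# never match; values are summed across label sets. Prints nothing if the count is 0.
histogram_avg() {
  local metric=$1
  awk -v m="$metric" '
    $1 ~ ("^" m "_sum([{]|$)")   { sum += $NF }
    $1 ~ ("^" m "_count([{]|$)") { count += $NF }
    END { if (count > 0) printf "%.2f\n", sum / count }
  '
}
```

With a scrape saved as `metrics.txt`, `histogram_avg nginx_gateway_fabric_nginx_reloads_milliseconds < metrics.txt` would give the NGINX reload average in milliseconds, and `histogram_avg nginx_gateway_fabric_event_batch_processing_milliseconds < metrics.txt` the event batch processing average; the Event Batch Total is the `_count` value itself.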
