
Commit d8843b1

miledxz and sjberman committed
Add 1.3.0 longevity test results (nginx#2093)
CLOSES - nginx#2053
Co-authored-by: Saylor Berman <s.berman@f5.com>
1 parent dd9e615 commit d8843b1

16 files changed: +239 -0 lines changed

tests/results/longevity/1.3.0/oss.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
# Results

## Test environment

NGINX Plus: false

GKE Cluster:

- Node count: 3
- k8s version: v1.28.9-gke.1000000
- vCPUs per node: 2
- RAM per node: 4019180Ki
- Max pods per node: 110
- Zone: us-central1-c
- Instance Type: e2-medium
- NGF pod name -- ngf-longevity-nginx-gateway-fabric-59576c5749-dgrwg
## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   186.26ms  145.81ms   2.00s    78.32%
    Req/Sec   299.23    199.98     1.80k    66.54%
  202451765 requests in 5760.00m, 70.37GB read
  Socket errors: connect 0, read 338005, write 0, timeout 4600
Requests/sec:    585.80
Transfer/sec:    213.51KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   177.05ms  122.73ms   1.98s    67.91%
    Req/Sec   298.12    199.66     1.83k    66.52%
  201665338 requests in 5760.00m, 69.02GB read
  Socket errors: connect 0, read 332742, write 0, timeout 40
Requests/sec:    583.52
Transfer/sec:    209.42KB
```
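The output above is standard wrk output; a minimal sketch of commands that would produce such runs (the flags and name resolution are assumptions, not taken from the test scripts):

```text
# cafe.example.com is assumed to resolve to the Gateway's external IP (e.g. via /etc/hosts)
wrk -t2 -c100 -d5760m --latency http://cafe.example.com/coffee
wrk -t2 -c100 -d5760m --latency https://cafe.example.com/tea
```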
### Logs

No error logs in nginx-gateway

No error logs in nginx
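One way to spot-check this, assuming the release lives in the nginx-gateway namespace (the namespace is an assumption; the pod name is the one listed above):

```text
kubectl -n nginx-gateway logs ngf-longevity-nginx-gateway-fabric-59576c5749-dgrwg -c nginx-gateway | grep -i error
kubectl -n nginx-gateway logs ngf-longevity-nginx-gateway-fabric-59576c5749-dgrwg -c nginx | grep -i error
```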
### Key Metrics

#### Containers memory

![oss-memory.png](oss-memory.png)

The drop in NGINX memory usage corresponds to the end of traffic generation.

#### NGF Container Memory

![oss-ngf-memory.png](oss-ngf-memory.png)

### Containers CPU

![oss-cpu.png](oss-cpu.png)

The drop in NGINX CPU usage corresponds to the end of traffic generation.

### NGINX metrics

![oss-stub-status.png](oss-stub-status.png)

The drop in requests corresponds to the end of traffic generation.
### Reloads

Rate of reloads - successful and errors:

![oss-reloads.png](oss-reloads.png)
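The rate panels above are plotted from NGF's reload counters; a PromQL sketch that would produce the same curves (the metric names are assumptions based on the NGF exporter naming, not taken from the dashboard):

```text
rate(nginx_gateway_fabric_nginx_reloads_total[5m])
rate(nginx_gateway_fabric_nginx_reload_errors_total[5m])
```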
Reload spikes correspond to the 1-hour periods of backend re-rollouts.
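Those re-rollouts are presumably rolling restarts of the backend Deployments; a minimal sketch of one such restart, assuming the cafe backends are Deployments named coffee and tea in a longevity namespace (all names are assumptions, not taken from the test scripts):

```text
kubectl -n longevity rollout restart deployment/coffee deployment/tea
```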
No reloads finished with an error.

Reload time distribution - counts:

![oss-reload-time.png](oss-reload-time.png)

Reload-related metrics at the end:

![oss-final-reloads.png](oss-final-reloads.png)

All successful reloads took less than 5 seconds, with most under 1 second.
## Comparison with previous runs

Graphs look similar to the 1.2.0 results.
Since https://github.com/nginxinc/nginx-gateway-fabric/issues/1112 was fixed, we no longer see the corresponding
reload spikes.
Memory usage is flat, but ~1 MB higher than in 1.2.0.

tests/results/longevity/1.3.0/plus.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
# Results

## Test environment

NGINX Plus: true

GKE Cluster:

- Node count: 3
- k8s version: v1.28.9-gke.1000000
- vCPUs per node: 2
- RAM per node: 4019172Ki
- Max pods per node: 110
- Zone: us-central1-c
- Instance Type: e2-medium
- NGF pod name -- ngf-longevity-nginx-gateway-fabric-c4db4c565-8rhdn
## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   192.36ms  124.47ms   1.99s    66.49%
    Req/Sec   279.51    187.21     1.88k    65.53%
  188805938 requests in 5760.00m, 65.81GB read
  Socket errors: connect 0, read 22, write 4, timeout 42
  Non-2xx or 3xx responses: 3
Requests/sec:    546.31
Transfer/sec:    199.68KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   192.58ms  124.49ms   2.00s    66.48%
    Req/Sec   278.96    186.56     1.76k    65.59%
  188455380 requests in 5760.00m, 64.69GB read
  Socket errors: connect 10, read 24, write 0, timeout 60
  Non-2xx or 3xx responses: 5
Requests/sec:    545.30
Transfer/sec:    196.26KB
```

Note: the non-2xx/3xx responses correspond to the errors in the NGINX log; see below.
### Logs

nginx-gateway:

A lot of expected "usage reporting not enabled" errors.

And:

```text
failed to start control loop: leader election lost

4 x
failed to start control loop: error setting initial control plane configuration: NginxGateway nginx-gateway/ngf-longevity-config not found: failed to get API group resources: unable to retrieve the complete list of server APIs: gateway.nginx.org/v1alpha1: Get "https://10.61.192.1:443/apis/gateway.nginx.org/v1alpha1?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
```

Those errors correspond to losing connectivity with the Kubernetes API and to the NGF container restarts (5). Such logs
are expected in that case.
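A quick way to confirm the restart count is to read it from the pod status (a sketch; the nginx-gateway namespace is an assumption, the pod and container names are the ones referenced above):

```text
kubectl -n nginx-gateway get pod ngf-longevity-nginx-gateway-fabric-c4db4c565-8rhdn \
  -o jsonpath='{.status.containerStatuses[?(@.name=="nginx-gateway")].restartCount}'
```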
nginx:

```text
2024/06/01 21:34:09 [error] 104#104: *115862644 no live upstreams while connecting to upstream, client: 10.128.0.112, server: cafe.example.com, request: "GET /tea HTTP/1.1", upstream: "http://longevity_tea_80/tea", host: "cafe.example.com"
2024/06/03 12:01:07 [error] 105#105: *267137988 no live upstreams while connecting to upstream, client: 10.128.0.112, server: cafe.example.com, request: "GET /coffee HTTP/1.1", upstream: "http://longevity_coffee_80/coffee", host: "cafe.example.com"
```

Eight errors like that occurred at different times, while backend pods were being updated. It is not clear why that happens.
Because the number of errors is small compared with the total number of handled requests, there is no need to
investigate further unless we see this again at a larger volume.
### Key Metrics

#### Containers memory

![plus-memory.png](plus-memory.png)

The drop in NGF memory usage at the beginning corresponds to the nginx-gateway container restarts.

The drop in NGINX memory usage corresponds to the end of traffic generation.

#### NGF Container Memory

![plus-ngf-memory.png](plus-ngf-memory.png)

### Containers CPU

![plus-cpu.png](plus-cpu.png)

The drop in NGINX CPU usage corresponds to the end of traffic generation.

### NGINX Plus metrics

![plus-status.png](plus-status.png)

The drop in requests corresponds to the end of traffic generation.
### Reloads

Rate of reloads - successful and errors:

![plus-reloads.png](plus-reloads.png)

Note: compared to the OSS run, we don't see as many reloads here, because NGF uses the NGINX Plus API to reconfigure NGINX
for endpoint changes.
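Those endpoint updates go through the NGINX Plus API's upstream servers endpoint instead of a config reload; a sketch of inspecting one of the dynamically managed upstreams from inside the nginx container (the listen address and API version are assumptions; the upstream name is taken from the nginx log above):

```text
curl -s http://127.0.0.1:8765/api/9/http/upstreams/longevity_tea_80/servers
```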
No reloads finished with an error.

Reload time distribution - counts:

![plus-reload-time.png](plus-reload-time.png)

Reload-related metrics at the end:

![plus-final-reloads.png](plus-final-reloads.png)

All successful reloads took less than 1 second, with most under 0.5 seconds.
## Comparison with previous runs

Graphs look similar to the 1.2.0 results.
Since https://github.com/nginxinc/nginx-gateway-fabric/issues/1112 was fixed, we no longer see the corresponding
reload spikes.

Memory usage is flat, but ~1 MB higher than in 1.2.0.
