Commit 404ff96

Add longevity test plan and results
Problem:
* We don't know if NGF can successfully process both control plane and data plane transactions over a period of time much greater than in our tests.
* We didn't yet try to catch bugs that could only appear over a period of time (like resource leaks).

Solution:
- Create a longevity test plan
- Run the test
- Document the results

CLOSES #956
1 parent 72b6c6e commit 404ff96

14 files changed: 634 additions, 0 deletions

.yamllint.yaml

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ rules:
       .github/workflows/
       deploy/manifests/nginx-gateway.yaml
       deploy/manifests/crds
+      tests/longevity/manifests/cronjob.yaml
   new-line-at-end-of-file: enable
   new-lines: enable
   octal-values: disable

tests/longevity/longevity.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# Longevity Test

This document describes how we test NGF for longevity.

<!-- TOC -->

- [Longevity Test](#longevity-test)
  - [Goals](#goals)
  - [Test Environment](#test-environment)
  - [Steps](#steps)
    - [Start](#start)
    - [Check the Test is Running Correctly](#check-the-test-is-running-correctly)
    - [End](#end)
  - [Analyze](#analyze)
  - [Results](#results)

<!-- TOC -->

## Goals

- Ensure that NGF successfully processes both control plane and data plane transactions over a period of time much
  greater than in our other tests.
- Catch bugs that could only appear over a period of time (like resource leaks).

## Test Environment

- A Kubernetes cluster with 3 nodes on GKE
  - Node: e2-medium (2 vCPU, 4GB memory)
  - GKE logging enabled.
  - GKE Cloud Monitoring enabled with the managed Prometheus service, collecting:
    - system.
    - kube state - pods, deployments.
- Tester VMs:
  - Configuration:
    - Debian
    - Install packages: tmux, wrk
  - Location - same zone as the Kubernetes cluster.
  - First VM - for sending HTTP traffic
  - Second VM - for sending HTTPS traffic
- NGF
  - Deployment with 1 replica
  - Exposed via a Service with type LoadBalancer, private IP
  - Gateway, two listeners - HTTP and HTTPS
  - Two apps:
    - Coffee - 3 replicas
    - Tea - 3 replicas
  - Two HTTPRoutes
    - Coffee (HTTP)
    - Tea (HTTPS)

## Steps

### Start

Test duration - 4 days.

1. Create a Kubernetes cluster on GKE.
2. Deploy NGF.
3. Expose NGF via a LoadBalancer Service with the `"networking.gke.io/load-balancer-type":"Internal"` annotation to
   allocate an internal load balancer.
4. Apply the manifests which will:
   1. Deploy the coffee and tea backends.
   2. Configure HTTP and HTTPS listeners on the Gateway.
   3. Expose coffee via HTTP listener and tea via HTTPS listener.
   4. Create two CronJobs to re-rollout backends:
      1. Coffee - every minute for an hour every 6 hours
      2. Tea - every minute for an hour every 6 hours, 3 hours apart from coffee.
   5. Configure Prometheus on GKE to pick up NGF metrics.

   ```shell
   kubectl apply -f files
   ```

5. In Tester VMs, update `/etc/hosts` to have an entry with the External IP of the NGF Service (`10.128.0.10` in this
   case):

   ```text
   10.128.0.10 cafe.example.com
   ```

6. In Tester VMs, start a tmux session (this is needed so that even if you disconnect from the VM, any launched command
   will keep running):

   ```shell
   tmux
   ```

7. In First VM, start wrk for 4 days for coffee via HTTP:

   ```shell
   wrk -t2 -c100 -d96h http://cafe.example.com/coffee
   ```

8. In Second VM, start wrk for 4 days for tea via HTTPS:

   ```shell
   wrk -t2 -c100 -d96h https://cafe.example.com/tea
   ```

Notes:

- The updated coffee and tea backends in cafe.yaml include extra configuration for zero-downtime upgrades, so that
  wrk in Tester VMs doesn't get 502 responses from NGF. Based on https://learnk8s.io/graceful-shutdown

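To see how the two re-rollout windows from step 4 interleave, the sketch below maps an hour of the day to the backend being re-rolled. It assumes the cron schedules `* */6 * * *` (coffee) and `* 3,9,15,21 * * *` (tea) used in the CronJob manifest; the function name `rollout_target` is hypothetical and exists only to illustrate the schedule arithmetic.

```shell
#!/bin/sh
# Illustrative only: which backend is re-rolled during a given hour (0-23)?
# Assumed schedules (taken from the CronJob manifest):
#   coffee: "* */6 * * *"        -> every minute of hours 0, 6, 12, 18
#   tea:    "* 3,9,15,21 * * *"  -> every minute of hours 3, 9, 15, 21
rollout_target() {
  hour=$1   # plain integer without a leading zero, e.g. 9 rather than 09
  case $((hour % 6)) in
    0) echo coffee ;;
    3) echo tea ;;
    *) echo none ;;
  esac
}

# Print the target for a few sample hours.
for h in 0 3 5 6 21; do
  echo "hour $h: $(rollout_target "$h")"
done
```

With these schedules the two backends are never re-rolled in the same hour, so each one-hour rollout window is followed by two quiet hours.
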
### Check the Test is Running Correctly

Check that you don't see any errors:

1. Traffic is flowing - look at the access logs of NGINX.
2. Check that the CronJobs can run.

   ```shell
   kubectl create job --from=cronjob/coffee-rollout-mgr coffee-test
   kubectl create job --from=cronjob/tea-rollout-mgr tea-test
   ```

3. Check that GKE exports logs and Prometheus metrics.

In case of errors, double-check that you prepared the environment and launched the test correctly.

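One part of the environment that is easy to double-check is the `/etc/hosts` entry on the Tester VMs. The sketch below is a hypothetical preflight helper (`has_hosts_entry` is not part of the test plan) that succeeds when a hosts file maps a given IP to a given name. It runs against a scratch file here; on a Tester VM you would pass `/etc/hosts` instead.

```shell
#!/bin/sh
# Hypothetical preflight helper: succeed if the hosts file maps IP to NAME
# on some non-comment line.
has_hosts_entry() {
  hosts_file=$1
  ip=$2
  name=$3
  awk -v ip="$ip" -v name="$name" '
    /^[[:space:]]*#/ { next }          # skip full-line comments
    $1 == ip {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break           # stop at a trailing comment
        if ($i == name) found = 1
      }
    }
    END { exit found ? 0 : 1 }
  ' "$hosts_file"
}

# Example against a scratch copy that mimics the entry from the Start steps.
tmp=$(mktemp)
printf '127.0.0.1 localhost\n10.128.0.10 cafe.example.com\n' > "$tmp"
if has_hosts_entry "$tmp" 10.128.0.10 cafe.example.com; then
  echo "hosts entry present"
else
  echo "hosts entry missing"
fi
rm -f "$tmp"
```
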
### End

- Remove CronJobs.

## Analyze

- Traffic
  - Tester VMs (clients)
    - When wrk stops, it prints its results. To connect to the tmux session with wrk,
      run `tmux attach -t 0`
    - Check for errors, latency, RPS
- Logs
  - Check the logs for errors in Google Cloud Operations Logging.
    - NGF
    - NGINX
- Check metrics in Google Cloud Monitoring.
  - NGF
    - CPU usage
      - NGINX
      - NGF
    - Memory usage
      - NGINX
      - NGF
    - NGINX metrics
      - Reloads

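When checking the wrk output for errors and RPS, a small filter can pull out the relevant lines. The sketch below is one possible approach, not part of the test plan: the helper name `wrk_error_summary` is hypothetical, and the sample input is made-up text in the shape wrk prints, not a measurement from this test.

```shell
#!/bin/sh
# Hypothetical helper: summarize errors and throughput from wrk's final report
# read on stdin. It relies on these wrk report lines:
#   "Socket errors: connect A, read B, write C, timeout D"  (only printed on errors)
#   "Non-2xx or 3xx responses: N"                           (only printed on errors)
#   "Requests/sec: X"
wrk_error_summary() {
  awk '
    /Socket errors:/ {
      gsub(",", "")                       # drop commas so the counts are plain numbers
      socket = $4 + $6 + $8 + $10         # connect + read + write + timeout
    }
    /Non-2xx or 3xx responses:/ { non2xx = $NF }
    /^Requests\/sec:/ { rps = $2 }
    END { printf "socket_errors=%d non_2xx_3xx=%d rps=%s\n", socket, non2xx, rps }
  '
}

# Made-up sample input for illustration only.
wrk_error_summary <<'EOF'
  Socket errors: connect 0, read 3, write 0, timeout 1
  Non-2xx or 3xx responses: 5
Requests/sec:    100.00
EOF
```

A clean run prints zero for both error counters, because wrk omits those lines when there are no errors.
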
## Results

- [1.0.0](results/1.0.0.md)
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: coffee
spec:
  parentRefs:
  - name: gateway
    sectionName: http
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /coffee
    backendRefs:
    - name: coffee
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: tea
spec:
  parentRefs:
  - name: gateway
    sectionName: https
  hostnames:
  - "cafe.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /tea
    backendRefs:
    - name: tea
      port: 80
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
apiVersion: v1
kind: Secret
metadata:
  name: cafe-secret
type: kubernetes.io/tls
data:
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUNzakNDQVpvQ0NRQzdCdVdXdWRtRkNEQU5CZ2txaGtpRzl3MEJBUXNGQURBYk1Sa3dGd1lEVlFRRERCQmoKWVdabExtVjRZVzF3YkdVdVkyOXRNQjRYRFRJeU1EY3hOREl4TlRJek9Wb1hEVEl6TURjeE5ESXhOVEl6T1ZvdwpHekVaTUJjR0ExVUVBd3dRWTJGbVpTNWxlR0Z0Y0d4bExtTnZiVENDQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFECmdnRVBBRENDQVFvQ2dnRUJBTHFZMnRHNFc5aStFYzJhdnV4Q2prb2tnUUx1ek10U1Rnc1RNaEhuK3ZRUmxIam8KVzFLRnMvQVdlS25UUStyTWVKVWNseis4M3QwRGtyRThwUisxR2NKSE50WlNMb0NEYUlRN0Nhck5nY1daS0o4Qgo1WDNnVS9YeVJHZjI2c1REd2xzU3NkSEQ1U2U3K2Vab3NPcTdHTVF3K25HR2NVZ0VtL1Q1UEMvY05PWE0zZWxGClRPL051MStoMzROVG9BbDNQdTF2QlpMcDNQVERtQ0thaEROV0NWbUJQUWpNNFI4VERsbFhhMHQ5Z1o1MTRSRzUKWHlZWTNtdzZpUzIrR1dYVXllMjFuWVV4UEhZbDV4RHY0c0FXaGRXbElweHlZQlNCRURjczN6QlI2bFF1OWkxZAp0R1k4dGJ3blVmcUVUR3NZdWxzc05qcU95V1VEcFdJelhibHhJZVVDQXdFQUFUQU5CZ2txaGtpRzl3MEJBUXNGCkFBT0NBUUVBcjkrZWJ0U1dzSnhLTGtLZlRkek1ISFhOd2Y5ZXFVbHNtTXZmMGdBdWVKTUpUR215dG1iWjlpbXQKL2RnWlpYVE9hTElHUG9oZ3BpS0l5eVVRZVdGQ2F0NHRxWkNPVWRhbUloOGk0Q1h6QVJYVHNvcUNOenNNLzZMRQphM25XbFZyS2lmZHYrWkxyRi8vblc0VVNvOEoxaCtQeDljY0tpRDZZU0RVUERDRGh1RUtFWXcvbHpoUDJVOXNmCnl6cEJKVGQ4enFyM3paTjNGWWlITmgzYlRhQS82di9jU2lyamNTK1EwQXg4RWpzQzYxRjRVMTc4QzdWNWRCKzQKcmtPTy9QNlA0UFlWNTRZZHMvRjE2WkZJTHFBNENCYnExRExuYWRxamxyN3NPbzl2ZzNnWFNMYXBVVkdtZ2todAp6VlZPWG1mU0Z4OS90MDBHUi95bUdPbERJbWlXMGc9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQzZtTnJSdUZ2WXZoSE4KbXI3c1FvNUtKSUVDN3N6TFVrNExFeklSNS9yMEVaUjQ2RnRTaGJQd0ZuaXAwMFBxekhpVkhKYy92TjdkQTVLeApQS1VmdFJuQ1J6YldVaTZBZzJpRU93bXF6WUhGbVNpZkFlVjk0RlAxOGtSbjl1ckV3OEpiRXJIUncrVW51L25tCmFMRHF1eGpFTVBweGhuRklCSnYwK1R3djNEVGx6TjNwUlV6dnpidGZvZCtEVTZBSmR6N3Rid1dTNmR6MHc1Z2kKbW9RelZnbFpnVDBJek9FZkV3NVpWMnRMZllHZWRlRVJ1VjhtR041c09va3R2aGxsMU1udHRaMkZNVHgySmVjUQo3K0xBRm9YVnBTS2NjbUFVZ1JBM0xOOHdVZXBVTHZZdFhiUm1QTFc4SjFINmhFeHJHTHBiTERZNmpzbGxBNlZpCk0xMjVjU0hsQWdNQkFBRUNnZ0VBQnpaRE50bmVTdWxGdk9HZlFYaHRFWGFKdWZoSzJBenRVVVpEcUNlRUxvekQKWlV6dHdxbkNRNlJLczUyandWNTN4cU9kUU94bTNMbjNvSHdNa2NZcEliWW82MjJ2dUczYnkwaVEzaFlsVHVMVgpqQmZCcS9UUXFlL2NMdngvSkczQWhFNmJxdFRjZFlXeGFmTmY2eUtpR1dzZk11WVVXTWs4MGVJVUxuRmZaZ1pOCklYNTlSOHlqdE9CVm9Sa3hjYTVoMW1ZTDFsSlJNM3ZqVHNHTHFybmpOTjNBdWZ3ZGRpK1VDbGZVL2l0K1EvZkUKV216aFFoTlRpNVFkRWJLVStOTnYvNnYvb2JvandNb25HVVBCdEFTUE05cmxFemIralQ1WHdWQjgvLzRGY3VoSwoyVzNpcjhtNHVlQ1JHSVlrbGxlLzhuQmZ0eVhiVkNocVRyZFBlaGlPM1FLQmdRRGlrR3JTOTc3cjg3Y1JPOCtQClpoeXltNXo4NVIzTHVVbFNTazJiOTI1QlhvakpZL2RRZDVTdFVsSWE4OUZKZnNWc1JRcEhHaTFCYzBMaTY1YjIKazR0cE5xcVFoUmZ1UVh0UG9GYXRuQzlPRnJVTXJXbDVJN0ZFejZnNkNQMVBXMEg5d2hPemFKZUdpZVpNYjlYTQoybDdSSFZOcC9jTDlYbmhNMnN0Q1lua2Iwd0tCZ1FEUzF4K0crakEyUVNtRVFWNXA1RnRONGcyamsyZEFjMEhNClRIQ2tTazFDRjhkR0Z2UWtsWm5ZbUt0dXFYeXNtekJGcnZKdmt2eUhqbUNYYTducXlpajBEdDZtODViN3BGcVAKQWxtajdtbXI3Z1pUeG1ZMXBhRWFLMXY4SDNINGtRNVl3MWdrTWRybVJHcVAvaTBGaDVpaGtSZS9DOUtGTFVkSQpDcnJjTzhkUVp3S0JnSHA1MzRXVWNCMVZibzFlYStIMUxXWlFRUmxsTWlwRFM2TzBqeWZWSmtFb1BZSEJESnp2ClIrdzZLREJ4eFoyWmJsZ05LblV0YlhHSVFZd3lGelhNcFB5SGxNVHpiZkJhYmJLcDFyR2JVT2RCMXpXM09PRkgKcmppb21TUm1YNmxhaDk0SjRHU0lFZ0drNGw1SHhxZ3JGRDZ2UDd4NGRjUktJWFpLZ0w2dVJSSUpBb0dCQU1CVApaL2p5WStRNTBLdEtEZHUrYU9ORW4zaGxUN3hrNXRKN3NBek5rbWdGMU10RXlQUk9Xd1pQVGFJbWpRbk9qbHdpCldCZ2JGcXg0M2ZlQ1Z4ZXJ6V3ZEM0txaWJVbWpCTkNMTGtYeGh3ZEVteFQwVit2NzZGYzgwaTNNYVdSNnZZR08KditwVVovL0F6UXdJcWZ6dlVmV2ZxdStrMHlhVXhQOGNlcFB
IRyt0bEFvR0FmQUtVVWhqeFU0Ym5vVzVwVUhKegpwWWZXZXZ5TW54NWZyT2VsSmRmNzlvNGMvMHhVSjh1eFBFWDFkRmNrZW96dHNpaVFTNkN6MENRY09XVWxtSkRwCnVrdERvVzM3VmNSQU1BVjY3NlgxQVZlM0UwNm5aL2g2Tkd4Z28rT042Q3pwL0lkMkJPUm9IMFAxa2RjY1NLT3kKMUtFZlNnb1B0c1N1eEpBZXdUZmxDMXc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K

tests/longevity/manifests/cafe.yaml

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coffee
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coffee
  template:
    metadata:
      labels:
        app: coffee
    spec:
      containers:
      - name: coffee
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "15"]
---
apiVersion: v1
kind: Service
metadata:
  name: coffee
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: coffee
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tea
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tea
  template:
    metadata:
      labels:
        app: tea
    spec:
      containers:
      - name: tea
        image: nginxdemos/nginx-hello:plain-text
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /
            port: 8080
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sleep", "15"]
---
apiVersion: v1
kind: Service
metadata:
  name: tea
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: tea
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rollout-mgr
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rollout-mgr
  namespace: default
rules:
- apiGroups:
  - "apps"
  resources:
  - deployments
  verbs:
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rollout-mgr
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rollout-mgr
subjects:
- kind: ServiceAccount
  name: rollout-mgr
  namespace: default
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: coffee-rollout-mgr
  namespace: default
spec:
  schedule: "* */6 * * *" # every minute during hours 00, 06, 12, 18
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: rollout-mgr
          containers:
          - name: coffee-rollout-mgr
            image: curlimages/curl:8.3.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            args:
            - |
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
              RESTARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
              curl -X PATCH -s -k -v \
                -H "Authorization: Bearer $TOKEN" \
                -H "Content-type: application/merge-patch+json" \
                --data-raw "{\"spec\": {\"template\": {\"metadata\": {\"annotations\": {\"kubectl.kubernetes.io/restartedAt\": \"$RESTARTED_AT\"}}}}}" \
                "https://kubernetes/apis/apps/v1/namespaces/default/deployments/coffee?fieldManager=kubectl-rollout" 2>&1
          restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tea-rollout-mgr
  namespace: default
spec:
  schedule: "* 3,9,15,21 * * *" # every minute during hours 03, 09, 15, 21 (3 hours apart from coffee)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: rollout-mgr
          containers:
          - name: tea-rollout-mgr
            image: curlimages/curl:8.3.0
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            args:
            - |
              TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
              RESTARTED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
              curl -X PATCH -s -k -v \
                -H "Authorization: Bearer $TOKEN" \
                -H "Content-type: application/merge-patch+json" \
                --data-raw "{\"spec\": {\"template\": {\"metadata\": {\"annotations\": {\"kubectl.kubernetes.io/restartedAt\": \"$RESTARTED_AT\"}}}}}" \
                "https://kubernetes/apis/apps/v1/namespaces/default/deployments/tea?fieldManager=kubectl-rollout" 2>&1
          restartPolicy: OnFailure
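
The CronJobs above trigger a re-rollout by patching the `kubectl.kubernetes.io/restartedAt` annotation on the Deployment's pod template, which is the same annotation `kubectl rollout restart` sets. As a minimal sketch of just the payload, the hypothetical helper `restart_patch` below builds the merge-patch body for a given timestamp so it can be inspected without calling the API server.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON merge patch that the CronJobs send.
# Changing the annotation value changes the pod template, which makes the
# Deployment controller roll out new pods.
restart_patch() {
  restarted_at=$1
  printf '{"spec": {"template": {"metadata": {"annotations": {"kubectl.kubernetes.io/restartedAt": "%s"}}}}}\n' "$restarted_at"
}

# Build a patch stamped with the current UTC time, as the CronJob script does.
restart_patch "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
```

When working interactively, `kubectl rollout restart deployment/coffee` has the same effect; the CronJobs use curl so that the job image does not need kubectl.
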
