Commit a5bfdbd

Add manual Graceful recovery results for 1.3.0 (#2102)
1 parent 12992e8 commit a5bfdbd

2 files changed: +74 -91 lines changed

tests/graceful-recovery/graceful-recovery.md

Lines changed: 18 additions & 91 deletions
@@ -9,8 +9,6 @@ This document describes how we test graceful recovery from restarts on NGF.
 - [Steps](#steps)
 - [Setup](#setup)
 - [Run the tests](#run-the-tests)
-- [Restart nginx-gateway container](#restart-nginx-gateway-container)
-- [Restart NGINX container](#restart-nginx-container)
 - [Restart Node with draining](#restart-node-with-draining)
 - [Restart Node without draining](#restart-node-without-draining)
 <!-- TOC -->
@@ -21,139 +19,68 @@ Ensure that NGF can recover gracefully from container failures without any user
 
 ## Test Environment
 
-- A Kubernetes cluster with 3 nodes on GKE
-- Node: e2-medium (2 vCPU, 4GB memory)
 - A Kind cluster
 
 ## Steps
 
 ### Setup
 
-1. Setup GKE Cluster.
-2. Clone the repo and change into the nginx-gateway-fabric directory.
-3. Check out the latest tag (unless you are installing the edge version from the main branch).
-4. Go into `deploy/manifests/nginx-gateway.yaml` and change the following:
+1. Deploy a one-Node Kind cluster. Can run `make create-kind-cluster` from main directory.
+
+2. Go into `deploy/manifests/nginx-gateway.yaml` and change the following:
 
 - `runAsNonRoot` from `true` to `false`: this allows us to insert our ephemeral container as root which enables us to restart the nginx-gateway container.
 - Add the `--product-telemetry-disable` argument to the nginx-gateway container args.
 
-5. Follow the [installation instructions](https://github.com/nginxinc/nginx-gateway-fabric/blob/main/site/content/installation/installing-ngf/manifests.md)
-to deploy NGINX Gateway Fabric using manifests and expose it through a LoadBalancer Service.
-6. In a separate terminal track NGF logs.
+3. Follow [this guide](https://docs.nginx.com/nginx-gateway-fabric/installation/running-on-kind/) to deploy NGINX Gateway Fabric using manifests and expose it through a NodePort Service.
+
+4. In a separate terminal track NGF logs.
 
 ```console
 kubectl -n nginx-gateway logs -f deploy/nginx-gateway -c nginx-gateway
 ```
 
-7. In a separate terminal track NGINX container logs.
+5. In a separate terminal track NGINX container logs.
 
 ```console
 kubectl -n nginx-gateway logs -f deploy/nginx-gateway -c nginx
 ```
 
-8. In a separate terminal Exec into the NGINX container inside the NGF pod.
+6. In a separate terminal Exec into the NGINX container inside the NGF pod.
 
 ```console
-kubectl exec -it -n nginx-gateway <NGF_POD> --container nginx -- sh
+kubectl exec -it -n nginx-gateway $(kubectl get pods -n nginx-gateway | sed -n '2s/^\([^[:space:]]*\).*$/\1/p') --container nginx -- sh
 ```
 
-9. In a different terminal, deploy the
+7. In a different terminal, deploy the
 [https-termination example](https://github.com/nginxinc/nginx-gateway-fabric/tree/main/examples/https-termination).
-10. Send traffic through the example application and ensure it is working correctly.
+8. Send traffic through the example application and ensure it is working correctly.
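
A note on the new step 6 command: with `-n`, the expression `2s/^\([^[:space:]]*\).*$/\1/p` makes `sed` print only the first whitespace-delimited field of the second input line, which for `kubectl get pods` output is the name of the first pod listed under the header row. The sketch below isolates that expression in a helper and runs it against a sample listing so it can be tried without a cluster. The function name `first_pod_name` and the sample pod name are hypothetical, and the approach assumes the NGF pod is the first pod listed in the namespace.

```shell
#!/bin/sh
# Hypothetical helper: print the first whitespace-delimited field of line 2
# of the input, i.e. the first pod name listed after the header row.
first_pod_name() {
  sed -n '2s/^\([^[:space:]]*\).*$/\1/p'
}

# Made-up sample in the shape of `kubectl get pods -n nginx-gateway` output.
sample_output='NAME                             READY   STATUS    RESTARTS   AGE
nginx-gateway-5d4f9c7db8-abcde   2/2     Running   0          3m'

# Prints the pod name taken from the second line of the sample.
printf '%s\n' "$sample_output" | first_pod_name
```

Against a live cluster the same helper would be fed by `kubectl get pods -n nginx-gateway | first_pod_name`. If more than one pod runs in the namespace, the first row may not be the NGF pod, so the result is worth checking before using it in `kubectl exec`.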
 
 ### Run the tests
 
-#### Restart nginx-gateway container
-
-1. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
-2. Insert ephemeral container in NGF Pod.
-
-```console
-kubectl debug -it -n nginx-gateway <NGF_POD> --image=busybox:1.28 --target=nginx-gateway
-```
-
-3. Kill nginx-gateway process through a SIGKILL signal (Process command should start with `/usr/bin/gateway`).
-
-```console
-kill -9 <nginx-gateway_PID>
-```
-
-4. Check for errors in the NGF and NGINX container logs.
-5. When the nginx-gateway container is back up, ensure traffic flows through the example application correctly.
-6. Open up the NGF and NGINX container logs and check for errors.
-7. Send traffic through the example application and ensure it is working correctly.
-8. Check that NGF can still process changes of resources.
-1. Delete the HTTPRoute resources.
-
-```console
-kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
-```
-
-2. Send traffic through the example application using the updated resources and ensure traffic does not flow.
-3. Apply the HTTPRoute resources.
-
-```console
-kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
-```
-
-4. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
-
-#### Restart NGINX container
-
-1. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
-2. If the terminal inside the NGINX container is no longer running, Exec back into the NGINX container.
-3. Inside the NGINX container, kill the nginx-master process through a SIGKILL signal
-(Process command should start with `nginx: master process`).
-
-```console
-kill -9 <nginx-master_PID>
-```
-
-4. When NGINX container is back up, ensure traffic flows through the example application correctly.
-5. Open up the NGINX container logs and check for errors.
-6. Check that NGF can still process changes of resources.
-1. Delete the HTTPRoute resources.
-
-```console
-kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
-```
-
-2. Send traffic through the example application using the updated resources and ensure traffic does not flow.
-3. Apply the HTTPRoute resources.
-
-```console
-kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
-```
-
-4. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
-
 #### Restart Node with draining
 
-1. Switch over to a one-Node Kind cluster. Can run `make create-kind-cluster` from main directory.
-2. Run steps 4-11 of the [Setup](#setup) section above using
-[this guide](https://docs.nginx.com/nginx-gateway-fabric/installation/running-on-kind/) for running on Kind.
-3. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
-4. Drain the Node of its resources.
+1. Drain the Node of its resources.
 
 ```console
 kubectl drain kind-control-plane --ignore-daemonsets --delete-local-data
 ```
 
-5. Delete the Node.
+2. Delete the Node.
 
 ```console
 kubectl delete node kind-control-plane
 ```
 
-6. Restart the Docker container.
+3. Restart the Docker container.
 
 ```console
 docker restart kind-control-plane
 ```
 
-7. Check the logs of the old and new NGF and NGINX containers for errors.
-8. Send traffic through the example application and ensure it is working correctly.
-9. Check that NGF can still process changes of resources.
+4. Check the logs of the old and new NGF and NGINX containers for errors.
+5. Send traffic through the example application and ensure it is working correctly.
+6. Check that NGF can still process changes of resources.
 1. Delete the HTTPRoute resources.
 
 ```console
@@ -171,4 +98,4 @@ to deploy NGINX Gateway Fabric using manifests and expose it through a LoadBalan
 
 #### Restart Node without draining
 
-1. Repeat the above test but remove steps 4-5 which include draining and deleting the Node.
+1. Repeat the above test but remove steps 1-2 which include draining and deleting the Node.
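
The two Node-restart variants differ only in whether the drain and delete steps run before the Docker restart. A minimal sketch of that sequence as one function follows. `restart_node`, `run`, and the `DRY_RUN` switch are hypothetical helpers and not part of the repository; the commands and flags are copied from the test steps as written. The sketch defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Hypothetical helper, not part of the NGF repository.
# DRY_RUN defaults to 1 so the sketch only prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# restart_node NODE DRAIN
#   DRAIN=yes: drain and delete the Node first ("with draining").
#   DRAIN=no:  restart the Docker container only ("without draining").
restart_node() {
  node=$1
  drain=$2
  if [ "$drain" = "yes" ]; then
    # Flags copied from the test plan as written.
    run kubectl drain "$node" --ignore-daemonsets --delete-local-data || return 1
    run kubectl delete node "$node" || return 1
  fi
  run docker restart "$node"
}

# Dry-run both variants against the Kind node name used in the test plan.
restart_node kind-control-plane yes
restart_node kind-control-plane no
```

Setting `DRY_RUN=0` would execute the commands for real. The log checks and traffic checks that follow the restart remain manual steps in either variant.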
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+# Results for v1.3.0
+
+<!-- TOC -->
+- [Results for v1.3.0](#results-for-v130)
+- [Summary](#summary)
+- [Versions](#versions)
+- [Tests](#tests)
+- [Restart Node with draining](#restart-node-with-draining)
+- [Restart Node without draining](#restart-node-without-draining)
+- [Future Improvements](#future-improvements)
+<!-- TOC -->
+
+
+## Summary
+
+- No new issues since 1.1.
+- Known issue https://github.com/nginxinc/nginx-gateway-fabric/issues/1108 still exists.
+
+## Versions
+
+NGF version:
+
+
+```text
+"version":"edge"
+"commit":"c5f8dbe112ca1be261f73b9f5b4925cda3d5860a"
+"date":"2024-06-06T04:07:01Z"
+```
+
+with NGINX:
+
+```text
+nginx/1.27.0
+built by gcc 13.2.1 20231014 (Alpine 13.2.1_git20231014)
+OS: Linux 6.6.26-linuxkit
+```
+
+Kubernetes:
+
+```text
+v1.30.0
+```
+
+## Tests
+
+### Restart Node with draining
+
+No errors.
+
+### Restart Node without draining
+
+Same issue as 1.1 where NGF is unable to recover: https://github.com/nginxinc/nginx-gateway-fabric/issues/1108
+
+## Future Improvements
+
+- None
