@@ -5,26 +5,29 @@ This document describes how we test graceful recovery from restarts on NGF.
  <!-- TOC -->
  - [Graceful recovery from restarts](#graceful-recovery-from-restarts)
  - [Goal](#goal)
- - [Cluster Details](#cluster-details)
- - [Setup](#setup)
- - [Tests](#tests)
- - [Restart nginx-gateway container](#restart-nginx-gateway-container)
- - [Restart NGINX container](#restart-nginx-container)
- - [Restart Node with draining](#restart-node-with-draining)
- - [Restart Node without draining](#restart-node-without-draining)
+ - [Test Environment](#test-environment)
+ - [Steps](#steps)
+ - [Setup](#setup)
+ - [Run the tests](#run-the-tests)
+ - [Restart nginx-gateway container](#restart-nginx-gateway-container)
+ - [Restart NGINX container](#restart-nginx-container)
+ - [Restart Node with draining](#restart-node-with-draining)
+ - [Restart Node without draining](#restart-node-without-draining)
  <!-- TOC -->

  ## Goal
+
  Ensure that NGF can recover gracefully from container failures without any user intervention.

- ## Cluster Details
+ ## Test Environment
+
+ - A Kubernetes cluster with 3 nodes on GKE
+ - Node: e2-medium (2 vCPU, 4GB memory)
+ - A Kind cluster

- - GKE 1.27.3-gke.100
- - us-central1-c
- - Machine type of node is e2-medium
- - 3 nodes
+ ## Steps

- ## Setup
+ ### Setup

  1. Setup GKE Cluster.
  2. Clone the repo and change into the nginx-gateway-fabric directory.
@@ -57,18 +60,18 @@ to deploy NGINX Gateway Fabric using manifests and expose it through a LoadBalan
  if the configuration and version were correctly updated.
  11. Send traffic through the example application and ensure it is working correctly.

- ## Tests
+ ### Run the tests

- ### Restart nginx-gateway container
+ #### Restart nginx-gateway container

  1. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
- 2. Insert ephemeral container in NGF Pod
+ 2. Insert ephemeral container in NGF Pod.

  ```console
  kubectl debug -it -n nginx-gateway <NGF_POD> --image=busybox:1.28 --target=nginx-gateway
  ```

- 3. Kill nginx-gateway process through SIGKILL (Command should start with `/usr/bin/gateway`)
+ 3. Kill nginx-gateway process through a SIGKILL signal (Process command should start with `/usr/bin/gateway`).

  ```console
  kill -9 <nginx-gateway_PID>
@@ -80,30 +83,29 @@ if the configuration and version were correctly updated.
  7. Inside the NGINX container, check that `http.conf` was not changed and `config-version.conf` had its version set to `2`.
  8. Send traffic through the example application and ensure it is working correctly.
  9. Check that NGF can still process changes of resources.
- 1. Delete the HTTPRoute resources
+ 1. Delete the HTTPRoute resources.

  ```console
  kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
  ```

- 2. Inside the terminal which is inside the NGINX container, check that `http.conf` and
- `config-version.conf` were correctly updated.
+ 2. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
  3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
- 4. Apply the HTTPRoute resources
+ 4. Apply the HTTPRoute resources.

  ```console
  kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
  ```

- 5. Inside the terminal which is inside the NGINX container, check that `http.conf` and
- `config-version.conf` were correctly updated.
+ 5. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
  6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
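
Step 3 of this test asks for the PID of the process whose command starts with `/usr/bin/gateway`. As a minimal sketch, assuming `ps` prints the busybox column layout (`PID USER TIME COMMAND`), the lookup could be scripted with a small helper. The name `pid_for_command` is hypothetical and not part of NGF.

```shell
# pid_for_command PREFIX
# Reads `ps` output on stdin and prints the PID of the first process whose
# COMMAND column starts with PREFIX.
# Assumption: columns are PID USER TIME COMMAND (busybox `ps` layout).
pid_for_command() {
  awk -v prefix="$1" '
    NR > 1 {
      # Rebuild the full command line from the 4th field onward.
      cmd = $4
      for (i = 5; i <= NF; i++) cmd = cmd " " $i
      # index() == 1 means the command begins with the prefix.
      if (index(cmd, prefix) == 1) { print $1; exit }
    }'
}
```

Inside the ephemeral container, `kill -9 "$(ps | pid_for_command /usr/bin/gateway)"` could then stand in for the manual lookup. The same helper should work with the prefix `nginx: master process`, provided the container's `ps` uses the same columns.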

- ### Restart NGINX container
+ #### Restart NGINX container

  1. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
  2. If the terminal inside the NGINX container is no longer running, Exec back into the NGINX container.
- 3. Inside the NGINX container, kill the nginx-master process through SIGKILL (Command should start with `nginx: master process`).
+ 3. Inside the NGINX container, kill the nginx-master process through a SIGKILL signal
+ (Process command should start with `nginx: master process`).

  ```console
  kill -9 <nginx-master_PID>
@@ -113,44 +115,42 @@ if the configuration and version were correctly updated.
  5. Open up the NGINX container logs and check for errors.
  6. Exec back into the NGINX container and check that `http.conf` and `config-version.conf` were not changed.
  7. Check that NGF can still process changes of resources.
- 1. Delete the HTTPRoute resources
+ 1. Delete the HTTPRoute resources.

  ```console
  kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
  ```

- 2. Inside the terminal which is inside the NGINX container, check that `http.conf` and
- `config-version.conf` were correctly updated.
+ 2. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
  3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
- 4. Apply the HTTPRoute resources
+ 4. Apply the HTTPRoute resources.

  ```console
  kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
  ```

- 5. Inside the terminal which is inside the NGINX container, check that `http.conf` and
- `config-version.conf` were correctly updated.
+ 5. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
  6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.
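
Step 6 of this test checks that `http.conf` and `config-version.conf` were not changed by the restart. One way to make that check less manual is to record checksums before the kill and compare them afterwards. The sketch below assumes `cksum`, `cmp`, and `mktemp` are available where it runs; `snapshot_conf`, `conf_unchanged`, and the directory argument are illustrative names, not part of NGF.

```shell
# snapshot_conf DIR OUT
# Records a checksum line for every file in DIR into the file OUT.
# Assumption: DIR contains only regular files.
snapshot_conf() {
  ( cd "$1" && cksum ./* ) > "$2"
}

# conf_unchanged DIR SNAPSHOT
# Succeeds (exit 0) only if the current checksums of DIR match SNAPSHOT.
# The body runs in a subshell so its variables do not leak to the caller.
conf_unchanged() (
  current=$(mktemp) || exit 2
  snapshot_conf "$1" "$current"
  cmp -s "$2" "$current"
  status=$?
  rm -f "$current"
  exit "$status"
)
```

A run might call `snapshot_conf <CONF_DIR> before.cksum` ahead of step 3 and `conf_unchanged <CONF_DIR> before.cksum && echo unchanged` at step 6, where `<CONF_DIR>` is whichever directory holds the two files. The snapshot file needs to live somewhere that survives the container restart.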

- ### Restart Node with draining
+ #### Restart Node with draining

  1. Switch over to a one-Node Kind cluster. Can run `make create-kind-cluster` from main directory.
- 2. Run steps 4-12 of the Setup section above using [this guide]
- (https://github.com/nginxinc/nginx-gateway-fabric/blob/main/docs/running-on-kind.md) for running on Kind.
+ 2. Run steps 4-11 of the [Setup](#setup) section above using
+ [this guide](https://github.com/nginxinc/nginx-gateway-fabric/blob/main/docs/running-on-kind.md) for running on Kind.
  3. Ensure NGF and NGINX container logs are set up and traffic flows through the example application correctly.
- 4. Drain the Node of its resources
+ 4. Drain the Node of its resources.

  ```console
  kubectl drain kind-control-plane --ignore-daemonsets --delete-local-data
  ```

- 5. Delete the Node
+ 5. Delete the Node.

  ```console
  kubectl delete node kind-control-plane
  ```

- 6. Restart the Docker container
+ 6. Restart the Docker container.

  ```console
  docker restart kind-control-plane
@@ -160,25 +160,23 @@ if the configuration and version were correctly updated.
  8. Exec back into the NGINX container and check that `http.conf` and `config-version.conf` were not changed.
  9. Send traffic through the example application and ensure it is working correctly.
  10. Check that NGF can still process changes of resources.
- 1. Delete the HTTPRoute resources
+ 1. Delete the HTTPRoute resources.

  ```console
  kubectl delete -f ../../examples/https-termination/cafe-routes.yaml
  ```

- 2. Inside the terminal which is inside the NGINX container, check that `http.conf` and
- `config-version.conf` were correctly updated.
+ 2. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
  3. Send traffic through the example application using the updated resources and ensure traffic does not flow.
- 4. Apply the HTTPRoute resources
+ 4. Apply the HTTPRoute resources.

  ```console
  kubectl apply -f ../../examples/https-termination/cafe-routes.yaml
  ```

- 5. Inside the terminal which is inside the NGINX container, check that `http.conf` and
- `config-version.conf` were correctly updated.
+ 5. Inside the NGINX container, check that `http.conf` and `config-version.conf` were correctly updated.
  6. Send traffic through the example application using the updated resources and ensure traffic flows correctly.

- ### Restart Node without draining
+ #### Restart Node without draining

  1. Repeat the above test but remove steps 4-5 which include draining and deleting the Node.
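
Since the two Node restart tests differ only in whether steps 4-5 run, the command sequence can be written down once. The sketch below only prints the commands already listed in this document rather than executing them. The function name `node_restart_commands` and its `drain` / `no-drain` argument are hypothetical, chosen for illustration.

```shell
# node_restart_commands MODE
# Prints the commands for restarting the Kind node, one per line.
# MODE "drain" includes steps 4-6; any other value prints step 6 only.
node_restart_commands() {
  node="kind-control-plane"
  if [ "$1" = "drain" ]; then
    # Steps 4-5: drain, then delete the Node.
    echo "kubectl drain $node --ignore-daemonsets --delete-local-data"
    echo "kubectl delete node $node"
  fi
  # Step 6: restart the Docker container backing the Node.
  echo "docker restart $node"
}
```

Running `node_restart_commands no-drain` shows the shorter variant. Reviewing the printed commands before piping them to a shell keeps the sketch safe to try on a cluster that matters.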