Commit feb1da3

sbernauer and Jimvin authored
feat: Support graceful shutdown (#407)
* feat: Support graceful shutdown
* update docs
* docs
* changelog
* link code in docs
* increase default of datanodes to 30 min
* move into constants
* use new operator-rs
* docs: Format 15 minutes
* Use new operator-rs
* improve docs
* fix link
* use operator-rs 0.55.0
* fixup
* improve docs
* set error context
* Added a high level description of graceful shutdown
* Revert "Added a high level description of graceful shutdown"

  This reverts commit 7733ec1. Moved to stackabletech/documentation#473

---------

Co-authored-by: Jim Halfpenny <jim@source321.com>
1 parent c120c0a commit feb1da3

9 files changed: +134 -10 lines changed
CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@ All notable changes to this project will be documented in this file.
 - Default stackableVersion to operator version ([#381]).
 - Configuration overrides for the JVM security properties, such as DNS caching ([#384]).
 - Support PodDisruptionBudgets ([#394]).
+- Support graceful shutdown ([#407]).
 - Added support for 3.2.4, 3.3.6 ([#409]).
 
 ### Changed
@@ -33,6 +34,7 @@ All notable changes to this project will be documented in this file.
 [#402]: https://github.com/stackabletech/hdfs-operator/pull/402
 [#404]: https://github.com/stackabletech/hdfs-operator/pull/404
 [#405]: https://github.com/stackabletech/hdfs-operator/pull/405
+[#407]: https://github.com/stackabletech/hdfs-operator/pull/407
 [#409]: https://github.com/stackabletech/hdfs-operator/pull/409
 
 ## [23.7.0] - 2023-07-14

deploy/helm/hdfs-operator/crds/crds.yaml

Lines changed: 24 additions & 0 deletions
@@ -576,6 +576,10 @@ spec:
           type: array
       type: object
   type: object
+gracefulShutdownTimeout:
+  description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+  nullable: true
+  type: string
 logging:
   default:
     enableVectorAgent: null
@@ -4069,6 +4073,10 @@ spec:
           type: array
       type: object
   type: object
+gracefulShutdownTimeout:
+  description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+  nullable: true
+  type: string
 logging:
   default:
     enableVectorAgent: null
@@ -7621,6 +7629,10 @@ spec:
           type: array
       type: object
   type: object
+gracefulShutdownTimeout:
+  description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+  nullable: true
+  type: string
 logging:
   default:
     enableVectorAgent: null
@@ -11105,6 +11117,10 @@ spec:
           type: array
       type: object
   type: object
+gracefulShutdownTimeout:
+  description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+  nullable: true
+  type: string
 logging:
   default:
     enableVectorAgent: null
@@ -14606,6 +14622,10 @@ spec:
           type: array
       type: object
   type: object
+gracefulShutdownTimeout:
+  description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+  nullable: true
+  type: string
 logging:
   default:
     enableVectorAgent: null
@@ -18090,6 +18110,10 @@ spec:
           type: array
       type: object
   type: object
+gracefulShutdownTimeout:
+  description: Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+  nullable: true
+  type: string
 logging:
   default:
     enableVectorAgent: null
Lines changed: 33 additions & 3 deletions
@@ -1,6 +1,36 @@
 = Graceful shutdown
 
-Graceful shutdown of HDFS nodes is either not supported by the product itself
-or we have not implemented it yet.
+You can configure the graceful shutdown as described in xref:concepts:operations/graceful_shutdown.adoc[].
 
-Outstanding implementation work for the graceful shutdowns of all products where this functionality is relevant is tracked in https://github.com/stackabletech/issues/issues/357
+== JournalNodes
+
+By default, JournalNodes have `15 minutes` to terminate gracefully.
+
+The JournalNode process always runs as PID `1` and receives a `SIGTERM` once Kubernetes wants to terminate the Pod.
+It logs the received signal as shown in the log below and initiates a graceful shutdown.
+If the process has still not exited once the graceful shutdown timeout has passed, Kubernetes issues a `SIGKILL` to force-kill the process.
+
+https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java#L272[This] is the relevant code that gets executed in the JournalNodes as of HDFS version `3.3.4`.
+
+[source,text]
+----
+2023-10-10 13:37:41,525 ERROR server.JournalNode (LogAdapter.java:error(75)) - RECEIVED SIGNAL 15: SIGTERM
+2023-10-10 13:37:41,526 INFO server.JournalNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
+/************************************************************
+SHUTDOWN_MSG: Shutting down JournalNode at hdfs-journalnode-default-0/10.244.0.38
+************************************************************/
+----
+
+== NameNodes
+
+By default, NameNodes have `15 minutes` to terminate gracefully.
+They go through the same mechanism as documented for the <<_journalnodes>> above.
+
+https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java#L1080[This] is the relevant code that gets executed in the NameNodes as of HDFS version `3.3.4`.
+
+== DataNodes
+
+By default, DataNodes have `30 minutes` to terminate gracefully.
+They go through the same mechanism as documented for the <<_journalnodes>> above.
+
+https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2004[This] is the relevant code that gets executed in the DataNodes as of HDFS version `3.3.4`.
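To make the timeout values in the documentation above concrete, here is a minimal sketch of how a value such as `30m` maps to a number of seconds. It is not the operator's actual parser: the function name `parse_timeout_secs` is hypothetical, and the sketch assumes a single integer followed by exactly one unit (`s`, `m`, `h` or `d`).

```rust
/// Hypothetical helper (not part of the operator): converts a value such as
/// "30m" into seconds. Assumes one integer followed by one unit character.
fn parse_timeout_secs(input: &str) -> Option<u64> {
    let input = input.trim();
    // The last character is treated as the unit, everything before it as the number.
    let unit = input.chars().last()?;
    let digits = &input[..input.len() - unit.len_utf8()];
    let value: u64 = digits.parse().ok()?;
    let factor: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        _ => return None,
    };
    // Reject values that would overflow instead of wrapping around.
    value.checked_mul(factor)
}

fn main() {
    // The examples mirror the values mentioned in the CRD description and docs.
    for raw in ["15m", "30m", "1h", "2d"] {
        println!("{raw} -> {:?}", parse_timeout_secs(raw));
    }
}
```

Under these assumptions, the documented defaults of 15 and 30 minutes correspond to 900 and 1800 seconds.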

rust/crd/src/constants.rs

Lines changed: 9 additions & 0 deletions
@@ -1,3 +1,5 @@
+use stackable_operator::time::Duration;
+
 pub const DEFAULT_DFS_REPLICATION_FACTOR: u8 = 3;
 
 pub const CONTROLLER_NAME: &str = "hdfsclusters.hdfs.stackable.tech";
@@ -41,6 +43,13 @@ pub const DEFAULT_JOURNAL_NODE_HTTP_PORT: u16 = 8480;
 pub const DEFAULT_JOURNAL_NODE_HTTPS_PORT: u16 = 8481;
 pub const DEFAULT_JOURNAL_NODE_RPC_PORT: u16 = 8485;
 
+pub const DEFAULT_JOURNAL_NODE_GRACEFUL_SHUTDOWN_TIMEOUT: Duration =
+    Duration::from_minutes_unchecked(15);
+pub const DEFAULT_NAME_NODE_GRACEFUL_SHUTDOWN_TIMEOUT: Duration =
+    Duration::from_minutes_unchecked(15);
+pub const DEFAULT_DATA_NODE_GRACEFUL_SHUTDOWN_TIMEOUT: Duration =
+    Duration::from_minutes_unchecked(30);
+
 // hdfs-site.xml
 pub const DFS_NAMENODE_NAME_DIR: &str = "dfs.namenode.name.dir";
 pub const DFS_NAMENODE_SHARED_EDITS_DIR: &str = "dfs.namenode.shared.edits.dir";

rust/crd/src/lib.rs

Lines changed: 26 additions & 0 deletions
@@ -36,6 +36,7 @@ use stackable_operator::{
     role_utils::{GenericRoleConfig, Role, RoleGroup, RoleGroupRef},
     schemars::{self, JsonSchema},
     status::condition::{ClusterCondition, HasStatusCondition},
+    time::Duration,
 };
 use std::collections::{BTreeMap, HashMap};
 use storage::{
@@ -156,6 +157,7 @@ pub trait MergedConfig {
         None
     }
     fn affinity(&self) -> &StackableAffinity;
+    fn graceful_shutdown_timeout(&self) -> Option<&Duration>;
     /// Main container shared by all roles
     fn hdfs_logging(&self) -> ContainerLogConfig;
     /// Vector container shared by all roles
@@ -841,6 +843,9 @@ pub struct NameNodeConfig {
     pub logging: Logging<NameNodeContainer>,
     #[fragment_attrs(serde(default))]
     pub affinity: StackableAffinity,
+    #[fragment_attrs(serde(default))]
+    /// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+    pub graceful_shutdown_timeout: Option<Duration>,
 }
 
 impl MergedConfig for NameNodeConfig {
@@ -852,6 +857,10 @@ impl MergedConfig for NameNodeConfig {
         &self.affinity
     }
 
+    fn graceful_shutdown_timeout(&self) -> Option<&Duration> {
+        self.graceful_shutdown_timeout.as_ref()
+    }
+
     fn hdfs_logging(&self) -> ContainerLogConfig {
         self.logging
             .containers
@@ -916,6 +925,7 @@ impl NameNodeConfigFragment {
             },
             logging: product_logging::spec::default_logging(),
             affinity: get_affinity(cluster_name, role),
+            graceful_shutdown_timeout: Some(DEFAULT_NAME_NODE_GRACEFUL_SHUTDOWN_TIMEOUT),
         }
     }
 }
@@ -1001,6 +1011,9 @@ pub struct DataNodeConfig {
     pub logging: Logging<DataNodeContainer>,
     #[fragment_attrs(serde(default))]
     pub affinity: StackableAffinity,
+    #[fragment_attrs(serde(default))]
+    /// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+    pub graceful_shutdown_timeout: Option<Duration>,
 }
 
 impl MergedConfig for DataNodeConfig {
@@ -1014,6 +1027,10 @@ impl MergedConfig for DataNodeConfig {
         &self.affinity
     }
 
+    fn graceful_shutdown_timeout(&self) -> Option<&Duration> {
+        self.graceful_shutdown_timeout.as_ref()
+    }
+
     fn hdfs_logging(&self) -> ContainerLogConfig {
         self.logging
             .containers
@@ -1069,6 +1086,7 @@ impl DataNodeConfigFragment {
             },
             logging: product_logging::spec::default_logging(),
             affinity: get_affinity(cluster_name, role),
+            graceful_shutdown_timeout: Some(DEFAULT_DATA_NODE_GRACEFUL_SHUTDOWN_TIMEOUT),
         }
     }
 }
@@ -1152,6 +1170,9 @@ pub struct JournalNodeConfig {
     pub logging: Logging<JournalNodeContainer>,
     #[fragment_attrs(serde(default))]
     pub affinity: StackableAffinity,
+    #[fragment_attrs(serde(default))]
+    /// Time period Pods have to gracefully shut down, e.g. `30m`, `1h` or `2d`. Consult the operator documentation for details.
+    pub graceful_shutdown_timeout: Option<Duration>,
 }
 
 impl MergedConfig for JournalNodeConfig {
@@ -1163,6 +1184,10 @@ impl MergedConfig for JournalNodeConfig {
         &self.affinity
     }
 
+    fn graceful_shutdown_timeout(&self) -> Option<&Duration> {
+        self.graceful_shutdown_timeout.as_ref()
+    }
+
     fn hdfs_logging(&self) -> ContainerLogConfig {
         self.logging
             .containers
@@ -1206,6 +1231,7 @@ impl JournalNodeConfigFragment {
             },
             logging: product_logging::spec::default_logging(),
             affinity: get_affinity(cluster_name, role),
+            graceful_shutdown_timeout: Some(DEFAULT_JOURNAL_NODE_GRACEFUL_SHUTDOWN_TIMEOUT),
         }
     }
 }

rust/operator/src/hdfs_controller.rs

Lines changed: 10 additions & 7 deletions
@@ -6,7 +6,7 @@ use crate::{
     discovery::build_discovery_configmap,
     event::{build_invalid_replica_message, publish_event},
     kerberos,
-    operations::pdb::add_pdbs,
+    operations::{self, graceful_shutdown::add_graceful_shutdown_config, pdb::add_pdbs},
     product_logging::{extend_role_group_config_map, resolve_vector_aggregator_address},
     OPERATOR_NAME,
 };
@@ -166,14 +166,15 @@ pub enum Error {
         "kerberos not supported for HDFS versions < 3.3.x. Please use at least version 3.3.x"
     ))]
     KerberosNotSupported {},
-    #[snafu(display(
-        "failed to serialize [{JVM_SECURITY_PROPERTIES_FILE}] for {}",
-        rolegroup
-    ))]
-    JvmSecurityPoperties {
+    #[snafu(display("failed to serialize [{JVM_SECURITY_PROPERTIES_FILE}] for {rolegroup}",))]
+    JvmSecurityProperties {
         source: stackable_operator::product_config::writer::PropertiesWriterError,
         rolegroup: String,
     },
+    #[snafu(display("failed to configure graceful shutdown"), context(false))]
+    GracefulShutdown {
+        source: operations::graceful_shutdown::Error,
+    },
 }
 
 impl ReconcilerError for Error {
@@ -599,7 +600,7 @@ fn rolegroup_config_map(
         .add_data(
             JVM_SECURITY_PROPERTIES_FILE,
             to_java_properties_string(jvm_sec_props.iter()).with_context(|_| {
-                JvmSecurityPopertiesSnafu {
+                JvmSecurityPropertiesSnafu {
                     rolegroup: rolegroup_ref.role_group.clone(),
                 }
             })?,
@@ -667,6 +668,8 @@ fn rolegroup_statefulset(
     )
     .context(FailedToCreateContainerAndVolumeConfigurationSnafu)?;
 
+    add_graceful_shutdown_config(merged_config, &mut pb)?;
+
     let mut pod_template = pb.build_template();
     if let Some(pod_overrides) = hdfs.pod_overrides_for_role(role) {
         pod_template.merge_from(pod_overrides.clone());
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+use snafu::{ResultExt, Snafu};
+use stackable_hdfs_crd::MergedConfig;
+use stackable_operator::builder::PodBuilder;
+
+#[derive(Debug, Snafu)]
+pub enum Error {
+    #[snafu(display("Failed to set terminationGracePeriod"))]
+    SetTerminationGracePeriod {
+        source: stackable_operator::builder::pod::Error,
+    },
+}
+
+pub fn add_graceful_shutdown_config(
+    merged_config: &(dyn MergedConfig + Send + 'static),
+    pod_builder: &mut PodBuilder,
+) -> Result<(), Error> {
+    // This must be always set by the merge mechanism, as we provide a default value,
+    // users can not disable graceful shutdown.
+    if let Some(graceful_shutdown_timeout) = merged_config.graceful_shutdown_timeout() {
+        pod_builder
+            .termination_grace_period(graceful_shutdown_timeout)
+            .context(SetTerminationGracePeriodSnafu)?;
+    }
+
+    Ok(())
+}
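The defaulting behaviour described in the comment above can be sketched in isolation. This is a simplified illustration, not the operator's merge mechanism: it uses `std::time::Duration` instead of `stackable_operator::time::Duration`, and the `Role` enum and both function names are hypothetical. Only the default values (15, 15 and 30 minutes) are taken from the commit.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the three HDFS roles touched by this commit.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    JournalNode,
    NameNode,
    DataNode,
}

/// Per-role defaults, mirroring the constants added in constants.rs.
fn default_timeout(role: Role) -> Duration {
    match role {
        Role::JournalNode | Role::NameNode => Duration::from_secs(15 * 60),
        Role::DataNode => Duration::from_secs(30 * 60),
    }
}

/// A user-supplied value wins; otherwise the role default applies.
/// The result is what would end up as `terminationGracePeriodSeconds`.
fn termination_grace_period_seconds(role: Role, user: Option<Duration>) -> u64 {
    user.unwrap_or_else(|| default_timeout(role)).as_secs()
}

fn main() {
    for role in [Role::JournalNode, Role::NameNode, Role::DataNode] {
        let secs = termination_grace_period_seconds(role, None);
        println!("{role:?}: {secs}s");
    }
    // An explicit override of one hour for DataNodes.
    let overridden =
        termination_grace_period_seconds(Role::DataNode, Some(Duration::from_secs(3600)));
    println!("DataNode with override: {overridden}s");
}
```

Because a default always exists, the resulting value is never absent, which matches the note that users cannot disable graceful shutdown.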

rust/operator/src/operations/mod.rs

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
+pub mod graceful_shutdown;
 pub mod pdb;

tests/templates/kuttl/smoke/30-assert.yaml.j2

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,7 @@ spec:
         - name: vector
 {% endif %}
         - name: zkfc
+      terminationGracePeriodSeconds: 900
 status:
   readyReplicas: 2
   replicas: 2
@@ -46,6 +47,7 @@ spec:
 {% if lookup('env', 'VECTOR_AGGREGATOR') %}
         - name: vector
 {% endif %}
+      terminationGracePeriodSeconds: 900
 status:
   readyReplicas: 1
   replicas: 1
@@ -69,6 +71,7 @@ spec:
 {% if lookup('env', 'VECTOR_AGGREGATOR') %}
         - name: vector
 {% endif %}
+      terminationGracePeriodSeconds: 1800
 status:
   readyReplicas: {{ test_scenario['values']['number-of-datanodes'] }}
   replicas: {{ test_scenario['values']['number-of-datanodes'] }}
