Skip to content

Prepare shared code: Support running on non default dns settings #893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Oct 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
ab8e419
wip
maltesander Oct 15, 2024
e972aee
rework - wip
maltesander Oct 16, 2024
96ff314
docs & clippy fix
maltesander Oct 16, 2024
c699b9c
fix tests
maltesander Oct 16, 2024
68f9d6a
adapted changelog
maltesander Oct 16, 2024
dd2d672
fix changelog entry
maltesander Oct 16, 2024
084e1d2
add missing log message
maltesander Oct 16, 2024
06b5aac
improve logging messages
maltesander Oct 16, 2024
b0d3de4
Merge remote-tracking branch 'origin/main' into support-running-on-no…
maltesander Oct 16, 2024
e053479
mark changelog entry as breaking
maltesander Oct 16, 2024
3d5c01e
fix whitespaces
maltesander Oct 16, 2024
53d8c9f
fix tests
maltesander Oct 16, 2024
32e41d8
refactor: Adjust cluster domain parsing code
Techassi Oct 16, 2024
f7537c7
chore: Merge branch 'main' into support-running-on-non-default-dns-se…
Techassi Oct 17, 2024
d3bd713
chore: Adjust error variants
Techassi Oct 17, 2024
5da95d7
chore: Update dev comment
Techassi Oct 17, 2024
c4b22b5
chore: Adjust doc comment
Techassi Oct 17, 2024
ab93d4e
chore: Adjust changelog
Techassi Oct 17, 2024
abf10f9
chore: Add no_run to example
Techassi Oct 17, 2024
cc570fe
Update crates/stackable-operator/src/utils/cluster_domain.rs
maltesander Oct 17, 2024
954bbbc
improve tracing messages
maltesander Oct 17, 2024
db8a7c9
add note
maltesander Oct 17, 2024
da290fa
replace expect with error handling
maltesander Oct 17, 2024
59ca78b
clippy
maltesander Oct 17, 2024
23697ae
refactor: Adjust tracing
sbernauer Oct 17, 2024
816b9ef
Apply suggestions from code review
maltesander Oct 17, 2024
c90cee8
Remove re-export
sbernauer Oct 17, 2024
d0ab020
fix docs
maltesander Oct 17, 2024
c686568
chore: Apply suggestions from code review
NickLarsenNZ Oct 17, 2024
e9afa5a
chore: Apply suggestions from code review
NickLarsenNZ Oct 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -51,5 +51,5 @@ repos:
name: .scripts/verify-crate-versions
language: system
entry: .scripts/verify_crate_versions.sh
stages: [commit, merge-commit, manual]
stages: [pre-commit, pre-merge-commit, manual]
pass_filenames: false
14 changes: 11 additions & 3 deletions crates/stackable-operator/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,19 +8,26 @@ All notable changes to this project will be documented in this file.

- Re-export the `YamlSchema` trait and the `stackable-shared` crate as the `shared` module ([#883]).
- BREAKING: Added `preferredAddressType` field to ListenerClass CRD ([#885]).
- BREAKING: The cluster domain (default: `cluster.local`) can now be configured in the individual
operators via the ENV variable `KUBERNETES_CLUSTER_DOMAIN` or resolved automatically by parsing
the `/etc/resolve.conf` file. This requires using `initialize_operator` instead of `create_client`
in the `main.rs` of the individual operators ([#893]).

### Changed

- BREAKING: The `CustomResourceExt` trait is now re-exported from the `stackable-shared` crate. The
trait functions use the same parameters but return a different error type ([#883]).
- BREAKING: `KeyValuePairs` (as well as `Labels`/`Annotations` via it) is now backed by a `BTreeMap` rather than a `BTreeSet` ([#888]).
- BREAKING: `KeyValuePairs` (as well as `Labels`/`Annotations` via it) is now backed by a `BTreeMap`
rather than a `BTreeSet` ([#888]).
- The `Deref` impl now returns a `BTreeMap` instead.
- `iter()` now clones the values.

### Fixed

- BREAKING: `KeyValuePairs::insert` (as well as `Labels::`/`Annotations::` via it) now overwrites the old value if the key already exists ([#888]).
- Previously, `iter()` would return *both* values in lexicographical order (causing further conversions like `Into<BTreeMap>` to prefer the maximum value).
- BREAKING: `KeyValuePairs::insert` (as well as `Labels::`/`Annotations::` via it) now overwrites
the old value if the key already exists. Previously, `iter()` would return *both* values in
lexicographical order (causing further conversions like `Into<BTreeMap>` to prefer the maximum
value) ([#888]).

### Removed

Expand All @@ -31,6 +38,7 @@ All notable changes to this project will be documented in this file.
[#883]: https://github.com/stackabletech/operator-rs/pull/883
[#885]: https://github.com/stackabletech/operator-rs/pull/885
[#888]: https://github.com/stackabletech/operator-rs/pull/888
[#893]: https://github.com/stackabletech/operator-rs/pull/893

## [0.78.0] - 2024-09-30

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
nameserver 10.243.21.53
options ndots:5
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
search baz svc.foo.bar foo.bar
search sble-operators.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.243.21.53
options ndots:5
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
search sble-operators.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.243.21.53
options ndots:5
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
search openshift-service-ca-operator.svc.cluster.local svc.cluster.local cluster.local cmx.repl-openshift.build
nameserver 172.30.0.10
options ndots:5
16 changes: 13 additions & 3 deletions crates/stackable-operator/src/client.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
use crate::kvp::LabelSelectorExt;
use crate::utils::cluster_domain::{self, retrieve_cluster_domain, KUBERNETES_CLUSTER_DOMAIN};

use either::Either;
use futures::StreamExt;
Expand Down Expand Up @@ -77,6 +78,9 @@ pub enum Error {

#[snafu(display("unable to create kubernetes client"))]
CreateKubeClient { source: kube::Error },

#[snafu(display("unable to to resolve kubernetes cluster domain"))]
ResolveKubernetesClusterDomain { source: cluster_domain::Error },
}

/// This `Client` can be used to access Kubernetes.
Expand Down Expand Up @@ -510,12 +514,12 @@ impl Client {
/// use tokio::time::error::Elapsed;
/// use kube::runtime::watcher;
/// use k8s_openapi::api::core::v1::Pod;
/// use stackable_operator::client::{Client, create_client};
/// use stackable_operator::client::{Client, initialize_operator};
///
/// #[tokio::main]
/// async fn main(){
///
/// let client: Client = create_client(None).await.expect("Unable to construct client.");
/// let client: Client = initialize_operator(None).await.expect("Unable to construct client.");
/// let watcher_config: watcher::Config =
/// watcher::Config::default().fields(&format!("metadata.name=nonexistent-pod"));
///
Expand Down Expand Up @@ -622,7 +626,13 @@ where
}
}

pub async fn create_client(field_manager: Option<String>) -> Result<Client> {
pub async fn initialize_operator(field_manager: Option<String>) -> Result<Client> {
let _ = KUBERNETES_CLUSTER_DOMAIN
.set(retrieve_cluster_domain().context(ResolveKubernetesClusterDomainSnafu)?);
create_client(field_manager).await
}
Comment on lines +630 to +633
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a global?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its supposed to be the new "entry point" for operators. There will be more content in there in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically renaming create_client to initialize_operator to be more expressive. By doing this we enforce that operators need to go throw this function and therefore initialize the OnceLock

Copy link
Member

@nightkr nightkr Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically renaming create_client to initialize_operator to be more expressive.

But it.. isn't? It says less about what's happening, nor does it explain why this place would be the entry point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be more content in there in the future.

Such as?

By doing this we enforce that operators need to go throw this function and therefore initialize the OnceLock

This still doesn't explain why it needs to be a OnceLock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such as?

I have branch where I warn on an unsupported Kubernetes version. It's just not merged yet because of prioritization. In the future it may also read in the recommended product version or some global config object or whatnot or check if there is an update available.

This still doesn't explain why it needs to be a OnceLock.

Agreed, I didn't try do explain that. Do you have an alternative suggestion how we can make this easy-to-use in operators? It looks very intuitive to me, but I'm not a Rust expert!
In the beginning we had a LazyLock, but @Techassi mentioned that we would panic on e.g. an invalid DomainName being configured, and that the operators should error instead of panic, so we needed to change to a OnceLock.

Copy link
Member

@nightkr nightkr Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have branch where I warn on an unsupported Kubernetes version.

That still sounds like a part of.. creating a k8s client.

Agreed, I didn't try do explain that. Do you have an alternative suggestion how we can make this easy-to-use in operators? It looks very intuitive to me, but I'm not a Rust expert!

I would have said you could store it in the Client, but on second thought maybe it would make more sense to create a separate KubernetesClusterConfig struct.

LazyLock/OnceLock is for caching expensive static values that can't be const for whatever reason (like a precomputed HashMap), not for configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We replaced the LacyLock with a KubernetesClusterInfo struct in #896


async fn create_client(field_manager: Option<String>) -> Result<Client> {
let kubeconfig: Config = kube::Config::infer()
.await
.map_err(kube::Error::InferConfig)
Expand Down
8 changes: 8 additions & 0 deletions crates/stackable-operator/src/commons/networking.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,14 @@ impl TryFrom<String> for DomainName {
}
}

impl TryFrom<&str> for DomainName {
type Error = validation::Errors;

fn try_from(value: &str) -> Result<Self, Self::Error> {
value.parse()
}
}

impl From<DomainName> for String {
fn from(value: DomainName) -> Self {
value.0
Expand Down
187 changes: 187 additions & 0 deletions crates/stackable-operator/src/utils/cluster_domain.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
use std::{env, path::PathBuf, str::FromStr, sync::OnceLock};

use snafu::{ResultExt, Snafu};
use tracing::instrument;

use crate::commons::networking::DomainName;

const KUBERNETES_CLUSTER_DOMAIN_ENV: &str = "KUBERNETES_CLUSTER_DOMAIN";
const KUBERNETES_SERVICE_HOST_ENV: &str = "KUBERNETES_SERVICE_HOST";

const KUBERNETES_CLUSTER_DOMAIN_DEFAULT: &str = "cluster.local";
const RESOLVE_CONF_FILE_PATH: &str = "/etc/resolv.conf";

#[derive(Debug, Snafu)]
pub enum Error {
#[snafu(display("failed to read resolv config from {RESOLVE_CONF_FILE_PATH:?}"))]
ReadResolvConfFile { source: std::io::Error },

#[snafu(display("failed to parse {cluster_domain:?} as domain name"))]
ParseDomainName {
source: crate::validation::Errors,
cluster_domain: String,
},

#[snafu(display(r#"unable to find "search" entry"#))]
NoSearchEntry,

#[snafu(display(r#"unable to find unambiguous domain in "search" entry"#))]
AmbiguousDomainEntries,
}

/// Tries to retrieve the Kubernetes cluster domain.
///
/// 1. Return `KUBERNETES_CLUSTER_DOMAIN` if set, otherwise
/// 2. Return the cluster domain parsed from the `/etc/resolv.conf` file if `KUBERNETES_SERVICE_HOST`
/// is set, otherwise fall back to `cluster.local`.
///
/// This variable is initialized in [`crate::client::initialize_operator`], which is called in the
/// main function. It can be used as suggested below.
///
/// ## Usage
///
/// ```no_run
/// use stackable_operator::utils::cluster_domain::KUBERNETES_CLUSTER_DOMAIN;
///
/// let kubernetes_cluster_domain = KUBERNETES_CLUSTER_DOMAIN.get()
/// .expect("KUBERNETES_CLUSTER_DOMAIN must first be set by calling initialize_operator");
///
/// tracing::info!(%kubernetes_cluster_domain, "Found cluster domain");
/// ```
///
/// ## See
///
/// - <https://github.com/stackabletech/issues/issues/436>
/// - <https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/>
pub static KUBERNETES_CLUSTER_DOMAIN: OnceLock<DomainName> = OnceLock::new();

#[instrument]
pub(crate) fn retrieve_cluster_domain() -> Result<DomainName, Error> {
// 1. Read KUBERNETES_CLUSTER_DOMAIN env var
tracing::debug!("Trying to determine the Kubernetes cluster domain...");

match env::var(KUBERNETES_CLUSTER_DOMAIN_ENV) {
Ok(cluster_domain) if !cluster_domain.is_empty() => {
let cluster_domain = DomainName::from_str(&cluster_domain)
.context(ParseDomainNameSnafu { cluster_domain })?;
tracing::info!(
%cluster_domain,
"Using Kubernetes cluster domain from {KUBERNETES_CLUSTER_DOMAIN_ENV:?} environment variable"
);
return Ok(cluster_domain);
}
_ => {}
};

// 2. If no env var is set, check if we run in a clustered (Kubernetes/Openshift) environment
// by checking if KUBERNETES_SERVICE_HOST is set: If not default to 'cluster.local'.
tracing::debug!(
"Trying to determine the operator runtime environment as environment variable \
{KUBERNETES_CLUSTER_DOMAIN_ENV:?} is not set"
);

match env::var(KUBERNETES_SERVICE_HOST_ENV) {
Ok(_) => {
let cluster_domain = retrieve_cluster_domain_from_resolv_conf(RESOLVE_CONF_FILE_PATH)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to do this hack (and I still don't think we should), can we please at least punt that until vNext instead of sneaking it in just before the release?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have many customers that require this, i think there is no way to punt on this for now, but calls for intensive testing for sure!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can punt on autodetection while still keeping the override.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We removed auto-detection in #896 again

let cluster_domain = DomainName::from_str(&cluster_domain)
.context(ParseDomainNameSnafu { cluster_domain })?;

tracing::info!(
%cluster_domain,
"Using Kubernetes cluster domain from {RESOLVE_CONF_FILE_PATH:?} file"
);

Ok(cluster_domain)
}
Err(_) => {
let cluster_domain = DomainName::from_str(KUBERNETES_CLUSTER_DOMAIN_DEFAULT)
.expect("KUBERNETES_CLUSTER_DOMAIN_DEFAULT constant must a valid domain");

tracing::info!(
%cluster_domain,
"Could not determine Kubernetes cluster domain as the operator is not running within Kubernetes, assuming default Kubernetes cluster domain"
);

Ok(cluster_domain)
}
}
}

#[instrument]
fn retrieve_cluster_domain_from_resolv_conf(
path: impl Into<PathBuf> + std::fmt::Debug,
) -> Result<String, Error> {
let path = path.into();
let content = std::fs::read_to_string(&path)
.inspect_err(|error| {
tracing::error!(%error, path = %path.display(), "Cannot read resolv conf");
})
.context(ReadResolvConfFileSnafu)?;

// If there are multiple search directives, only the search
// man 5 resolv.conf
let Some(last_search_entry) = content
.lines()
.rev()
.map(|l| l.trim())
.find(|&l| l.starts_with("search"))
.map(|l| l.trim_start_matches("search").trim())
else {
return NoSearchEntrySnafu.fail();
};

let Some(shortest_entry) = last_search_entry
.split_ascii_whitespace()
.min_by_key(|item| item.len())
else {
return AmbiguousDomainEntriesSnafu.fail();
};

// NOTE (@Techassi): This is really sad and bothers me more than I would like to admit. This
// clone could be removed by using the code directly in the calling function. But that would
// remove the possibility to easily test the parsing.
Ok(shortest_entry.to_owned())
}

#[cfg(test)]
mod tests {
use std::path::PathBuf;

use super::*;
use rstest::rstest;

#[test]
fn use_different_kubernetes_cluster_domain_value() {
let cluster_domain = "my-cluster.local".to_string();

// set different domain via env var
unsafe {
env::set_var(KUBERNETES_CLUSTER_DOMAIN_ENV, &cluster_domain);
}

// initialize the lock
let _ = KUBERNETES_CLUSTER_DOMAIN.set(retrieve_cluster_domain().unwrap());

assert_eq!(
cluster_domain,
KUBERNETES_CLUSTER_DOMAIN.get().unwrap().to_string()
);
}

#[rstest]
fn parse_resolv_conf_pass(
#[files("fixtures/cluster_domain/pass/*.resolv.conf")] path: PathBuf,
) {
assert_eq!(
retrieve_cluster_domain_from_resolv_conf(path).unwrap(),
KUBERNETES_CLUSTER_DOMAIN_DEFAULT
);
}

#[rstest]
fn parse_resolv_conf_fail(
#[files("fixtures/cluster_domain/fail/*.resolv.conf")] path: PathBuf,
) {
assert!(retrieve_cluster_domain_from_resolv_conf(path).is_err());
}
}
1 change: 1 addition & 0 deletions crates/stackable-operator/src/utils/mod.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
pub mod bash;
pub mod cluster_domain;
pub mod crds;
pub mod logging;
mod option;
Expand Down
Loading