Update compute resources to account for MCAD and InstaScale #305

Merged
merged 4 commits into from
Oct 10, 2023
8 changes: 8 additions & 0 deletions .github/resources-olm-upgrade/subscription.yaml
@@ -9,3 +9,11 @@ spec:
   name: codeflare-operator
   source: codeflare-olm-test
   sourceNamespace: olm
+  config:
+    resources:
+      limits:
+        cpu: 400m
+        memory: 128Mi
+      requests:
+        cpu: 50m
+        memory: 64Mi
2 changes: 1 addition & 1 deletion .github/workflows/e2e_tests.yaml
@@ -73,7 +73,7 @@ jobs:
 echo Deploying CodeFlare operator
 IMG="${REGISTRY_ADDRESS}"/codeflare-operator
 make image-push -e IMG="${IMG}"
-make deploy -e IMG="${IMG}"
+make deploy -e IMG="${IMG}" -e ENV="e2e"
 kubectl wait --timeout=120s --for=condition=Available=true deployment -n openshift-operators codeflare-operator-manager
 
 echo Setting up CodeFlare stack
2 changes: 1 addition & 1 deletion .github/workflows/olm_tests.yaml
@@ -22,7 +22,7 @@ jobs:
 runs-on: ubuntu-20.04
 timeout-minutes: 60
 env:
-  OLM_VERSION: v0.24.0
+  OLM_VERSION: v0.25.0
   VERSION: "v0.0.0-ghaction" # Need to supply some semver version for bundle to be properly generated
   CATALOG_BASE_IMG: "registry.access.redhat.com/redhat/community-operator-index:v4.13"
   CODEFLARE_TEST_TIMEOUT_SHORT: "1m"
8 changes: 6 additions & 2 deletions Makefile
@@ -87,6 +87,10 @@ IMG ?= ${IMAGE_TAG_BASE}:${VERSION}
 # ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
 ENVTEST_K8S_VERSION = 1.24.2
 
+# The target deployment environment, that corresponds to the Kustomize directory
+# used to build the manifests.
+ENV ?= default
+
 # Get the currently used golang install path (in GOPATH/bin, unless GOBIN is set)
 ifeq (,$(shell go env GOBIN))
 GOBIN=$(shell go env GOPATH)/bin
@@ -202,13 +206,13 @@ uninstall: manifests kustomize ## Uninstall CRDs from the K8s cluster specified
 deploy: manifests kustomize ## Deploy controller to the K8s cluster specified in ~/.kube/config.
 	$(SED) -i -E "s|(- )\${MCAD_REPO}.*|\1\${MCAD_CRD}|" config/crd/mcad/kustomization.yaml
 	cd config/manager && $(KUSTOMIZE) edit set image controller=${IMG}
-	$(KUSTOMIZE) build config/default | kubectl apply -f -
+	$(KUSTOMIZE) build config/${ENV} | kubectl apply -f -
 	git restore config/*
 
 .PHONY: undeploy
 undeploy: ## Undeploy controller from the K8s cluster specified in ~/.kube/config. Call with ignore-not-found=true to ignore resource not found errors during deletion.
 	$(SED) -i -E "s|(- )\${MCAD_REPO}.*|\1\${MCAD_CRD}|" config/crd/mcad/kustomization.yaml
-	$(KUSTOMIZE) build config/default | kubectl delete --ignore-not-found=$(ignore-not-found) -f -
+	$(KUSTOMIZE) build config/${ENV} | kubectl delete --ignore-not-found=$(ignore-not-found) -f -
 	git restore config/*
 
 ##@ Build Dependencies
9 changes: 9 additions & 0 deletions config/e2e/kustomization.yaml
@@ -0,0 +1,9 @@
+bases:
+- ../default
+
+patches:
+- target:
+    kind: Deployment
+    name: manager
+    namespace: system
+  path: patch_resources.yaml
2 changes: 2 additions & 0 deletions config/e2e/patch_resources.yaml
@@ -0,0 +1,2 @@
+- op: remove
+  path: /spec/template/spec/containers/0/resources
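To illustrate the effect of the JSON patch above, the fragment below sketches the first container of the `manager` Deployment as rendered from `config/default` and from `config/e2e`. This is only a sketch: the container name `manager` is an assumption, and `controller` is the image placeholder that the Makefile rewrites with `kustomize edit set image`. The `resources` values are the ones this PR sets in `config/manager/manager.yaml`.

```yaml
# Rendered from config/default: resources as defined in config/manager/manager.yaml
containers:
  - name: manager        # assumed container name, for illustration only
    image: controller    # placeholder image, replaced at deploy time
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
---
# Rendered from config/e2e: the `remove` op drops the whole `resources` block
containers:
  - name: manager        # assumed container name, for illustration only
    image: controller    # placeholder image, replaced at deploy time
```

With no requests or limits left, the container no longer reserves a full CPU on the CI runner. If it is the only container in the pod, Kubernetes would classify that pod as BestEffort.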
8 changes: 4 additions & 4 deletions config/manager/manager.yaml
@@ -65,10 +65,10 @@ spec:
     periodSeconds: 10
   resources:
     limits:
-      cpu: 500m
-      memory: 128Mi
+      cpu: "1"
+      memory: 1Gi
     requests:
-      cpu: 10m
-      memory: 64Mi
+      cpu: "1"
Contributor

1 CPU is quite high for two controllers. Is it really needed?

Contributor Author

Yes. I was reluctant, given that our current CI environment is short on CPU, but MCAD does not have a reputation for being lightweight, and its tests assume 2 CPUs. That may explain why no CPU requirements were specified for MCAD in the previous operator design.

I also think it may not be good practice to let the limitations of our current CI environment drive these requirements. They may instead have to be configured per environment.

Contributor

I'm wondering whether it would make sense to reduce the request value.
That would lower resource usage for less intensive workloads while keeping the limit high enough. On the other hand, it can affect pod eviction order.

Contributor Author

I'd be inclined to have MCAD configured with the Guaranteed QoS class and with enough resources that it performs acceptably by default.

With that in mind, I've added the extra test configuration so it can still run within the limited resources of GH Actions runners.

+      memory: 1Gi
 serviceAccountName: controller-manager
 terminationGracePeriodSeconds: 10
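
The trade-off discussed in the review thread can be sketched with two `resources` fragments. Kubernetes assigns the Guaranteed QoS class only when every container in the pod sets CPU and memory requests equal to its limits. Lowering the requests below the limits, as the reviewer suggests, would make the pod Burstable instead. The first fragment uses the values merged in this PR. The request values in the second are hypothetical and only illustrate the alternative.

```yaml
# Guaranteed QoS: requests equal limits (values merged in this PR)
resources:
  limits:
    cpu: "1"
    memory: 1Gi
  requests:
    cpu: "1"
    memory: 1Gi
---
# Burstable QoS: requests below limits (hypothetical values, not from this PR)
resources:
  limits:
    cpu: "1"
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 256Mi
```

Under node pressure, Burstable pods are candidates for eviction before Guaranteed ones, which is the eviction-order concern raised in the thread.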