Quota release e2e testing and Fits function fix #398

Closed
@@ -523,7 +523,7 @@ func (qm *QuotaManager) Fits(aw *arbv1.AppWrapper, awResDemands *clusterstateapi

 	_, err = qm.quotaManagerBackend.AddConsumer(consumerInfo)
 	if err != nil {
-		return false, nil, err.Error()
+		klog.Errorf("[Fits] Consumer '%s' already exists.", consumerInfo.GetID())
@z103cb (Contributor) commented on May 30, 2023:
@metalcycling I think this change is not necessary, and it decreases the ease of code comprehension.

It is not necessary because the call allocResponse, err := qm.quotaManagerBackend.AllocateForest(QuotaManagerForestName, consumerID) will have the same effect as the proposed change (see if forestController.IsConsumerAllocated(consumerID) {).

This change does not help code comprehension because:

  • logging an error and continuing is bad practice.
  • if you do need to handle the error, it should probably be logged at the warning level.

Overall, this change should be reverted because:

  • it only makes the code quality worse;
  • it does not fix the underlying issue (allocating consumers when they are already allocated).

The proper fix for the issue highlighted by the test case would be to replace the code at line 530 with:

	klog.V(4).Infof("[Fits] Sending quota allocation request: %#v ", consumerInfo)
	allocResponse, err := qm.quotaManagerBackend.AllocateForest(QuotaManagerForestName, consumerID)
	if err != nil {
		qm.removeConsumer(consumerID)
		klog.Errorf("[Fits] Error allocating consumer: %s/%s, err=%#v.", aw.Namespace, aw.Name, err)
		return false, nil, err.Error()
	}
	klog.V(4).Infof("[Fits] allocation response received. Quota Allocated: %t, Message: '%s', Preempted app wrappers count: %d", allocResponse.IsAllocated(),
		strings.TrimSpace(allocResponse.GetMessage()), len(allocResponse.GetPreemptedIds()))
	doesFit := allocResponse.IsAllocated()
	if !doesFit {
		qm.removeConsumer(consumerID)
		return doesFit, preemptIds, strings.TrimSpace(allocResponse.GetMessage())
	}
	preemptIds = qm.getAppWrappers(allocResponse.GetPreemptedIds())
	return doesFit, preemptIds, strings.TrimSpace(allocResponse.GetMessage())

The removeConsumer helper function code:

func (qm *QuotaManager) removeConsumer(consumerID string) {
	// removing the consumer to allow for the consumer to be added if and when
	// the function is called for the same app wrapper
	removed, err := qm.quotaManagerBackend.RemoveConsumer(consumerID)
	if err != nil {
		klog.Warningf("Failed to remove consumer %s, %#v", consumerID, err)
	}
	if !removed {
		klog.Warningf("Failed to remove consumer %s", consumerID)
	}
}
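The flow the reviewer proposes (add the consumer, attempt a forest allocation, and remove the consumer on failure so a later Fits call for the same AppWrapper can retry cleanly) can be sketched against a toy in-memory backend. All types and method bodies below are illustrative stand-ins, not the real quota-manager API:

```go
package main

import "fmt"

// toyBackend is a hypothetical stand-in for qm.quotaManagerBackend;
// it tracks registered consumer IDs and a single scalar capacity.
type toyBackend struct {
	consumers map[string]bool
	capacity  int
	used      int
}

func (b *toyBackend) AddConsumer(id string) error {
	if b.consumers[id] {
		return fmt.Errorf("consumer %q already exists", id)
	}
	b.consumers[id] = true
	return nil
}

// AllocateForest reports whether the demand fits the remaining capacity.
func (b *toyBackend) AllocateForest(id string, demand int) bool {
	if b.used+demand > b.capacity {
		return false
	}
	b.used += demand
	return true
}

func (b *toyBackend) RemoveConsumer(id string) (bool, error) {
	if !b.consumers[id] {
		return false, nil
	}
	delete(b.consumers, id)
	return true, nil
}

// fits mirrors the reviewer's proposal: on a failed allocation the
// consumer is removed, so a subsequent call for the same ID does not
// hit the "already exists" error that motivated the original change.
func fits(b *toyBackend, id string, demand int) (bool, error) {
	if err := b.AddConsumer(id); err != nil {
		return false, err
	}
	if !b.AllocateForest(id, demand) {
		if removed, err := b.RemoveConsumer(id); err != nil || !removed {
			fmt.Println("warning: failed to remove consumer", id)
		}
		return false, nil
	}
	return true, nil
}

func main() {
	b := &toyBackend{consumers: map[string]bool{}, capacity: 3}
	ok, _ := fits(b, "aw-1", 2)
	fmt.Println(ok) // true: 2 of 3 units allocated
	ok, _ = fits(b, "aw-2", 2)
	fmt.Println(ok) // false: only 1 unit left; aw-2 is removed again
	ok, err := fits(b, "aw-2", 1)
	fmt.Println(ok, err) // true <nil>: the retry is not blocked by a stale consumer
}
```

The point of the sketch is the cleanup path: without the RemoveConsumer call on the failure branch, the retry would fail at AddConsumer rather than being re-evaluated.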

	}

	consumerID := consumerInfo.GetID()
21 changes: 21 additions & 0 deletions test/e2e-kuttl/quota-forest/09-assert.yaml
@@ -0,0 +1,21 @@
# Verify AppWrappers finished successfully
---
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 120
Contributor:
Nitpick: I think it would be safer to increase this value to at least 180, or remove it altogether.
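The nitpick amounts to a one-line change in the assert header; the 180 value is the reviewer's suggested minimum, and omitting timeout would fall back to kuttl's default:

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 180  # raised from 120 per the review nitpick; or omit entirely
```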

---
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: my-job-1
  namespace: test
status:
  state: Completed
---
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: my-job-2
  namespace: test
status:
  state: Completed
133 changes: 133 additions & 0 deletions test/e2e-kuttl/quota-forest/09-install.yaml
@@ -0,0 +1,133 @@
---
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- command: kubectl -n test delete appwrappers,jobs --all
Contributor:
Nitpick: you should probably add a comment explaining why you are deleting all the AppWrappers that were previously installed.
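The requested comment could look like the following (the wording is illustrative):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
# Clean up AppWrappers and Jobs created by earlier steps so this
# quota-release test starts from an empty quota tree.
- command: kubectl -n test delete appwrappers,jobs --all
```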

---
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: my-job-1
  namespace: test
  labels:
    quota_context: bronze
    quota_service: service-root
spec:
  schedulingSpec:
    minAvailable: 1
  resources:
    GenericItems:
    - replicas: 1
      completionstatus: Complete
      custompodresources:
      - replicas: 1
        requests:
          cpu: 900m
          nvidia.com/gpu: 0
          memory: 300Mi
        limits:
          cpu: 900m
          nvidia.com/gpu: 0
          memory: 300Mi
      generictemplate:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: my-job-1
          namespace: test
          labels:
            appwrapper.mcad.ibm.com: my-job-1
        spec:
          parallelism: 1
          completions: 1
          template:
            metadata:
              name: my-job-1
              namespace: test
              labels:
                appwrapper.mcad.ibm.com: my-job-1
            spec:
              terminationGracePeriodSeconds: 1
              restartPolicy: Never
              containers:
              - name: ubuntu
                image: ubuntu:latest
                imagePullPolicy: IfNotPresent
                command:
                - sh
                - -c
                - |
                  sleep 30
                resources:
                  requests:
                    cpu: 900m
                    nvidia.com/gpu: 0
                    memory: 300Mi
                  limits:
                    cpu: 900m
                    nvidia.com/gpu: 0
                    memory: 300Mi
---
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: my-job-2
  namespace: test
  labels:
    quota_context: bronze
    quota_service: service-root
spec:
  schedulingSpec:
    minAvailable: 1
  resources:
    GenericItems:
    - replicas: 1
      completionstatus: Complete
      custompodresources:
      - replicas: 1
        requests:
          cpu: 900m
          nvidia.com/gpu: 0
          memory: 300Mi
        limits:
          cpu: 900m
          nvidia.com/gpu: 0
          memory: 300Mi
      generictemplate:
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: my-job-2
          namespace: test
          labels:
            appwrapper.mcad.ibm.com: my-job-2
        spec:
          parallelism: 1
          completions: 1
          template:
            metadata:
              name: my-job-2
              namespace: test
              labels:
                appwrapper.mcad.ibm.com: my-job-2
            spec:
              terminationGracePeriodSeconds: 1
              restartPolicy: Never
              containers:
              - name: ubuntu
                image: ubuntu:latest
                imagePullPolicy: IfNotPresent
                command:
                - sh
                - -c
                - |
                  sleep 30
                resources:
                  requests:
                    cpu: 900m
                    nvidia.com/gpu: 0
                    memory: 300Mi
                  limits:
                    cpu: 900m
                    nvidia.com/gpu: 0
                    memory: 300Mi