Skip to content

Quota release e2e testing and Fits function fix #398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

metalcycling
Copy link
Collaborator

This PR includes testing for the quota release using Kuttl. The tests revealed a major problem with the Fits function in qm_lib_backend_with_quotasubt_mgr.go which prevents allocating quota to a consumer that has already been added. This is not the right behavior as a consumer can be added to the quota tree while it waits for quota to be available. The previous code prevented existing consumers from getting quota due to the fact that the already exists in the tree.

@metalcycling metalcycling requested review from asm582 and z103cb May 29, 2023 16:19
@@ -523,7 +523,7 @@ func (qm *QuotaManager) Fits(aw *arbv1.AppWrapper, awResDemands *clusterstateapi

_, err = qm.quotaManagerBackend.AddConsumer(consumerInfo)
if err != nil {
return false, nil, err.Error()
klog.Errorf("[Fits] Consumer '%s' already exists.", consumerInfo.GetID())
Copy link
Contributor

@z103cb z103cb May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@metalcycling I think this change is not necessary, and it decreases the ease of code comprehension.

It is not necessary because the call allocResponse, err := qm.quotaManagerBackend.AllocateForest(QuotaManagerForestName, consumerID) will have the same effect as proposed change implementation (see if forestController.IsConsumerAllocated(consumerID) {)

This change is not helping the code comprehension as:

  • logging an error and continuing is a bad practice.
  • if you do need to handle the error, it should probably be logged using the warning level.

Overall this change should be reverted as:

  • it only makes code quality worse
  • it doesn't fix the underlying issue (allocating consumers when they are already allocated)

The proper fix for the issue highlighted by the test-case would be to replace the code at line 530

	klog.V(4).Infof("[Fits] Sending quota allocation request: %#v ", consumerInfo)
	allocResponse, err := qm.quotaManagerBackend.AllocateForest(QuotaManagerForestName, consumerID)
	if err != nil {
		qm.removeConsumer(consumerID)
		klog.Errorf("[Fits] Error allocating consumer: %s/%s, err=%#v.", aw.Namespace, aw.Name, err)
		return false, nil, err.Error()
	}
	klog.V(4).Infof("[Fits] allocation response received. Quota Allocated: %t, Message: '%s', Preempted app wrappers count: %d", allocResponse.IsAllocated(),
		strings.TrimSpace(allocResponse.GetMessage()), len(allocResponse.GetPreemptedIds()))
	doesFit := allocResponse.IsAllocated()
	if !doesFit {
		qm.removeConsumer(consumerID)
		return doesFit, preemptIds, strings.TrimSpace(allocResponse.GetMessage())
	}
	preemptIds = qm.getAppWrappers(allocResponse.GetPreemptedIds())
	return doesFit, preemptIds, strings.TrimSpace(allocResponse.GetMessage())

The removeConsumer helper function code:

func (qm *QuotaManager) removeConsumer(consumerID string) {
	// removing the consumer to allow for the consumer to be added if and when
	// the function is called for the same app wrapper
	removed, err := qm.quotaManagerBackend.RemoveConsumer(consumerID)
	if err != nil {
		klog.Warningf("Failed to remove consumer %s, %#v", consumerID, err)
	}
	if !removed {
		klog.Warningf("Failed to remove consumer %s", consumerID)
	}
}

Copy link
Contributor

@z103cb z103cb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the Fits function are needed.

apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- command: kubectl -n test delete appwrappers,jobs --all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, you should probably add a comment as to why are you deleting all the app wrappers that have previously been installed.

---
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think that it would be safer to increase this value to at least 180, or remove it all together.

z103cb added a commit to z103cb/multi-cluster-app-dispatcher that referenced this pull request May 30, 2023
metalcycling pushed a commit that referenced this pull request Jun 1, 2023
* Added e2e tests to emulate the condition.
Minor fixes in the e2e script.

* Incorporated feedback and tests from #398

* Added unit tests to emulate a long running consumer
@metalcycling
Copy link
Collaborator Author

#397 implemented a better fix and added the test case in this PR, so no need for it anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants