Description
Hello.
I'm currently using cortex 0.40.0.
Occasionally, I submit thousands of jobs to a Cortex API by mistake.
When that happens, the cortex CLI becomes nearly unusable (responses take a very long time or hang entirely), and I suspect the Cortex operator is overloaded by my requests.
(The operator-controller-manager pod repeatedly goes OOMKilled -> CrashLoopBackOff.)
To resolve this, I have tried the following so far, but none of it worked well:
- deleting the thousands of AWS SQS queues
- deleting all of the enqueuer jobs and worker jobs that were created by mistake
- deleting the affected Cortex API and re-deploying it
In the end, I took the cluster down and brought it back up (and re-deployed all of the APIs) to get Cortex working again.
If this happens again, what should I do to restore Cortex without taking the cluster down and bringing it back up?
I would appreciate your support. Thank you so much.