Improve API pod scheduler when multiple node groups are used #1965

#### Description

Given the following list of node groups from a cluster config:

```yaml
node_groups:
  - name: A
    instance_type: t3.medium
  - name: B
    instance_type: c5.xlarge
  - name: C
    instance_type: c5.4xlarge
```

Assume there are no deployed APIs and the cluster has one live node from each node group. Since the order of the node groups dictates their priority, we expect API pods to fill the node groups in that order: A first, then B, then C.

#### Situation

In practice, there are 2 problems:
* When there are enough live nodes in each node group, the order only dictates the likelihood of a pod being scheduled onto a specific node group. The more node groups there are, the lower that likelihood becomes.
* Increasing the number of API replicas only fills the node groups evenly (again, across the live nodes only). This is undesirable: B and C each end up slightly filled when those replicas could have been scheduled onto A. Had they been packed onto A, the cluster-autoscaler could have removed the extra nodes and reduced the overall costs.
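
To illustrate why ordering can end up as a likelihood rather than a guarantee, here is a minimal sketch of a soft, priority-like preference expressed with preferred node affinity. It is an assumption for illustration only, not necessarily how the node group order is wired up in this project; the label key `example.com/node-group` is hypothetical.

```yaml
# Pod spec fragment (sketch). The label key "example.com/node-group" is
# hypothetical; substitute whatever label identifies a node group.
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100            # strongest preference: node group A
          preference:
            matchExpressions:
              - key: example.com/node-group
                operator: In
                values: ["A"]
        - weight: 50             # weaker preference: node group B
          preference:
            matchExpressions:
              - key: example.com/node-group
                operator: In
                values: ["B"]
        - weight: 1              # weakest preference: node group C
          preference:
            matchExpressions:
              - key: example.com/node-group
                operator: In
                values: ["C"]
```

Preferred affinity only contributes to the scoring phase, where it is combined with the scores of other plugins. With the default `LeastAllocated` resource scoring, emptier nodes score higher, which pulls pods towards an even spread and can outweigh the affinity preference.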

Both problems need to be addressed. To do that, we need to either reconfigure the existing k8s scheduler or create a new one for our workloads. See https://kubernetes.io/docs/reference/scheduling/config/ and https://kubernetes.io/docs/reference/scheduling/policies/ for details.
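
As a starting point, a scheduler profile that packs pods onto already-utilized nodes instead of spreading them could look like the sketch below. It assumes a Kubernetes version that serves the `kubescheduler.config.k8s.io/v1` API; the profile name `api-bin-packing` is hypothetical, and the resource weights are placeholders rather than tuned values.

```yaml
# Scheduler configuration (sketch). "api-bin-packing" is a hypothetical name.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: api-bin-packing
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            # Favor nodes with the highest requested-to-allocatable ratio,
            # so new pods are packed onto nodes that are already in use.
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

API pods would opt into this profile by setting `spec.schedulerName` to the profile name. Note that `MostAllocated` packs by utilization and knows nothing about node group order on its own, so it would likely need to be combined with a node group preference to get A, B, C filled in priority order.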

This becomes serious when there's a major scale-down event in the cluster and only a fraction of the API replicas are brought back afterwards. The likelihood of this happening on a production cluster is high to very high.

#### Update

This also applies to clusters with a single node group.
