Improve API pod scheduler when multiple node groups are used #1965

Open
@RobertLucian

Description

Given the following list of node groups from a cluster config:

node_groups:
  - name: A
    instance_type: t3.medium
  - name: B
    instance_type: c5.xlarge
  - name: C
    instance_type: c5.4xlarge

Assume there are no deployed APIs and the cluster has a live node from each node group. Since the order of the node groups dictates their priority, we expect the node groups to be filled in that order: A first, then B, then C.

Situation

In practice, there are 2 problems:

  • When there are enough nodes from each node group, the order only affects the likelihood of a pod being scheduled onto a specific node group; it does not guarantee it. The more node groups there are, the lower that likelihood becomes.
  • Increasing the number of API replicas only fills the node groups evenly (again, across the live nodes only). We don't want that, because the B and C node groups end up only slightly filled when the API replicas could instead have been scheduled onto A. Had they been, the cluster-autoscaler could have removed the extra nodes and reduced the overall cost.
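The two problems above can be illustrated with a toy simulation. This is only a sketch and not how kube-scheduler is implemented: the `Node` class, the per-node capacities, and both placement functions are hypothetical, and "even spreading" is approximated by always picking the live node with the fewest pods.

```python
from dataclasses import dataclass


@dataclass
class Node:
    group: str      # node group name, e.g. "A"
    capacity: int   # made-up number of replicas the node can hold
    pods: int = 0

    @property
    def free(self):
        return self.capacity - self.pods


def make_nodes():
    # One live node per node group, listed in priority order (A first).
    # The capacities are assumptions, not derived from the instance types.
    return [Node("A", 4), Node("B", 8), Node("C", 32)]


def schedule_spread(nodes, replicas):
    """Place each replica on the node with the fewest pods (even spreading)."""
    for _ in range(replicas):
        candidates = [n for n in nodes if n.free > 0]
        if not candidates:
            raise ValueError("not enough capacity on the live nodes")
        # min() returns the earliest node when several have the same pod count.
        min(candidates, key=lambda n: n.pods).pods += 1
    return {n.group: n.pods for n in nodes}


def schedule_by_priority(nodes, replicas):
    """Place each replica on the first node, in priority order, with room left."""
    for _ in range(replicas):
        target = next((n for n in nodes if n.free > 0), None)
        if target is None:
            raise ValueError("not enough capacity on the live nodes")
        target.pods += 1
    return {n.group: n.pods for n in nodes}


def empty_groups(placement):
    """Node groups left without pods, i.e. candidates for scale-down."""
    return [group for group, pods in placement.items() if pods == 0]


print(schedule_spread(make_nodes(), 3))       # {'A': 1, 'B': 1, 'C': 1}
print(schedule_by_priority(make_nodes(), 3))  # {'A': 3, 'B': 0, 'C': 0}
```

With 3 replicas, spreading leaves every node in use, while priority-ordered placement leaves B and C empty, which is the state in which the cluster-autoscaler could remove them.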

Both problems need to be addressed. To do that, we need to edit the existing k8s scheduler or create a new one for our workloads. See https://kubernetes.io/docs/reference/scheduling/config/ and https://kubernetes.io/docs/reference/scheduling/policies/.
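To make the scheduler change more concrete, here is a rough sketch of the two resource-scoring styles that kube-scheduler offers: a least-allocated style, which favors emptier nodes, and a most-allocated style, which favors fuller ones. The formulas are simplified to a single resource (CPU), the function names and the `NODES` usage figures are hypothetical, and the capacities are the nominal vCPU counts of the three instance types rather than the allocatable CPU on a real node.

```python
MAX_NODE_SCORE = 100  # kube-scheduler scores nodes on a 0..100 scale


def least_allocated(requested, capacity):
    """Higher score for emptier nodes, which spreads pods out."""
    return (capacity - requested) * MAX_NODE_SCORE / capacity


def most_allocated(requested, capacity):
    """Higher score for fuller nodes, which packs pods together."""
    return requested * MAX_NODE_SCORE / capacity


def pick(nodes, pod_cpu, score_fn):
    """Return the name of the best-scoring node that can still fit the pod."""
    best_name, best_score = None, -1.0
    for name, (used, capacity) in nodes.items():
        requested = used + pod_cpu  # usage if the pod were placed here
        if requested > capacity:
            continue  # the pod does not fit on this node
        score = score_fn(requested, capacity)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# name -> (CPU already requested, CPU capacity); usage values are made up.
NODES = {"A": (1.0, 2.0), "B": (0.0, 4.0), "C": (0.0, 16.0)}

print(pick(NODES, 0.5, least_allocated))  # C
print(pick(NODES, 0.5, most_allocated))   # A
```

Under these assumptions, the least-allocated style sends the next pod to the largest, emptiest node (C), while the most-allocated style keeps filling A. Neither encodes the node group order by itself, so a real solution would presumably still need to combine a packing-style score with the node group priority.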

This becomes serious when there's a major scale-down event in the cluster and only a fraction of the API replicas are brought back afterwards. The likelihood of this happening on a production cluster is high to very high.

Update

This also applies to clusters with a single node group.

Labels

bug (Something isn't working), performance (A performance improvement), research (Determine technical constraints)
