CA DRA: correctly handle Node readiness after scale-up #7780

@towca

Which component are you using?:

/area cluster-autoscaler
/area core-autoscaler
/wg device-management

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

Nodes with custom resources exposed by device plugins (e.g. GPUs) report the Ready condition before they actually expose those resources. Cluster Autoscaler has to hack such Nodes to be treated as not Ready until the resources appear. Otherwise the unschedulable Pods don't pack onto the Nodes in filter_out_schedulable, and CA does another, unnecessary scale-up.

The same happens for DRA resources - until the driver for a given Node publishes its ResourceSlices, the Node is considered Ready but the Pod can't schedule on it, so CA does another scale-up.

Describe the solution you'd like.:

We could re-do the current GPU hack: treat a Node as not Ready if it should have ResourceSlices exposed but doesn't yet. We can detect whether a given Node should have ResourceSlices by comparing it with the template node for its node group.
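
A rough sketch of the template comparison, to make the idea concrete. This is not the actual Cluster Autoscaler code: the `Node` and `ResourceSlice` types are simplified stand-ins for the real API objects, and `nodeReadyForDRA` and `driverSet` are hypothetical names. It also assumes that "should have ResourceSlices" can be approximated as "every driver that publishes slices for the template node has published at least one slice for the real Node".

```go
package main

import "fmt"

// Node is a simplified stand-in for the parts of a Kubernetes Node used here.
type Node struct {
	Name  string
	Ready bool // value of the Node's Ready condition
}

// ResourceSlice is a simplified stand-in; only the publishing driver matters here.
type ResourceSlice struct {
	Driver string
}

// driverSet collects the distinct driver names found in a list of slices.
func driverSet(slices []ResourceSlice) map[string]bool {
	set := make(map[string]bool)
	for _, s := range slices {
		set[s.Driver] = true
	}
	return set
}

// nodeReadyForDRA (hypothetical) reports whether the Node should be treated as
// Ready: its Ready condition must be true, and every driver that publishes
// slices for the template node must already have published for this Node.
func nodeReadyForDRA(node Node, nodeSlices, templateSlices []ResourceSlice) bool {
	if !node.Ready {
		return false
	}
	published := driverSet(nodeSlices)
	for driver := range driverSet(templateSlices) {
		if !published[driver] {
			return false
		}
	}
	return true
}

func main() {
	template := []ResourceSlice{{Driver: "gpu.example.com"}}
	node := Node{Name: "node-1", Ready: true}

	// Right after scale-up: the driver has not published anything yet.
	fmt.Println(nodeReadyForDRA(node, nil, template))

	// Later: the driver has published its slices for the Node.
	fmt.Println(nodeReadyForDRA(node, []ResourceSlice{{Driver: "gpu.example.com"}}, template))
}
```

Comparing driver names only is the coarsest possible check. A stricter variant could also compare device counts or pools against the template, at the cost of more false "not Ready" results if real Nodes legitimately differ from the template.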

Alternatively, maybe we could add a new Condition to the Node, specifying whether ResourceSlices have been exposed already? Then CA could just look at the condition instead of correlating with the template node. This seems like a much cleaner solution, but it requires changes in core Kubernetes objects, so I'm not sure how feasible it is.
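
For comparison, the condition-based check could look roughly like this. Everything here is an assumption for illustration: no such condition exists today, the name `ResourceSlicesPublished` is made up, and `NodeCondition` is a simplified stand-in for the real API type. The sketch also assumes that a Node without the condition has nothing to wait for, so only a condition that is present and not "True" blocks readiness.

```go
package main

import "fmt"

// NodeCondition is a simplified stand-in for a Kubernetes Node condition.
type NodeCondition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

const (
	conditionReady = "Ready"
	// Hypothetical condition type; it does not exist in Kubernetes today.
	conditionResourceSlicesPublished = "ResourceSlicesPublished"
)

// effectivelyReady (hypothetical) treats a Node as Ready only if its Ready
// condition is "True" and the hypothetical ResourceSlicesPublished condition,
// when present, is also "True".
func effectivelyReady(conds []NodeCondition) bool {
	ready := false
	for _, c := range conds {
		switch c.Type {
		case conditionReady:
			ready = c.Status == "True"
		case conditionResourceSlicesPublished:
			if c.Status != "True" {
				return false
			}
		}
	}
	return ready
}

func main() {
	justBooted := []NodeCondition{
		{Type: conditionReady, Status: "True"},
		{Type: conditionResourceSlicesPublished, Status: "False"},
	}
	fmt.Println(effectivelyReady(justBooted))

	published := []NodeCondition{
		{Type: conditionReady, Status: "True"},
		{Type: conditionResourceSlicesPublished, Status: "True"},
	}
	fmt.Println(effectivelyReady(published))
}
```

Under these assumptions something on the Node would have to set the condition to "False" before the Node first reports Ready. Otherwise CA would still see a window where the condition is simply absent.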

Additional context.:

This is a part of Dynamic Resource Allocation (DRA) support in Cluster Autoscaler. An MVP of the support was implemented in #7530 (with the whole implementation tracked in kubernetes/kubernetes#118612). There are a number of post-MVP follow-ups to be addressed before DRA autoscaling is ready for production use - this is one of them.
