wait_ready() fails when called before MCAD status object is created  #226

Closed
@MichaelClifford

Description

When the cluster is out of capacity and MCAD fails to create a cluster, wait_ready() fails with:

TypeError: 'MissingModel' object is not callable
Failed to init Ray cluster, error 'MissingModel' object is not callable

This appears to be caused by line 469 in _map_to_app_wrapper() when no state yet exists.

def _map_to_app_wrapper(cluster) -> AppWrapper:
    cluster_model = cluster.model
    return AppWrapper(
        name=cluster.name(),
        status=AppWrapperStatus(cluster_model.status.state.lower()),
        can_run=cluster_model.status.canrun,
        job_state=cluster_model.status.queuejobstate,
    )

We should update cluster.status() to account for this case and keep the cluster status "pending" until the MCAD status object exists or the wait times out.

We should also add some error handling to _map_to_app_wrapper() so that the reason for failure is clear and the correct value is passed up to cluster.status().
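
A minimal sketch of what that error handling could look like, not the SDK's actual fix. Everything here is a stand-in so the example runs on its own: the AppWrapperStatus members, the AppWrapper dataclass, the _Missing placeholder (which imitates a missing field whose attributes are also missing and which is not callable), and the names _field and map_to_app_wrapper_safe are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AppWrapperStatus(Enum):
    # Hypothetical subset of states; the real enum may contain others.
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


@dataclass
class AppWrapper:
    name: str
    status: AppWrapperStatus
    can_run: bool
    job_state: Optional[str]


class _Missing:
    """Stand-in for a missing-field placeholder: every attribute is also missing."""

    def __getattr__(self, _name):
        return self


MISSING = _Missing()


def _field(status, attr, default):
    """Return status.<attr>, or default when it is absent, None, or a placeholder."""
    value = getattr(status, attr, default)
    if value is None or isinstance(value, _Missing):
        return default
    return value


def map_to_app_wrapper_safe(cluster) -> AppWrapper:
    status = getattr(cluster.model, "status", None)
    state = _field(status, "state", None)
    if not isinstance(state, str):
        # No status object yet: report "pending" instead of calling .lower()
        # on a placeholder, which is what raises the TypeError.
        wrapper_status = AppWrapperStatus.PENDING
    else:
        try:
            wrapper_status = AppWrapperStatus(state.lower())
        except ValueError as err:
            # Make the reason for the failure explicit for the caller.
            raise ValueError(
                f"AppWrapper {cluster.name()!r} reported unknown state {state!r}"
            ) from err
    return AppWrapper(
        name=cluster.name(),
        status=wrapper_status,
        can_run=bool(_field(status, "canrun", False)),
        job_state=_field(status, "queuejobstate", None),
    )
```

With a mapping like this, cluster.status() would see a pending AppWrapper while the status object is missing, so wait_ready() could keep polling until its timeout rather than crashing.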

Original request from Slack:
https://project-codeflare.slack.com/archives/C04PF8V5MB3/p1689336080924819
