Closed
Description
When the cluster is out of capacity and MCAD fails to create a cluster, wait_ready() fails with:
TypeError: 'MissingModel' object is not callable
Failed to init Ray cluster, error 'MissingModel' object is not callable
This appears to be caused by line 469 in _map_to_app_wrapper(), when no state exists yet.
codeflare-sdk/src/codeflare_sdk/cluster/cluster.py
Lines 465 to 472 in 0d9b23c
We should update cluster.status() to account for this case and keep the cluster status "pending" until it is resolved or times out.
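
As a rough illustration of the "pending until resolved or timed out" behaviour, here is a minimal polling sketch. It is not the SDK's actual wait_ready(); the function name `wait_until_ready`, the `get_status` callable, and the status strings "pending", "ready", and "failed" are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],  # hypothetical: returns "pending", "ready" or "failed"
    timeout: float,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it reports "ready", fails, or the timeout elapses."""
    elapsed = 0.0
    while True:
        status = get_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("cluster failed to start")
        # Any other value (including "pending") keeps us waiting.
        if elapsed >= timeout:
            raise TimeoutError(f"cluster still {status!r} after {timeout} seconds")
        sleep(poll_interval)
        elapsed += poll_interval
```

With this shape, a cluster that never gets a state stays "pending" and ends in a TimeoutError rather than an unrelated TypeError.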
We should also add some error handling to _map_to_app_wrapper() so that the reason for failure is clear and the correct value is passed up to cluster.status().
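
One possible shape for that error handling is sketched below. It does not reproduce the real _map_to_app_wrapper(); the `AppWrapperState` enum and the `parse_state` helper are hypothetical stand-ins, and the idea is only that a missing or non-string state is mapped to "pending" instead of being called or lowercased blindly.

```python
from enum import Enum
from typing import Any


class AppWrapperState(Enum):  # hypothetical stand-in for the SDK's status enum
    PENDING = "pending"
    RUNNING = "running"
    FAILED = "failed"


def parse_state(raw: Any) -> AppWrapperState:
    """Map a raw state value to an enum member, treating a missing state as PENDING."""
    if not isinstance(raw, str):
        # No state reported yet (None, a placeholder object, and so on).
        return AppWrapperState.PENDING
    try:
        return AppWrapperState(raw.lower())
    except ValueError:
        # Surface the unexpected value so the reason for failure is clear.
        raise ValueError(f"unrecognized AppWrapper state: {raw!r}") from None
```

A caller such as cluster.status() could then rely on always receiving a valid enum member, or a ValueError that names the offending value.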
Original request from Slack:
https://project-codeflare.slack.com/archives/C04PF8V5MB3/p1689336080924819