Failure to create RayCluster with an imagePullSecret #649

Closed
@dgrove-oss

Description

Describe the Bug

Running oc apply on the attached YAML for a RayCluster (broken-raycluster.yaml.txt), where the head pod has a user-provided imagePullSecret, results in an endless reconciliation loop in which the head pod is rapidly deleted and recreated. It looks to me like the logic added in #601 is not acting as intended.
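
For readers without access to the attachment, the following is a minimal sketch of the kind of manifest described above. It is not the attached file: the cluster name, secret name, and image are hypothetical placeholders, and the `ray.io/v1` API version is an assumption about the installed KubeRay CRD.

```yaml
# Illustrative only; NOT the contents of broken-raycluster.yaml.txt.
apiVersion: ray.io/v1            # assumed CRD version
kind: RayCluster
metadata:
  name: example-raycluster       # hypothetical name
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        # User-provided pull secret on the head pod; per the report,
        # removing this field avoids the delete/recreate loop.
        imagePullSecrets:
          - name: my-registry-secret                      # hypothetical Secret
        containers:
          - name: ray-head
            image: registry.example.com/team/ray:latest   # hypothetical private image
```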

Codeflare Stack Component Versions

Please specify the component versions in which you have encountered this bug.

Running RHOAI 2.16 on OpenShift 4.14.

Steps to Reproduce the Bug

  1. Run oc apply on the attached YAML for the RayCluster.
  2. Observe an endless cascade of head pods being rapidly deleted and recreated.

What Have You Already Tried to Debug the Issue?

I have reproduced the problem multiple times; it happens reliably.

I have verified that removing the imagePullSecret from the head pod specification results in the RayCluster being created successfully, as expected.

Expected Behavior

Users should be able to provide imagePullSecrets for the head node of a RayCluster to enable the use of private registries.

Screenshots, Console Output, Logs, etc.

I've attached the relevant log snippets from the codeflare-operator (codeflare-log.txt) and the kuberay-operator (kuberay.txt).
