Describe the Bug
AppWrappers are failing because the operator's service account is missing permissions on the ray.io API group.
The following errors are logged when cluster.up() is run:
. Attempting to re-enqueque...
W0921 20:57:19.009638 1 queuejob_controller_ex.go:1936] [worker] Item re-enqueued.
I0921 20:57:20.797788 1 request.go:690] Waited for 1.777425005s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/kfdef.apps.kubeflow.org/v1
E0921 20:57:22.741529 1 queuejob_controller_ex.go:2194] [Cleanup] Error deleting generic item raytest, from app wrapper='default/raytest' err=rayclusters.ray.io is forbidden: User "system:serviceaccount:openshift-operators:codeflare-operator-controller-manager" cannot list resource "rayclusters" in API group "ray.io" at the cluster scope.
E0921 20:57:26.513252 1 queuejob_controller_ex.go:1874] [worker] Failed to delete resources for AppWrapper Job 'default/raytest', err=1 error occurred:
* rayclusters.ray.io is forbidden: User "system:serviceaccount:openshift-operators:codeflare-operator-controller-manager" cannot list resource "rayclusters" in API group "ray.io" at the cluster scope
W0921 20:57:26.513314 1 queuejob_controller_ex.go:1932] [worker] Fail to process item from eventQueue, err 1 error occurred:
* rayclusters.ray.io is forbidden: User "system:serviceaccount:openshift-operators:codeflare-operator-controller-manager" cannot list resource "rayclusters" in API group "ray.io" at the cluster scope
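The forbidden error above can be checked independently of the operator by asking the API server what the service account is allowed to do. The following is a minimal sketch, assuming the logged-in user is permitted to impersonate service accounts (a cluster-admin typically is). The function name `check_ray_access` is hypothetical; the service account name is copied from the log, and the verbs and resources are the ones listed in the temporary fix.

```shell
# Service account name copied from the error log above.
SA="system:serviceaccount:openshift-operators:codeflare-operator-controller-manager"

# Print one "verb resource: yes|no" line per combination, checked cluster-wide.
check_ray_access() {
  for verb in create delete get list patch; do
    for res in rayclusters rayjobs rayservices; do
      printf '%s %s: ' "$verb" "$res.ray.io"
      oc auth can-i "$verb" "$res.ray.io" --as="$SA" --all-namespaces
    done
  done
}
```

Running `check_ray_access` before and after changing the ClusterRole shows whether the change took effect for that service account.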
My temporary fix was to edit the clusterrole:
oc edit clusterrole codeflare-operator.v1.0.0-rc.1-59f4fb8598
and add
- apiGroups:
  - ray.io
  resources:
  - rayclusters
  - rayjobs
  - rayservices
  verbs:
  - create
  - delete
  - get
  - list
  - patch
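For anyone who wants to apply the same workaround without an interactive editor, the rule can be appended with a JSON patch. This is only a sketch of the temporary fix, not the proper fix in the operator's manifests. The function name `patch_ray_rbac` is hypothetical, and the ClusterRole name is the one from this report, so it will likely differ on other installs.

```shell
# ClusterRole name as it appeared in this report; adjust for your install.
ROLE="codeflare-operator.v1.0.0-rc.1-59f4fb8598"

# Append the ray.io rule to the end of the ClusterRole's rules array.
patch_ray_rbac() {
  oc patch clusterrole "$1" --type=json -p '[
    {"op": "add", "path": "/rules/-", "value": {
      "apiGroups": ["ray.io"],
      "resources": ["rayclusters", "rayjobs", "rayservices"],
      "verbs": ["create", "delete", "get", "list", "patch"]
    }}
  ]'
}
```

Usage would be `patch_ray_rbac "$ROLE"`, run as a user allowed to modify ClusterRoles.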
I suspect that the real fix should be in here:
https://github.com/project-codeflare/codeflare-operator/blob/main/config/rbac/role.yaml
Codeflare Stack Component Versions
Please specify the component versions in which you have encountered this bug.
Codeflare SDK: 0.7.1
MCAD: now built into the CodeFlare operator
InstaScale: now built into the CodeFlare operator
Codeflare Operator: codeflare-operator.v1.0.0-rc.1
Other:
Steps to Reproduce the Bug
Deploy CodeFlare v1.0.0-rc.1 using the QuickStart guide on an OpenShift 4.12 cluster. Everything comes up, but when you run cluster.up(), the AppWrapper starts and then cannot communicate properly with the Ray operator.
What Have You Already Tried to Debug the Issue?
Described above
Expected Behavior
I expected the appwrapper to start the Ray head and worker nodes
Screenshots, Console Output, Logs, etc.
Add screenshots of UIs (like dashboards), etc. that help explain the issue.
Affected Releases
v1.0.0-rc.1 and main
Additional Context
Add as applicable and when known:
Add any other information you think might be useful here.