"nvidia-gpu-operator" does not exist #1063

Closed
kabelo-twala opened this issue Aug 14, 2024 · 0 comments · Fixed by #1081
Labels
bug Something isn't working

Comments

@kabelo-twala
Describe the bug

Hi, I was following the steps laid out in comfyui-on-eks/. This package is one of its dependencies, and the deployment failed in AWS CloudFormation with the following error:

Received response status [FAILED] from custom resource. Message returned: Error: b'Release "nvidia-gpu-operator" does not exist. Installing it now.\nError: failed to fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v24.3.0.tgz : 401 Unauthorized\n' Logs: /aws/lambda/Comfyui-Cluster-awscdkawseksKubect-Handler886CB40B-Pn5FFiLl2QgE at invokeUserFunction
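Note that the "does not exist. Installing it now." line is only Helm's informational notice when an upgrade-or-install finds no existing release; the actual failure is the 401 on the chart download. A minimal sketch of pulling that cause out of the message, assuming the log text has the shape shown above (the `findFetchFailure` helper and `FetchFailure` type are hypothetical names used only for illustration):

```typescript
// Hypothetical helper: extracts the failing chart URL and HTTP status
// from a custom-resource error message shaped like the one in this report.
interface FetchFailure {
  url: string;
  status: number;
}

function findFetchFailure(msg: string): FetchFailure | undefined {
  // Matches "failed to fetch <url> : <3-digit status>"
  const match = /failed to fetch (\S+) : (\d{3})/.exec(msg);
  if (!match) {
    return undefined;
  }
  return { url: match[1], status: Number(match[2]) };
}

// The message from this issue; "\\n" keeps the literal backslash-n
// that appears in the byte-string representation.
const message =
  "Error: b'Release \"nvidia-gpu-operator\" does not exist. Installing it now.\\n" +
  "Error: failed to fetch https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v24.3.0.tgz : 401 Unauthorized\\n'";

const failure = findFetchFailure(message);
console.log(failure);
```

The extracted URL points at the helm.ngc.nvidia.com repository, which is the value the proposed change below replaces.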

Expected Behavior

All CloudFormation events should complete successfully.

Current Behavior

The deployment fails with a rollback error.

[Image: Screenshot 2024-08-14 at 12 17 26]

Reproduction Steps

Run cdk deploy Comfyui-Cluster after completing the preceding step of the comfyui-on-eks guide.

Possible Solution

Make the following change to the repository value in /aws-quickstart/cdk-eks-blueprints/blob/main/lib/addons/gpu-operator/index.ts

from

const defaultProps = {
    name: "gpu-operator-addon",
    namespace: "gpu-operator",
    chart: "gpu-operator",
    version: "v24.3.0",
    release: "nvidia-gpu-operator",
    repository: "https://helm.ngc.nvidia.com/nvidia",
    createNamespace: true,
    values: {}
};

to

const defaultProps = {
    name: "gpu-operator-addon",
    namespace: "gpu-operator",
    chart: "gpu-operator",
    version: "v24.3.0",
    release: "nvidia-gpu-operator",
    repository: "https://nvidia.github.io/gpu-operator",
    createNamespace: true,
    values: {}
};

as suggested in NVIDIA/gpu-operator#538 (comment)
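Until the default changes, the same repository could presumably be supplied from the caller's side instead of editing the library. A minimal, self-contained sketch, assuming the add-on merges user-supplied props over its defaults; the `HelmProps` interface and `withOverrides` helper are hypothetical stand-ins written for illustration, not the blueprints API:

```typescript
// Hypothetical shape mirroring the defaultProps object quoted above.
interface HelmProps {
  name: string;
  namespace: string;
  chart: string;
  version: string;
  release: string;
  repository: string;
  createNamespace: boolean;
  values: Record<string, unknown>;
}

// Defaults copied from the current index.ts as quoted in this issue.
const defaultProps: HelmProps = {
  name: "gpu-operator-addon",
  namespace: "gpu-operator",
  chart: "gpu-operator",
  version: "v24.3.0",
  release: "nvidia-gpu-operator",
  repository: "https://helm.ngc.nvidia.com/nvidia",
  createNamespace: true,
  values: {},
};

// Assumption: caller-supplied props take precedence over the defaults.
function withOverrides(overrides: Partial<HelmProps>): HelmProps {
  return { ...defaultProps, ...overrides };
}

// Override only the repository; every other default is left untouched.
const patchedProps = withOverrides({
  repository: "https://nvidia.github.io/gpu-operator",
});

console.log(patchedProps.repository);
```

Only the repository differs from the defaults here, which matches the one-line change proposed above.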

Additional Information/Context

I believe the script is run from AWS Lambda.

CDK CLI Version

2.148.1

EKS Blueprints Version

1.15.1

Node.js Version

v22.2.0

Environment details (OS name and version, etc.)

Mac

Other information

No response
