not compatible with H100 #76

Open
takakoyama opened this issue Sep 17, 2024 · 1 comment

Comments

@takakoyama

Hello,

I have been trying to install Hyena-DNA on a machine with an H100 GPU.
It appears that CUDA 11.7 is not compatible with the H100.
I have installed PyTorch '1.13.0+cu117' as recommended.

I get an error when I try to train:
python -m train wandb=null experiment=hg38/genomic_benchmark_scratch model.fused_dropout_add_ln=False

Here is the error message:
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: unique_by_key: failed on 2nd step: cudaErrorNoKernelImageForDevice: no kernel image is available for execution on the device

CUDA 11.7 does not seem to be compatible with the H100.
I get a message even when I just execute:
import torch
torch.tensor([0.1, 0.2]).cuda()

Here is the error message:
NVIDIA H100 80GB HBM3 with CUDA capability sm_90 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86.
If you want to use the NVIDIA H100 80GB HBM3 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
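
For reference, here is a minimal sketch of the kind of check that seems to be behind this message. It compares the device's compute capability against the `sm_XY` targets the installed build was compiled for. The helper names (`parse_sm_versions`, `is_capability_covered`, `check_local_install`) are hypothetical, and treating a matching major version as sufficient is an assumption, not necessarily PyTorch's exact logic. The architecture list is copied from the message above.

```python
import re


def parse_sm_versions(arch_list):
    """Turn names such as 'sm_80' into integers such as 80; skip other entries."""
    versions = []
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match:
            versions.append(int(match.group(1)))
    return versions


def is_capability_covered(capability, arch_list):
    """Return True if any compiled sm_XY target shares the device's major version.

    Assumption: same-major matching is a good enough approximation here.
    """
    major, _minor = capability
    return any(sm // 10 == major for sm in parse_sm_versions(arch_list))


# Architecture list copied from the message above (the 1.13.0+cu117 install).
CU117_ARCHS = ["sm_37", "sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86"]
H100_CAPABILITY = (9, 0)  # sm_90


def check_local_install():
    """Run the same check against the local install (needs torch and a visible GPU)."""
    import torch

    capability = torch.cuda.get_device_capability(0)
    return is_capability_covered(capability, torch.cuda.get_arch_list())


if __name__ == "__main__":
    print(is_capability_covered(H100_CAPABILITY, CU117_ARCHS))  # prints False
```

On a machine with a working install, `check_local_install()` should return `True` once the build includes an `sm_9x` target.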

Is it possible to run Hyena-DNA on an H100? If so, what combination of CUDA and PyTorch versions works?

Thanks in advance,

Taka Koyama

@takakoyama

I also tried the container, but the H100 is not yet supported there either:

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: Detected NVIDIA NVIDIA H100 80GB HBM3 GPU, which is not yet supported in this version of the container
ERROR: No supported GPU(s) detected to run this container
