CUDA Device does not support bfloat16 #15

Open
xlzhou01 opened this issue Oct 18, 2024 · 3 comments

Comments

@xlzhou01

File "/home/.conda/envs/spmba/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 234, in init
raise RuntimeError('Current CUDA Device does not support bfloat16. Please switch dtype to float16.')
RuntimeError: Current CUDA Device does not support bfloat16. Please switch dtype to float16.

@JusperLee
Owner

It looks like you're encountering a compatibility issue with bfloat16 on your current CUDA device. As the error suggests, switching the data type to float16 should resolve this problem. If you're using Mamba, make sure your environment is set up to support the necessary CUDA features. If you have further questions or need assistance, feel free to reach out!
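For anyone hitting the same error, a quick way to check what the GPU supports and fall back automatically looks like this (a minimal sketch, not part of the SPMamba codebase; `pick_autocast_dtype` is a placeholder name):

```python
# Sketch only: pick an autocast dtype that the current GPU actually supports.
import torch

def pick_autocast_dtype() -> torch.dtype:
    # bfloat16 autocast needs an Ampere-class GPU (compute capability >= 8.0);
    # older cards (Volta/Turing) raise the RuntimeError shown above.
    if torch.cuda.is_bf16_supported():
        return torch.bfloat16
    return torch.float16

if torch.cuda.is_available():
    dtype = pick_autocast_dtype()
    with torch.autocast(device_type="cuda", dtype=dtype):
        ...  # forward pass / loss computation goes here
```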

@xlzhou01
Author

xlzhou01 commented Oct 19, 2024

I set it to float16 (precision="16-mixed"), and then the following error occurred:

Monitored metric val_loss/dataloader_idx_0 = nan is not finite. Previous best value was inf. Signaling Trainer to stop.
Epoch 0, global step 13900: 'val_loss/dataloader_idx_0' reached inf (best inf), saving model to '/data/SPMamba/Experiments/checkpoint/SPMamba-Libri2Mix/epoch=0.ckpt' as top 5

I am using the noisy Libri2Mix sub-dataset. I tried again and got the same result:

[screenshot of the training log]

I also tried running the clean sub-dataset of Libri2Mix later, and I encountered the same issue. Could it be related to the modified precision?
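For reference, the precision switch amounts to something like this in the training script (a minimal sketch; `model` and `datamodule` stand in for the repo's actual classes, and `detect_anomaly` is only added here to help localize where the NaN first appears):

```python
# Sketch of a Lightning Trainer running in float16 mixed precision.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    precision="16-mixed",   # float16 autocast + GradScaler
    detect_anomaly=True,    # surfaces the first op that produces NaN/Inf
)
# trainer.fit(model, datamodule=datamodule)
```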

@JusperLee
Owner

You might need to adjust the value of 'eps' used in the paper to match the precision you are working with. When using float16 (precision='16-mixed'), the limited numerical range can sometimes lead to instability, such as NaNs or Infs in the loss. Consider increasing 'eps' slightly to maintain numerical stability during training.
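As an illustration of the suggestion (a minimal sketch; the function names are placeholders, not the actual loss code in this repo):

```python
# Pick eps from the dtype actually used in the loss computation
# instead of hard-coding a float32-sized constant.
import torch

def stable_eps(dtype: torch.dtype) -> float:
    # float16 has a much coarser machine epsilon (~9.8e-4) than
    # float32 (~1.2e-7), so a fixed 1e-8 underflows to 0 in half precision.
    return torch.finfo(dtype).eps

# Example: an SI-SNR-style denominator guarded with a dtype-aware eps.
def safe_l2_norm(x: torch.Tensor) -> torch.Tensor:
    eps = stable_eps(x.dtype)
    return torch.sqrt(torch.sum(x ** 2, dim=-1) + eps)
```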
