CPM Bee fine-tuning: enabling half raises a CUDA error, while leaving half off raises an assert error #79
Comments
Training with the CPM fine-tuning script, with --use-delta disabled and half set to false in the config file, produces the following error, and the file in question cannot be found under the /tmp directory.
Is this caused by CPU offload being enabled? Does bmtrain have a switch to control it?
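
For reference, a minimal sketch of the two optimizer choices, assuming bmtrain's public API (`AdamOptimizer`, `AdamOffloadOptimizer`, `OptimManager`) and a toy layer standing in for the real CPM-Bee model: in bmtrain, offloading is selected by which optimizer class you construct, not by a runtime flag.

```python
import torch
import bmtrain as bmt

# Sketch only: bmt.init_distributed() expects torchrun-style env vars
# (MASTER_ADDR, RANK, ...), so run this under torchrun.
bmt.init_distributed(seed=0)

# Toy fp16 layer in place of the CPM-Bee model.
layer = torch.nn.Linear(16, 16).half().cuda()

# GPU-resident optimizer state: no pinned host memory is allocated.
optimizer = bmt.optim.AdamOptimizer(layer.parameters(), lr=1e-3)

# Host-offloaded variant: keeps states/grads in pinned host memory and is
# the code path that raises the pin_memory error in the traceback below.
# optimizer = bmt.optim.AdamOffloadOptimizer(layer.parameters(), lr=1e-3)

optim_manager = bmt.optim.OptimManager(loss_scale=2**20)
optim_manager.add_optimizer(optimizer)

loss = layer(torch.randn(4, 16, dtype=torch.float16, device="cuda")).sum()
optim_manager.zero_grad()
optim_manager.backward(loss)  # scales the loss, then runs backward
optim_manager.step()          # unscales gradients and applies the update
```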
Hello, this was because the loss_func operator previously only supported half precision; it has now been fixed.
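
Before that fix, a user-side workaround was a dtype guard around the loss: route fp16 logits through the fused kernel and fp32 logits through PyTorch's own cross entropy. A hedged sketch follows; the shapes and the `bmt.loss.FusedCrossEntropy` usage mirror typical CPM-Bee fine-tuning code rather than this exact script.

```python
import torch
import torch.nn.functional as F
import bmtrain as bmt

# Hypothetical shapes for one fine-tuning step: (batch, seq_len, vocab).
logits = torch.randn(4, 128, 32000, device="cuda")         # fp32 when half=false
targets = torch.randint(0, 32000, (4, 128), device="cuda")

loss_func = bmt.loss.FusedCrossEntropy(ignore_index=-100)  # fp16-only before the fix

flat_logits = logits.view(-1, logits.size(-1))
flat_targets = targets.view(-1)

if flat_logits.dtype == torch.float16:
    loss = loss_func(flat_logits, flat_targets)            # fused fp16 kernel
else:
    # fp32 path: plain PyTorch cross entropy avoids the fp16-only operator
    loss = F.cross_entropy(flat_logits, flat_targets, ignore_index=-100)
```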
Training CPM with the fine-tuning script, without enabling the --use-delta option, produces the following error:
Traceback (most recent call last):
File "finetune_cpm_bee.py", line 503, in
main()
File "finetune_cpm_bee.py", line 499, in main
finetune(args, tokenizer, model, optimizer, lr_scheduler, optim_manager)
File "finetune_cpm_bee.py", line 364, in finetune
optim_manager.step()
File "/ms_test2/miniconda3/envs/ms1.11/lib/python3.7/site-packages/bmtrain/optim/optim_manager.py", line 131, in step
optimizer.step(scale=self.loss_scale)
File "/ms_test2/miniconda3/envs/ms1.11/lib/python3.7/site-packages/torch/optim/optimizer.py", line 109, in wrapper
return func(*args, **kwargs)
File "/ms_test2/miniconda3/envs/ms1.11/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/ms_test2/miniconda3/envs/ms1.11/lib/python3.7/site-packages/bmtrain/optim/adam_offload.py", line 77, in step
state["_grad_fp16"] = torch.empty(p.size(), dtype=torch.float16, pin_memory=True) # on host
RuntimeError: CUDA error: OS call failed or operation not supported on this OS
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
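
The failing call is the pinned (page-locked) host allocation that AdamOffloadOptimizer makes for its gradient buffer. A quick way to check whether pinned memory works at all in the environment is the snippet below; in containers this commonly fails until the locked-memory limit is raised (e.g. `docker run --ulimit memlock=-1`).

```python
import torch

# Same kind of allocation as bmtrain/optim/adam_offload.py makes:
# if pinned host memory is unavailable, this reproduces the
# "OS call failed or operation not supported on this OS" CUDA error.
buf = torch.empty(1024, dtype=torch.float16, pin_memory=True)
print("pinned host memory OK:", buf.is_pinned())
```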