Traceback (most recent call last):
File "/InternLM/hf_test.py", line 15, in <module>
output = model.generate(**inputs, **gen_kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1592, in generate
return self.sample(
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
Describe the bug
Let me restate my problem. I trained with InternEvo in bf16, converted the checkpoint to HF format, and then hit the traceback above when running inference in fp16.
The error depends on the torch_dtype used when the model is loaded. If I change it to
torch_dtype=torch.bfloat16
or torch.float32
everything works fine, but with torch.float16
the problem appears. My own understanding is that training in bf16 and running inference in fp16 inherently introduces a precision mismatch: bf16 has more exponent bits than fp16, so values that are representable in bf16 can overflow in fp16, for example in the attention matrix multiply, which eventually produces the inf/nan probabilities in the error. However, I see that the official InternLM code also uses torch.float16, so I would like to ask about this.
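For reference, a minimal sketch of the workaround described above, assuming a hypothetical local path for the HF-converted checkpoint and standard transformers usage: loading the bf16-trained weights as torch.bfloat16 (or torch.float32) instead of torch.float16 keeps activations within the exponent range they were trained with.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path for the HF-converted InternEvo/InternLM checkpoint.
model_path = "./internlm_hf_checkpoint"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Load in bfloat16 to match the training dtype; loading a bf16-trained
# checkpoint in float16 can overflow to inf/nan, which then surfaces as
# the torch.multinomial error during sampling.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```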
Environment
The official Docker image.
Other information
No response