buffer.grad is not None: how do I fix this error? #34
Comments
I haven't run into this problem. Could you share a screenshot of your pipelayer partitioning?
Thanks for your help. The code is unchanged, but it raised a warning: `/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None`. The pipelayer is also unchanged. [2024-09-10 09:18:35,522] [INFO] [module.py:396:_partition_layers] Partitioning pipeline stages with method uniform
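For context, that UserWarning is emitted by `torch.utils.checkpoint` at forward time whenever none of the tensors handed to `checkpoint` have `requires_grad=True`; in a pipeline setup that usually means a stage's inputs were detached somewhere upstream. A minimal reproduction (a standalone sketch, not the project's code):

```python
import warnings

import torch
from torch.utils.checkpoint import checkpoint


def block(x):
    return x * 2


# None of the inputs require grad, so the reentrant checkpoint warns
# that gradients for this segment will be None.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = checkpoint(block, torch.randn(4, 4), use_reentrant=True)

print(any("requires_grad" in str(w.message) for w in caught))  # True
```

If the stage's inputs are supposed to carry gradients, this warning is a hint that the real bug is earlier, where `requires_grad` gets lost.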
That looks fine. Maybe double-check the versions of the key libraries (torch, transformers, accelerate, deepspeed, etc.)
The conda environment was recreated exactly from the requirements.
Which deepspeed version did you use when you ran this?
As for the deepspeed version: deepspeed==0.13.5
DeepSpeed handles this part incorrectly. I commented out these two lines, and then filtered out tensors without `requires_grad` when DeepSpeed passes gradients between stages; after that it works. @Coobiw
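A hedged sketch of the workaround described above. The function name and call site are illustrative stand-ins, not DeepSpeed's actual internals: the idea is that before the pipeline engine sends gradients to the previous stage, buffers that never required grad are dropped, so their `.grad is None` no longer trips the assertion.

```python
import torch


def grads_to_send(buffers):
    """Keep only gradients of tensors that actually require grad.

    Hypothetical stand-in for the filtering inside DeepSpeed's
    gradient-send step: frozen or non-differentiable buffers carried
    along by the schedule have .grad == None and would otherwise trip
    `assert buffer.grad is not None`.
    """
    return [b.grad for b in buffers
            if isinstance(b, torch.Tensor) and b.requires_grad]


# Toy usage: one trainable activation, one frozen buffer.
act = torch.randn(3, requires_grad=True)
frozen = torch.randn(3)  # requires_grad=False, so .grad stays None
act.sum().backward()

grads = grads_to_send([act, frozen])
assert len(grads) == 1 and grads[0] is act.grad
```

Note this only papers over the symptom; the receiving stage must apply the same filter so the send/receive buffer lists stay aligned.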
Running the two-GPU training command:

```
python -m torch.distributed.run --nproc_per_node=2 train_pipeline.py --cfg-path lavis/projects/pp_qwen14b/train_pp.yaml --num-stages 2
```

fails with:

```
File "/data/workspace/MPP-LLaVA/train_pipeline.py", line 228, in main
    loss = engine.train_batch(data_iter=train_iter)
File "/data/hulei/miniconda3/envs/lib/python3.10/site-packages/deepspeed/runtime/pipe/engine.py", line 388, in train_batch
    self._exec_schedule(sched)
File "/data/hulei/miniconda3/envs/lib/python3.10/site-packages/deepspeed/runtime/pipe/engine.py", line 1422, in _exec_schedule
    self._exec_instr(**cmd.kwargs)
File "/data/hulei/miniconda3/envs/lib/python3.10/site-packages/deepspeed/runtime/pipe/engine.py", line 1102, in _exec_send_grads
    assert buffer.grad is not None
```