Forward pass return value is missing bal_loss #209
After modifying the pretrain_gpt code (applying patch.py: from fmoe.megatron.patch import patch_loss_func_v2_5, patch_forward_step), training on a single GPU fails with "CUDA out of memory". Is there a known way to fix this? [before the start of training step] datetime: 2024-09-08 19:33:33
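An OOM error means the model or its intermediate results (activations) are too large for the GPU. I suggest switching to a smaller model.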
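As a rough way to judge whether a model can fit on one GPU, here is a minimal sketch that counts only model states, assuming mixed-precision training with Adam at about 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two Adam moments). The function name and the 16-byte figure are assumptions for illustration, not part of FastMoE or Megatron-LM. Activations, temporary buffers, and the CUDA context come on top of this, so the real requirement is higher. With a single GPU, all experts of an MoE layer live on that one device, so the parameter count should include every expert.

```python
def estimate_training_memory_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Rough lower bound (GiB) on GPU memory needed for model states only.

    Assumed breakdown for mixed-precision Adam (rule of thumb, not exact):
      fp16 weights 2 B + fp16 grads 2 B + fp32 master weights 4 B
      + Adam first moment 4 B + Adam second moment 4 B = 16 B per parameter.
    Activations, workspace buffers, and the CUDA context are NOT included.
    """
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    # Example sizes only; plug in the actual parameter count (all experts included).
    for n in (125_000_000, 350_000_000, 1_300_000_000):
        gib = estimate_training_memory_gib(n)
        print(f"{n / 1e6:,.0f}M params -> ~{gib:.1f} GiB for model states alone")
```

If this estimate already approaches the GPU's capacity, reducing the hidden size, number of layers, number of experts, or micro-batch size is the usual first step.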
Problem encountered when running pretrain_gpt.py after applying the patch:
Traceback (most recent call last):
  File "pretrain_gpt.py", line 126, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step,
  File "/workspace/Megatron-LM/megatron/training.py", line 157, in pretrain
    iteration = train(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training.py", line 630, in train
    train_step(forward_step_func,
  File "/workspace/Megatron-LM/megatron/training.py", line 377, in train_step
    losses_reduced = forward_backward_func(
  File "/workspace/Megatron-LM/megatron/schedules.py", line 132, in forward_backward_no_pipelining
    output_tensor, bal_loss = forward_step(forward_step_func, data_iterator, model,
  File "/workspace/Megatron-LM/megatron/schedules.py", line 61, in forward_step
    output_tensor, loss_func, bal_loss = forward_step_func(data_iterator, model)
ValueError: not enough values to unpack (expected 3, got 2)
pretrain_gpt source code:
def forward_step(data_iterator, model):
    """Forward step."""
    args = get_args()
    timers = get_timers()
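The traceback shows that the patched megatron/schedules.py unpacks three values (output_tensor, loss_func, bal_loss) from forward_step_func, while the forward_step passed to pretrain() returns only two. The sketch below reproduces that return-value contract in isolation. All function names here are hypothetical stand-ins, not the real Megatron-LM or FastMoE code, and how fmoe's patch_forward_step actually wraps forward_step is not shown; that should be checked against the FastMoE source.

```python
# Hypothetical stand-ins that mimic the call site in the traceback
# (megatron/schedules.py, forward_step). NOT the real Megatron-LM/FastMoE code.


def patched_schedule_forward_step(forward_step_func, data_iterator, model):
    # The patched schedule unpacks THREE values from the user's forward step.
    output_tensor, loss_func, bal_loss = forward_step_func(data_iterator, model)
    return output_tensor, loss_func, bal_loss


def forward_step_two_values(data_iterator, model):
    # Unpatched style: returns (output_tensor, loss_func) only,
    # which makes the unpacking above raise ValueError.
    output_tensor = model(next(data_iterator))
    return output_tensor, (lambda out: out)


def forward_step_three_values(data_iterator, model):
    # Style the patched schedule expects: also return a load-balance loss.
    output_tensor = model(next(data_iterator))
    bal_loss = 0.0  # placeholder; in a real MoE model this would come from the gates
    return output_tensor, (lambda out: out), bal_loss


if __name__ == "__main__":
    toy_model = lambda x: x * 2  # toy "model" for illustration
    try:
        patched_schedule_forward_step(forward_step_two_values, iter([1]), toy_model)
    except ValueError as err:
        print("ValueError:", err)
    out, loss_func, bal = patched_schedule_forward_step(
        forward_step_three_values, iter([1]), toy_model
    )
    print("output:", out, "bal_loss:", bal)
```

In other words, whichever forward_step function is finally handed to pretrain() must return the balance loss as a third value once the schedule has been patched.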