Thank you for the amazing project! We're currently fine-tuning the DeepSeek model and followed the instructions in your README. However, after converting the weights, we ran into the following error:
RuntimeError: Error(s) in loading state_dict for GPTModel:
    size mismatch for embedding.word_embeddings.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
    size mismatch for output_layer.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
INFO:megatron.core.optimizer:Setting up optimizer with OptimizerConfig(optimizer='adam', lr=1e-05, min_lr=1e-06, decoupled_lr=None, decoupled_min_lr=None, weight_decay=0.1, fp16=False, bf16=True, params_dtype=torch.bfloat16, loss_scale=None, initial_loss_scale=4294967296, min_loss_scale=1.0, loss_scale_window=1000, hysteresis=2, adam_beta1=0.9, adam_beta2=0.95, adam_eps=1e-08, sgd_momentum=0.9, use_distributed_optimizer=True, overlap_grad_reduce=False, overlap_param_gather=False, clip_grad=1.0, log_num_zeros_in_grad=False, barrier_with_L1_time=True, timers=<megatron.core.timers.Timers object at 0x7f1ca43bfe80>)
> learning rate decay style: cosine
loading release checkpoint from /mnt/deepseek-ckpts/DeepSeek-Coder-V2-Lite-Instruct-to-mcore-tp1-pp1-ep4
Traceback (most recent call last):
File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/examples/deepseek_v2/pretrain_deepseek.py", line 222, in <module>
pretrain(train_valid_test_datasets_provider,
File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/Megatron-LM-240405/megatron/training/training.py", line 236, in pretrain
model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/Megatron-LM-240405/megatron/training/training.py", line 518, in setup_model_and_optimizer
args.iteration, args.num_floating_point_operations_so_far = load_checkpoint(
File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/Megatron-LM-240405/megatron/training/checkpointing.py", line 718, in load_checkpoint
model[0].load_state_dict(state_dict['model'], strict=strict)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPTModel:
size mismatch for embedding.word_embeddings.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
size mismatch for output_layer.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
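In case it helps with triage, here is a minimal sketch of how one might inspect the mismatch. It assumes the converted checkpoint follows the usual Megatron layout (a single torch file under release/mp_rank_00/ with a 'model' key, as in the traceback above); the exact file name is an assumption, the key and the 102416-row expectation are taken from the error itself, and the vocab-padding explanation at the end is only a guess (e.g. an extra-vocab-size or vocab-padding setting that differs between the conversion step and pretrain_deepseek.py).

```python
# Minimal sketch (not a fix): compare the embedding shape stored in the
# converted checkpoint with the shape the training run expects.
import torch

# Assumed file layout for a tp1/pp1 Megatron checkpoint; adjust the file name
# if your conversion script writes a different structure.
ckpt_file = (
    "/mnt/deepseek-ckpts/DeepSeek-Coder-V2-Lite-Instruct-to-mcore-tp1-pp1-ep4"
    "/release/mp_rank_00/model_optim_rng.pt"
)

state_dict = torch.load(ckpt_file, map_location="cpu")

# Key taken from the error message above.
emb = state_dict["model"]["embedding.word_embeddings.weight"]
print("checkpoint embedding shape:", tuple(emb.shape))  # reported: (102400, 2048)

expected_rows = 102416  # what the GPTModel built by pretrain_deepseek.py expects
if emb.shape[0] != expected_rows:
    print(
        f"vocab mismatch: checkpoint has {emb.shape[0]} rows, "
        f"model expects {expected_rows} "
        "(a difference of 16 looks like vocab padding / extra tokens applied "
        "on only one side of the conversion)"
    )
```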