DeepSeek Vocab-size Mismatch #338

Open
Jiayi-Pan opened this issue Sep 10, 2024 · 1 comment

Comments

@Jiayi-Pan

Thank you for the amazing project! We're currently working on fine-tuning the DeepSeek model and followed the instructions in your README. However, after transforming the weights, we encountered the following error message:

RuntimeError: Error(s) in loading state_dict for GPTModel:
        size mismatch for embedding.word_embeddings.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
        size mismatch for output_layer.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
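
The 16-row gap suggests the fine-tuning script builds the model with an enlarged (padded/extra) vocabulary of 102416 rows, while the converted checkpoint still carries the raw 102400-row DeepSeek embedding. A minimal sketch of that arithmetic, assuming the gap corresponds to an extra-vocab-size of 16 (the exact setting is an assumption, not taken from the scripts):

# Sketch only: model-side vocab = checkpoint vocab + assumed extra rows.
def expected_model_vocab(ckpt_vocab_size: int, extra_vocab_size: int = 16) -> int:
    # Assumption: the training script enlarges the embedding/output layer
    # by `extra_vocab_size` rows on top of the checkpoint's raw vocab.
    return ckpt_vocab_size + extra_vocab_size

assert expected_model_vocab(102400) == 102416  # matches the shapes in the error

If the checkpoint conversion and the fine-tuning script use different values here, the embedding and output_layer shapes no longer line up.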

Command

cd /mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/examples/deepseek_v2
sh run_finetune_deepseek.sh  \
dsw \
A2.4B \
1    \
8    \
1e-5   \
1e-6   \
128  \
128  \
bf16  \
1   \
1  \
4 \
sel \
true \
true \
true \
100  \
/mnt/deepseek-datasets/alpaca_zh-train.json   \
/mnt/deepseek-datasets/alpaca_zh-valid.json   \
/mnt/deepseek-ckpts/DeepSeek-Coder-V2-Lite-Instruct-to-mcore-tp1-pp1-ep4 \
100000   \
10000   \
/mnt/deepseek-ckpts/test_ft

Full error log

INFO:megatron.core.optimizer:Setting up optimizer with OptimizerConfig(optimizer='adam', lr=1e-05, min_lr=1e-06, decoupled_lr=None, decoupled_min_lr=None, weight_decay=0.1, fp16=False, bf16=True, params_dtype=torch.bfloat16, loss_scale=None, initial_loss_scale=4294967296, min_loss_scale=1.0, loss_scale_window=1000, hysteresis=2, adam_beta1=0.9, adam_beta2=0.95, adam_eps=1e-08, sgd_momentum=0.9, use_distributed_optimizer=True, overlap_grad_reduce=False, overlap_param_gather=False, clip_grad=1.0, log_num_zeros_in_grad=False, barrier_with_L1_time=True, timers=<megatron.core.timers.Timers object at 0x7f1ca43bfe80>)
> learning rate decay style: cosine
 loading release checkpoint from /mnt/deepseek-ckpts/DeepSeek-Coder-V2-Lite-Instruct-to-mcore-tp1-pp1-ep4
Traceback (most recent call last):
  File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/examples/deepseek_v2/pretrain_deepseek.py", line 222, in <module>
    pretrain(train_valid_test_datasets_provider,
  File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/Megatron-LM-240405/megatron/training/training.py", line 236, in pretrain
    model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
  File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/Megatron-LM-240405/megatron/training/training.py", line 518, in setup_model_and_optimizer
    args.iteration, args.num_floating_point_operations_so_far = load_checkpoint(
  File "/mnt/task_wrapper/user_output/artifacts/Pai-Megatron-Patch/Megatron-LM-240405/megatron/training/checkpointing.py", line 718, in load_checkpoint
    model[0].load_state_dict(state_dict['model'], strict=strict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GPTModel:
        size mismatch for embedding.word_embeddings.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
        size mismatch for output_layer.weight: copying a param with shape torch.Size([102400, 2048]) from checkpoint, the shape in current model is torch.Size([102416, 2048]).
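
As a stopgap, one could zero-pad the two mismatched tensors in the converted checkpoint so they match the 102416-row model. This is a rough, untested sketch that assumes a legacy single-file Megatron checkpoint with the key layout shown in the traceback; the cleaner fix is to keep the vocab/extra-vocab settings consistent between conversion and fine-tuning:

# Rough sketch, not a verified fix. Path and checkpoint layout are assumptions.
import torch

ckpt_path = "/mnt/deepseek-ckpts/.../model_optim_rng.pt"  # hypothetical file
state = torch.load(ckpt_path, map_location="cpu")

for key in ("embedding.word_embeddings.weight", "output_layer.weight"):
    w = state["model"][key]                                   # [102400, 2048]
    pad = torch.zeros(102416 - w.shape[0], w.shape[1], dtype=w.dtype)
    state["model"][key] = torch.cat([w, pad], dim=0)          # [102416, 2048]

torch.save(state, ckpt_path + ".padded")
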
@jerryli1981
Copy link
Collaborator

Hello, and thanks! We just shipped a key upgrade to DeepSeek-V2. Please check whether the problem still occurs; if it does, feel free to file a new PR against the new version. Thanks: #355
