
Weight merging problem #34

Open
red-tie opened this issue Oct 7, 2023 · 6 comments

Comments

@red-tie

red-tie commented Oct 7, 2023

Merging 7b-hf with the reward model, the SFT model, and the policy model fails with the same error every time:
Naive integrity check failed. This could imply that some of the checkpoint files are corrupted.

I downloaded each model twice, so the checkpoints themselves should be fine; something else must be going wrong.
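For context, the recover task in merge_weight_en.py rebuilds the released checkpoints by adding the published weight diffs onto the raw llama-7b-hf weights. A minimal sketch of that pattern, assuming the script follows the usual weight-diff recovery recipe (the paths are the ones from the command in the next comment, not values taken from the script itself):

```python
# Sketch of weight-diff recovery: tuned = raw + diff, key by key.
# Assumption: merge_weight_en.py follows this standard recipe; the paths
# below are the ones used in the command later in this thread.
import torch
from transformers import AutoModelForCausalLM

model_raw = AutoModelForCausalLM.from_pretrained("/data/dk_downloads/llama-7b-hf")
model_diff = AutoModelForCausalLM.from_pretrained("/data/dk_downloads/sft_model/diff")

state_raw = model_raw.state_dict()
state_diff = model_diff.state_dict()

# Add the base weights onto the diff in place; a key present in one state
# dict but not the other (e.g. extra rotary buffers from a different
# transformers version) would raise a KeyError or skew the result.
with torch.no_grad():
    for key in state_diff:
        state_diff[key].add_(state_raw[key])

# state_dict() returns references to the underlying tensors, so the
# recovered weights can be saved directly from the diff model.
model_diff.save_pretrained("./models/moss-rlhf-sft-model-7B-en/recover")
```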

@overal-well-off-society

(moss) dk@user-SYS-4029GP-TRT2:~/Downloads/MOSS-RLHF-main$ python merge_weight_en.py recover --path_raw /data/dk_downloads/llama-7b-hf --path_diff /data/dk_downloads/sft_model/diff --path_tuned ./models/moss-rlhf-sft-model-7B-en/recover --model_type sft
[2023-10-07 21:07:27,031] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████| 33/33 [00:11<00:00, 2.88it/s]
0%| | 0/291 [00:00<?, ?it/s]model.embed_tokens.weight
model.layers.0.self_attn.q_proj.weight
model.layers.0.self_attn.k_proj.weight
model.layers.0.self_attn.v_proj.weight
model.layers.0.self_attn.o_proj.weight
model.layers.0.mlp.gate_proj.weight
model.layers.0.mlp.up_proj.weight
model.layers.0.mlp.down_proj.weight
model.layers.0.input_layernorm.weight
model.layers.0.post_attention_layernorm.weight
model.layers.1.self_attn.q_proj.weight
model.layers.1.self_attn.k_proj.weight
model.layers.1.self_attn.v_proj.weight
model.layers.1.self_attn.o_proj.weight
model.layers.1.mlp.gate_proj.weight
model.layers.1.mlp.up_proj.weight
model.layers.1.mlp.down_proj.weight
model.layers.1.input_layernorm.weight
model.layers.1.post_attention_layernorm.weight
model.layers.2.self_attn.q_proj.weight
model.layers.2.self_attn.k_proj.weight
model.layers.2.self_attn.v_proj.weight
model.layers.2.self_attn.o_proj.weight
model.layers.2.mlp.gate_proj.weight
model.layers.2.mlp.up_proj.weight
model.layers.2.mlp.down_proj.weight
model.layers.2.input_layernorm.weight
model.layers.2.post_attention_layernorm.weight
model.layers.3.self_attn.q_proj.weight
model.layers.3.self_attn.k_proj.weight
model.layers.3.self_attn.v_proj.weight
model.layers.3.self_attn.o_proj.weight
model.layers.3.mlp.gate_proj.weight
model.layers.3.mlp.up_proj.weight
model.layers.3.mlp.down_proj.weight
model.layers.3.input_layernorm.weight
model.layers.3.post_attention_layernorm.weight
model.layers.4.self_attn.q_proj.weight
model.layers.4.self_attn.k_proj.weight
model.layers.4.self_attn.v_proj.weight
model.layers.4.self_attn.o_proj.weight
model.layers.4.mlp.gate_proj.weight
model.layers.4.mlp.up_proj.weight
model.layers.4.mlp.down_proj.weight
model.layers.4.input_layernorm.weight
model.layers.4.post_attention_layernorm.weight
model.layers.5.self_attn.q_proj.weight
model.layers.5.self_attn.k_proj.weight
model.layers.5.self_attn.v_proj.weight
model.layers.5.self_attn.o_proj.weight
model.layers.5.mlp.gate_proj.weight
model.layers.5.mlp.up_proj.weight
model.layers.5.mlp.down_proj.weight
model.layers.5.input_layernorm.weight
model.layers.5.post_attention_layernorm.weight
model.layers.6.self_attn.q_proj.weight
model.layers.6.self_attn.k_proj.weight
model.layers.6.self_attn.v_proj.weight
model.layers.6.self_attn.o_proj.weight
model.layers.6.mlp.gate_proj.weight
model.layers.6.mlp.up_proj.weight
model.layers.6.mlp.down_proj.weight
model.layers.6.input_layernorm.weight
model.layers.6.post_attention_layernorm.weight
model.layers.7.self_attn.q_proj.weight
model.layers.7.self_attn.k_proj.weight
model.layers.7.self_attn.v_proj.weight
model.layers.7.self_attn.o_proj.weight
model.layers.7.mlp.gate_proj.weight
model.layers.7.mlp.up_proj.weight
model.layers.7.mlp.down_proj.weight
model.layers.7.input_layernorm.weight
model.layers.7.post_attention_layernorm.weight
model.layers.8.self_attn.q_proj.weight
model.layers.8.self_attn.k_proj.weight
model.layers.8.self_attn.v_proj.weight
model.layers.8.self_attn.o_proj.weight
model.layers.8.mlp.gate_proj.weight
model.layers.8.mlp.up_proj.weight
model.layers.8.mlp.down_proj.weight
model.layers.8.input_layernorm.weight
model.layers.8.post_attention_layernorm.weight
model.layers.9.self_attn.q_proj.weight
model.layers.9.self_attn.k_proj.weight
model.layers.9.self_attn.v_proj.weight
model.layers.9.self_attn.o_proj.weight
model.layers.9.mlp.gate_proj.weight
model.layers.9.mlp.up_proj.weight
model.layers.9.mlp.down_proj.weight
model.layers.9.input_layernorm.weight
model.layers.9.post_attention_layernorm.weight
model.layers.10.self_attn.q_proj.weight
model.layers.10.self_attn.k_proj.weight
model.layers.10.self_attn.v_proj.weight
model.layers.10.self_attn.o_proj.weight
model.layers.10.mlp.gate_proj.weight
model.layers.10.mlp.up_proj.weight
model.layers.10.mlp.down_proj.weight
model.layers.10.input_layernorm.weight
model.layers.10.post_attention_layernorm.weight
model.layers.11.self_attn.q_proj.weight
model.layers.11.self_attn.k_proj.weight
model.layers.11.self_attn.v_proj.weight
model.layers.11.self_attn.o_proj.weight
model.layers.11.mlp.gate_proj.weight
model.layers.11.mlp.up_proj.weight
model.layers.11.mlp.down_proj.weight
model.layers.11.input_layernorm.weight
model.layers.11.post_attention_layernorm.weight
model.layers.12.self_attn.q_proj.weight
model.layers.12.self_attn.k_proj.weight
model.layers.12.self_attn.v_proj.weight
model.layers.12.self_attn.o_proj.weight
model.layers.12.mlp.gate_proj.weight
model.layers.12.mlp.up_proj.weight
model.layers.12.mlp.down_proj.weight
model.layers.12.input_layernorm.weight
model.layers.12.post_attention_layernorm.weight
model.layers.13.self_attn.q_proj.weight
model.layers.13.self_attn.k_proj.weight
model.layers.13.self_attn.v_proj.weight
model.layers.13.self_attn.o_proj.weight
model.layers.13.mlp.gate_proj.weight
model.layers.13.mlp.up_proj.weight
model.layers.13.mlp.down_proj.weight
model.layers.13.input_layernorm.weight
model.layers.13.post_attention_layernorm.weight
model.layers.14.self_attn.q_proj.weight
model.layers.14.self_attn.k_proj.weight
model.layers.14.self_attn.v_proj.weight
model.layers.14.self_attn.o_proj.weight
model.layers.14.mlp.gate_proj.weight
model.layers.14.mlp.up_proj.weight
model.layers.14.mlp.down_proj.weight
model.layers.14.input_layernorm.weight
model.layers.14.post_attention_layernorm.weight
model.layers.15.self_attn.q_proj.weight
model.layers.15.self_attn.k_proj.weight
model.layers.15.self_attn.v_proj.weight
model.layers.15.self_attn.o_proj.weight
model.layers.15.mlp.gate_proj.weight
model.layers.15.mlp.up_proj.weight
model.layers.15.mlp.down_proj.weight
model.layers.15.input_layernorm.weight
model.layers.15.post_attention_layernorm.weight
model.layers.16.self_attn.q_proj.weight
model.layers.16.self_attn.k_proj.weight
model.layers.16.self_attn.v_proj.weight
model.layers.16.self_attn.o_proj.weight
model.layers.16.mlp.gate_proj.weight
model.layers.16.mlp.up_proj.weight
model.layers.16.mlp.down_proj.weight
model.layers.16.input_layernorm.weight
model.layers.16.post_attention_layernorm.weight
model.layers.17.self_attn.q_proj.weight
model.layers.17.self_attn.k_proj.weight
model.layers.17.self_attn.v_proj.weight
model.layers.17.self_attn.o_proj.weight
model.layers.17.mlp.gate_proj.weight
model.layers.17.mlp.up_proj.weight
model.layers.17.mlp.down_proj.weight
model.layers.17.input_layernorm.weight
model.layers.17.post_attention_layernorm.weight
model.layers.18.self_attn.q_proj.weight
model.layers.18.self_attn.k_proj.weight
model.layers.18.self_attn.v_proj.weight
model.layers.18.self_attn.o_proj.weight
model.layers.18.mlp.gate_proj.weight
model.layers.18.mlp.up_proj.weight
model.layers.18.mlp.down_proj.weight
model.layers.18.input_layernorm.weight
model.layers.18.post_attention_layernorm.weight
model.layers.19.self_attn.q_proj.weight
model.layers.19.self_attn.k_proj.weight
model.layers.19.self_attn.v_proj.weight
model.layers.19.self_attn.o_proj.weight
model.layers.19.mlp.gate_proj.weight
model.layers.19.mlp.up_proj.weight
model.layers.19.mlp.down_proj.weight
model.layers.19.input_layernorm.weight
model.layers.19.post_attention_layernorm.weight
model.layers.20.self_attn.q_proj.weight
model.layers.20.self_attn.k_proj.weight
model.layers.20.self_attn.v_proj.weight
model.layers.20.self_attn.o_proj.weight
model.layers.20.mlp.gate_proj.weight
model.layers.20.mlp.up_proj.weight
model.layers.20.mlp.down_proj.weight
model.layers.20.input_layernorm.weight
model.layers.20.post_attention_layernorm.weight
model.layers.21.self_attn.q_proj.weight
model.layers.21.self_attn.k_proj.weight
model.layers.21.self_attn.v_proj.weight
model.layers.21.self_attn.o_proj.weight
model.layers.21.mlp.gate_proj.weight
model.layers.21.mlp.up_proj.weight
model.layers.21.mlp.down_proj.weight
model.layers.21.input_layernorm.weight
model.layers.21.post_attention_layernorm.weight
model.layers.22.self_attn.q_proj.weight
model.layers.22.self_attn.k_proj.weight
model.layers.22.self_attn.v_proj.weight
model.layers.22.self_attn.o_proj.weight
model.layers.22.mlp.gate_proj.weight
model.layers.22.mlp.up_proj.weight
model.layers.22.mlp.down_proj.weight
model.layers.22.input_layernorm.weight
model.layers.22.post_attention_layernorm.weight
model.layers.23.self_attn.q_proj.weight
model.layers.23.self_attn.k_proj.weight
model.layers.23.self_attn.v_proj.weight
model.layers.23.self_attn.o_proj.weight
model.layers.23.mlp.gate_proj.weight
model.layers.23.mlp.up_proj.weight
model.layers.23.mlp.down_proj.weight
model.layers.23.input_layernorm.weight
model.layers.23.post_attention_layernorm.weight
model.layers.24.self_attn.q_proj.weight
model.layers.24.self_attn.k_proj.weight
model.layers.24.self_attn.v_proj.weight
model.layers.24.self_attn.o_proj.weight
model.layers.24.mlp.gate_proj.weight
model.layers.24.mlp.up_proj.weight
model.layers.24.mlp.down_proj.weight
model.layers.24.input_layernorm.weight
model.layers.24.post_attention_layernorm.weight
model.layers.25.self_attn.q_proj.weight
model.layers.25.self_attn.k_proj.weight
model.layers.25.self_attn.v_proj.weight
model.layers.25.self_attn.o_proj.weight
model.layers.25.mlp.gate_proj.weight
model.layers.25.mlp.up_proj.weight
model.layers.25.mlp.down_proj.weight
model.layers.25.input_layernorm.weight
model.layers.25.post_attention_layernorm.weight
model.layers.26.self_attn.q_proj.weight
model.layers.26.self_attn.k_proj.weight
model.layers.26.self_attn.v_proj.weight
model.layers.26.self_attn.o_proj.weight
model.layers.26.mlp.gate_proj.weight
model.layers.26.mlp.up_proj.weight
model.layers.26.mlp.down_proj.weight
model.layers.26.input_layernorm.weight
model.layers.26.post_attention_layernorm.weight
model.layers.27.self_attn.q_proj.weight
model.layers.27.self_attn.k_proj.weight
model.layers.27.self_attn.v_proj.weight
model.layers.27.self_attn.o_proj.weight
model.layers.27.mlp.gate_proj.weight
model.layers.27.mlp.up_proj.weight
model.layers.27.mlp.down_proj.weight
model.layers.27.input_layernorm.weight
model.layers.27.post_attention_layernorm.weight
model.layers.28.self_attn.q_proj.weight
model.layers.28.self_attn.k_proj.weight
model.layers.28.self_attn.v_proj.weight
model.layers.28.self_attn.o_proj.weight
model.layers.28.mlp.gate_proj.weight
model.layers.28.mlp.up_proj.weight
model.layers.28.mlp.down_proj.weight
model.layers.28.input_layernorm.weight
model.layers.28.post_attention_layernorm.weight
model.layers.29.self_attn.q_proj.weight
model.layers.29.self_attn.k_proj.weight
model.layers.29.self_attn.v_proj.weight
model.layers.29.self_attn.o_proj.weight
model.layers.29.mlp.gate_proj.weight
model.layers.29.mlp.up_proj.weight
model.layers.29.mlp.down_proj.weight
model.layers.29.input_layernorm.weight
model.layers.29.post_attention_layernorm.weight
model.layers.30.self_attn.q_proj.weight
model.layers.30.self_attn.k_proj.weight
model.layers.30.self_attn.v_proj.weight
model.layers.30.self_attn.o_proj.weight
model.layers.30.mlp.gate_proj.weight
model.layers.30.mlp.up_proj.weight
model.layers.30.mlp.down_proj.weight
model.layers.30.input_layernorm.weight
model.layers.30.post_attention_layernorm.weight
model.layers.31.self_attn.q_proj.weight
model.layers.31.self_attn.k_proj.weight
model.layers.31.self_attn.v_proj.weight
model.layers.31.self_attn.o_proj.weight
model.layers.31.mlp.gate_proj.weight
model.layers.31.mlp.up_proj.weight
model.layers.31.mlp.down_proj.weight
model.layers.31.input_layernorm.weight
model.layers.31.post_attention_layernorm.weight
model.norm.weight
lm_head.weight
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [00:00<00:00, 315141.35it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [00:01<00:00, 215.24it/s]
Traceback (most recent call last):
  File "merge_weight_en.py", line 181, in <module>
    fire.Fire(main)
  File "/home/dk/anaconda3/envs/moss/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/dk/anaconda3/envs/moss/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/dk/anaconda3/envs/moss/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "merge_weight_en.py", line 177, in main
    globals()[task](**kwargs)
  File "/home/dk/anaconda3/envs/moss/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "merge_weight_en.py", line 165, in recover
    assert torch.allclose(
AssertionError: Naive integrity check failed. This could imply that some of the checkpoint files are corrupted.
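The assertion at merge_weight_en.py line 165 is the script's naive integrity check. Assuming it follows the common weight-diff pattern, it sums every element of every recovered tensor into one scalar and compares that against a reference value hard-coded in the script; the expected_sum below is a placeholder, not the script's real constant. Any extra, missing, or mismatched tensor (for instance buffers added or dropped by a different transformers version) shifts the sum and triggers exactly this error:

```python
# Sketch of a "naive integrity check" of this kind (assumption:
# merge_weight_en.py uses the sum-of-all-weights pattern; expected_sum
# is a placeholder, not the script's actual reference value).
import torch

def naive_integrity_check(state_dict_recovered: dict, expected_sum: float) -> None:
    # Collapse every recovered tensor into one scalar.
    allsum = sum(tensor.sum() for tensor in state_dict_recovered.values())
    # If any tensor is missing, extra, or numerically off, the scalar drifts
    # outside the tolerance and the assert fires.
    assert torch.allclose(
        allsum, torch.full_like(allsum, fill_value=expected_sum), atol=1e-2, rtol=0
    ), "Naive integrity check failed. This could imply that some of the checkpoint files are corrupted."
```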

@red-tie
Author

red-tie commented Oct 7, 2023

The detailed error output is above.

@yuanhuachao

+1, I ran into this too.

@red-tie
Author

red-tie commented Oct 9, 2023

This is solved now: just reinstall the packages listed in requirements. Run pip install -r requirements from the command line; the flash-attn entry may fail to install, so I simply deleted it. Hopefully that won't cause problems later...

@Ablustrund
Collaborator

Hi, other users have run into this before, and it is indeed likely a package version issue. Essentially, some layers are loaded together with the model (for example the rotary position embedding buffers) that are not needed during the merge.
To debug, check which layers are present in the diff checkpoint and whether they correspond to the original llama weights, as in the sketch below.
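A minimal sketch of that debugging step, assuming both checkpoint directories load with AutoModelForCausalLM and reusing the paths from the merge command earlier in this thread:

```python
# Compare the parameter/buffer names in the diff checkpoint with those of the
# raw llama-7b-hf checkpoint.
# Assumption: both directories load with AutoModelForCausalLM; the paths are
# the ones used in the merge command above.
from transformers import AutoModelForCausalLM

raw = AutoModelForCausalLM.from_pretrained("/data/dk_downloads/llama-7b-hf")
diff = AutoModelForCausalLM.from_pretrained("/data/dk_downloads/sft_model/diff")

raw_keys = set(raw.state_dict().keys())
diff_keys = set(diff.state_dict().keys())

# Keys that exist on only one side (e.g. rotary_emb.inv_freq buffers that a
# different transformers version adds or drops) are what break the merge.
print("only in diff:", sorted(diff_keys - raw_keys))
print("only in raw :", sorted(raw_keys - diff_keys))
```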

@Ablustrund
Collaborator

> This is solved now: just reinstall the packages listed in requirements. Run pip install -r requirements from the command line; the flash-attn entry may fail to install, so I simply deleted it. Hopefully that won't cause problems later...

flash-attn needs access to the external network; installation was not very stable on our side either, but retrying a few times eventually succeeds.
