
fix(mlp.py): swap mlp w1w2w3 init order to w1w3w2 and fix QA #384

Merged
merged 6 commits into InternLM:develop on Dec 6, 2024

Conversation

li126com
Collaborator

@li126com li126com commented Dec 3, 2024

  1. mlp.py: adjust the module initialization order to follow the model's execution order, so that when ISP overlap prefetches module weights, the prefetch order is correct (see the sketch after this list).
  2. QA updates: swapping the initialization order of w2 and w3 changes the random weight initialization results, so the QA baselines need to be re-adapted. In addition, the affected QA code is migrated from internlm1 to internlm2.
  3. modeling_internlm2: add the function load_internlm2_with_dynamic_parallel_size so that test_loss can dynamically load model weights and use a unified initialization across the different test cases.
  4. Fix a small bug in attention.py.

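To illustrate item 1, here is a minimal sketch (not the actual InternEvo mlp.py; the class and argument names are hypothetical) of a gated MLP whose submodules are registered in execution order w1 → w3 → w2, so that any prefetch logic that walks the registered children during ISP overlap fetches weights in the same order the forward pass uses them:

```python
# Hedged sketch, assuming a LLaMA/InternLM2-style gated MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMLP(nn.Module):
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Registration order mirrors execution order: w1 and w3 are applied
        # to the input first, then w2 projects the gated product back.
        self.w1 = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w3 = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w2 = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Execution order: w1 -> w3 -> w2
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

Because parameters are drawn from the RNG in registration order, swapping w2 and w3 also changes the randomly initialized weight values, which is why the QA baselines in item 2 had to be regenerated.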
@li126com li126com changed the title fix test loss fix(QA): swap w2 and w3 init, change qa test from Internlm1 to Internlm2 Dec 4, 2024
@li126com li126com changed the title fix(QA): swap w2 and w3 init, change qa test from Internlm1 to Internlm2 fix(mlp.py): fix mlp w1w2w3 init order to w1w3w2 and fix test Dec 6, 2024
@li126com li126com changed the title fix(mlp.py): fix mlp w1w2w3 init order to w1w3w2 and fix test fix(mlp.py): swap mlp w1w2w3 init order to w1w3w2 and fix test Dec 6, 2024
@li126com li126com changed the title fix(mlp.py): swap mlp w1w2w3 init order to w1w3w2 and fix test fix(mlp.py): swap mlp w1w2w3 init order to w1w3w2 and fix QA Dec 6, 2024
@sunpengsdu sunpengsdu merged commit f6c66bd into InternLM:develop Dec 6, 2024
25 checks passed