Colossal-LLaMA-2-7b-base: why does adding a small amount of Chinese data produce such a large gain on the English MMLU benchmark? #4868
tomyoung903
started this conversation in
Development | Core
Replies: 4 comments 2 replies
-
Hi, first of all, thanks for your interest in Colossal-LLaMA-2. In the continued pretraining stage, we added not only Chinese data but also a small amount of English data, which mainly serves as replay to mitigate the model's catastrophic forgetting. This data was carefully curated to reawaken, as much as possible, the knowledge the model learned in its first pretraining stage (LLaMA-2).
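For readers unfamiliar with the replay trick, here is a minimal sketch of the general idea. The exact Colossal-LLaMA-2 data recipe is not public, so the function name, the 5% ratio, and the inputs below are all hypothetical illustrations, not the team's implementation:

```python
import random

def build_continual_pretrain_corpus(chinese_docs, english_docs,
                                    replay_ratio=0.05, seed=42):
    """Mix a small 'replay' slice of original-domain (English) documents
    into a continual-pretraining corpus, so that roughly `replay_ratio`
    of the final corpus revisits the first-stage (LLaMA-2) distribution.
    NOTE: this is an illustrative sketch, not the Colossal-LLaMA-2 code."""
    rng = random.Random(seed)
    # Choose n_replay so that n_replay / (len(chinese_docs) + n_replay)
    # is approximately replay_ratio.
    n_replay = int(len(chinese_docs) * replay_ratio / (1 - replay_ratio))
    replay = rng.sample(english_docs, min(n_replay, len(english_docs)))
    corpus = list(chinese_docs) + replay
    rng.shuffle(corpus)  # interleave replay docs with the new-language data
    return corpus
```

The point of the replay slice is that gradient updates keep touching the original distribution, so the weights encoding first-stage knowledge are not overwritten wholesale by the new-language data.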
-
That seems like a major new algorithm to me. Do you plan on open-sourcing
the whole training process?
Will there be a detailed explanation of this in a paper or blog post in the
future?
Cheers,
Tom Young
tomyoung903.github.io
-
Also, if the English data is only there to prevent forgetting / reawaken prior knowledge, why is the model now so much better than it was before any forgetting occurred?
-
Thanks! Looking forward to your report!
-
Why does adding a small amount of Chinese data produce such a large gain on the English MMLU benchmark?
See the Luchen Technology (潞晨科技) WeChat official-account post, "Trained in half a day on a thousand-yuan budget, rivaling mainstream large models: an open-source, commercially usable Chinese LLaMA-2":
https://mp.weixin.qq.com/s/25r6hJqNDQhqR4EHu0uctA