[minillm] what is the PROMPT_DATA_DIR and LM_DATA_DIR #288

Open
Harryjun opened this issue Dec 16, 2024 · 3 comments

Comments

@Harryjun

[minillm] What are PROMPT_DATA_DIR and LM_DATA_DIR? Is LM_DATA_DIR required?

@t1101675
Contributor

t1101675 commented Dec 17, 2024

PROMPT_DATA_DIR contains the prompts in $\mathcal{D}$, the conditional generation dataset.
LM_DATA_DIR contains the data in $\mathcal{D}_{\text{PT}}$, the long-document texts.
Details of $\mathcal{D}$ and $\mathcal{D}_{\text{PT}}$ can be found in our paper.
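
For readers without the paper at hand, the roles of the two datasets can be sketched roughly as follows. This is a simplified paraphrase that assumes the reverse-KL formulation described in the paper; $p$ (teacher), $q_\theta$ (student), and the equal weighting of the two terms are notation and assumptions introduced here for illustration, not taken from this thread.

$$
\min_{\theta}\; \mathbb{E}_{x \sim \mathcal{D}}\, \mathrm{KL}\big[\, q_\theta(\cdot \mid x) \,\|\, p(\cdot \mid x) \,\big] \;+\; \mathcal{L}_{\text{PT}},
\qquad
\mathcal{L}_{\text{PT}} = -\,\mathbb{E}_{d \sim \mathcal{D}_{\text{PT}}} \log q_\theta(d)
$$

Under this reading, PROMPT_DATA_DIR supplies the prompts $x$ from which the student samples responses, while LM_DATA_DIR supplies plain text for the auxiliary language-modeling term.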

@Harryjun
Author

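@t1101675 Could you elaborate? I only have a single SFT dataset. I trained a 14B model and a 3B model and want to distill the 14B model into the 3B one; that should be feasible, right?
As I understand it, the SFT data is what goes in PROMPT_DATA_DIR?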

@t1101675
Contributor

Yes, that is feasible. You can refer to this file to run it.
