
About the different datasets and corresponding models #176

Open
Statisticss opened this issue Feb 2, 2024 · 0 comments

Statisticss commented Feb 2, 2024

Thanks for this great work! I have several questions regarding the datasets and the corresponding models:

Q1:
I believe you used RedPajama for FT and LongAlpaca-12k for SFT, and you mentioned that there is no need to do FT before SFT. So can I directly take, e.g., llama2-7B-chat-hf and do SFT on it with the LongAlpaca-12k dataset?

Q2:
If the performance of SFT-only models is already good enough, what is the purpose of doing FT on RedPajama? FT on RedPajama would be much more time-consuming than SFT, right?

Q3:
In your paper, I didn't see many results evaluating the SFT-only models; most evaluations are conducted on the FT models. Will results for the SFT-only models be added to the paper later?

Q4:
On 2023.11.19, you released several models fine-tuned on the LongAlpaca-16k-length dataset. What is the difference between LongAlpaca-16k-length and LongAlpaca-12k? Will I get the same model from the LongAlpaca-12k dataset if I set --model_max_length to 16384?
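For reference, this is the kind of run I have in mind for Q1 and Q4 -- a sketch only: the script name, data path, and every flag except --model_max_length are my assumptions based on typical HuggingFace fine-tuning setups, not taken from this repo.

```shell
# Hypothetical SFT invocation (Q1/Q4).
# Assumptions: script name "supervised-fine-tune.py", data path, and all
# flags other than --model_max_length are placeholders, not confirmed
# against this repository.
torchrun --nproc_per_node=8 supervised-fine-tune.py \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --data_path LongAlpaca-12k.json \
    --model_max_length 16384 \
    --output_dir ./out-llama2-7b-chat-longalpaca-16k
```

My question in Q4 is essentially whether this command, run on LongAlpaca-12k, reproduces the released LongAlpaca-16k-length models.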
