Thanks for this great work! I have several questions regarding the datasets and the corresponding models:
Q1:
If I understand correctly, you used RedPajama for FT and LongAlpaca-12k for SFT. You mentioned that there is no need to do FT before SFT. So can I directly run SFT on, e.g., llama2-7B-chat-hf with the LongAlpaca-12k dataset?
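For concreteness, I imagine something like the command below, modeled on the SFT command in the README. The script name, flags, and paths here are my assumptions based on that README, not something I have verified against the current code:

```bash
# Hypothetical sketch: run SFT directly on the chat model, skipping the
# RedPajama FT stage. Script name and flags assumed from the LongLoRA README;
# the paths and GPU count are placeholders.
torchrun --nproc_per_node=8 supervised-fine-tune.py \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --data_path path_to/LongAlpaca-12k.json \
    --bf16 True \
    --output_dir path_to_saving_checkpoints
```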
Q2:
If the SFT-only models already perform well enough, what is the purpose of the FT stage on RedPajama? FT on RedPajama should be much more time-consuming than SFT, right?
Q3:
In your paper, I didn't see many results evaluating the SFT-only models; most evaluations are conducted on the FT models. Will results for the SFT-only models be added to the paper later?
Q4:
On 2023.11.19, you released several models fine-tuned on the LongAlpaca-16k-length dataset. What is the difference between LongAlpaca-16k-length and LongAlpaca-12k? Would I get the same model by training on LongAlpaca-12k with --model_max_length set to 16384?
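Put differently, would the command below (again assuming the README's script and flags) reproduce the 16k-length models, or is LongAlpaca-16k-length a genuinely different dataset?

```bash
# Hypothetical sketch: same LongAlpaca-12k data, but with the context window
# raised to 16384. Flags assumed from the LongLoRA README; paths are placeholders.
torchrun --nproc_per_node=8 supervised-fine-tune.py \
    --model_name_or_path path_to/base_model \
    --data_path path_to/LongAlpaca-12k.json \
    --model_max_length 16384 \
    --output_dir path_to_saving_checkpoints
```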