Thanks for this great work! I have several questions regarding the datasets and the corresponding models:
Q1:
If I understand correctly, you used RedPajama for FT and LongAlpaca-12k for SFT. You mentioned that there is no need to do FT before SFT. So can I directly run SFT on, e.g., llama2-7B-chat-hf with the LongAlpaca-12k dataset?
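For concreteness, I imagine something like the command below, modeled on the SFT command in the README. The script name, flags, and paths here are my assumptions based on that README, not something I have verified against the current code:

```bash
# Hypothetical sketch: run SFT directly on the chat model, skipping the
# RedPajama FT stage. Script name and flags assumed from the LongLoRA README;
# the paths and GPU count are placeholders.
torchrun --nproc_per_node=8 supervised-fine-tune.py \
    --model_name_or_path meta-llama/Llama-2-7b-chat-hf \
    --data_path path_to/LongAlpaca-12k.json \
    --bf16 True \
    --output_dir path_to_saving_checkpoints
```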
Q2:
If the SFT-only models already perform well enough, what is the purpose of the FT stage on RedPajama? FT on RedPajama should be much more time-consuming than SFT, right?
Q3:
In your paper, I didn't see many results evaluating the SFT-only models; most evaluations are conducted on the FT models. Will results for the SFT-only models be added to the paper later?
Q4:
On 2023.11.19, you released several models fine-tuned on the LongAlpaca-16k-length dataset. What is the difference between LongAlpaca-16k-length and LongAlpaca-12k? Would I get the same model by training on LongAlpaca-12k with --model_max_length set to 16384?
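Put differently, would the command below (again assuming the README's script and flags) reproduce the 16k-length models, or is LongAlpaca-16k-length a genuinely different dataset?

```bash
# Hypothetical sketch: same LongAlpaca-12k data, but with the context window
# raised to 16384. Flags assumed from the LongLoRA README; paths are placeholders.
torchrun --nproc_per_node=8 supervised-fine-tune.py \
    --model_name_or_path path_to/base_model \
    --data_path path_to/LongAlpaca-12k.json \
    --model_max_length 16384 \
    --output_dir path_to_saving_checkpoints
```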