[Llama3] Seqlen 128k for Llama3.2 1B and 3B - Reproduce on reference GPU #15737
Comments
1B reference model loses accuracy sharply after around 80k tokens.
I was a bit concerned: it doesn't really make sense that Meta would release the models in this state, and I wondered whether something is wrong with the version of the reference we have or with the way we are using it, e.g. incorrectly propagated rope settings or similar. So I modified the 1B test to run with the Hugging Face model and tokenizer, and it is fine (cross-check sketch below):
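A minimal sketch of that cross-check, assuming the public meta-llama/Llama-3.2-1B checkpoint on the Hugging Face Hub and the standard transformers API (the exact test prompt from our 1B test is not reproduced here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The HF config carries the rope scaling settings, including the
# factor that turned out to matter (32 for the 3.2 1B/3B models).
print(model.config.rope_scaling)
```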
The official Meta Llama 3 GitHub repo uses a scale_factor of 8 for all the models, including 3.2.
The config.json in the Meta Llama 3 Hugging Face repo, however, was updated from 8 to 32 a couple of weeks after release. It appears this was a bugfix that was never back-ported to their own official GitHub repository (see the rope-scaling sketch after this comment).
Notified the Meta repo so they can fix this on their end too. |
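For context, here is a minimal sketch of the llama3-style rope frequency scaling, adapted from Meta's public reference implementation, with scale_factor lifted into a parameter so the 8 vs. 32 difference is visible; the other constants follow the published configs, and the demo values at the bottom are illustrative only:

```python
import math
import torch

def apply_scaling(freqs: torch.Tensor, scale_factor: float = 8.0) -> torch.Tensor:
    low_freq_factor = 1.0
    high_freq_factor = 4.0
    old_context_len = 8192  # original Llama 3 training context

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    new_freqs = []
    for freq in freqs.tolist():
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency components are left untouched.
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are scaled down; a larger
            # scale_factor stretches them further for longer contexts.
            new_freqs.append(freq / scale_factor)
        else:
            # Smooth interpolation between the two regimes.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype)

# Illustration: the lowest rope frequencies end up 4x smaller with
# scale_factor 32 than with 8 (head_dim=64, rope theta=500000 assumed).
base = 500000.0 ** (-torch.arange(0, 64, 2).float() / 64)
print(apply_scaling(base, 8.0)[-1], apply_scaling(base, 32.0)[-1])
```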
Merged in PR #15909. Thanks @yieldthought!
For the two smallest Llama 3 models, 1B and 3B, we see bad output when the prefill seqlen is very large, i.e. > 64k.
We've debugged extensively and cannot find a reason why this happens for these two small models when it works fine for the 8B up to 70B models.
Our best guess is that, due to the small size of these models, accuracy becomes very sensitive at long context lengths.
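As a hedged illustration of how "bad output" past ~64k tokens can be quantified (the function name and windowing below are our own, not taken from this issue): compare the model's greedy next-token predictions from a full prefill against the ground-truth continuation, averaged per window of positions, which is how a sharp accuracy drop-off around a given position would show up.

```python
import torch

def top1_accuracy_by_position(
    logits: torch.Tensor, tokens: torch.Tensor, window: int = 1024
) -> list[float]:
    """Greedy next-token top-1 accuracy, averaged per window of positions.

    logits: [seq_len, vocab] model outputs for the full prefill
    tokens: [seq_len] ground-truth token ids for the same sequence
    """
    preds = logits.argmax(dim=-1)                 # greedy prediction at each position
    correct = (preds[:-1] == tokens[1:]).float()  # prediction at t targets token t+1
    return [
        correct[i : i + window].mean().item()
        for i in range(0, correct.numel(), window)
    ]
```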
TODO