Lack of documentation regarding RoPE scaling #2402
Comments
What do you mean?
In PR #2295 there is a graph of the perplexity over context size. When you go beyond 4 times the training context size, things get out of control very easily (very high perplexity) unless you find the magical combination of base and scale.
Thank you for asking about this. I've also been poring over the PRs trying to sort out optimal values to use. There are a lot of different versions of NTK scaling floating around at the moment, and they're all more complex than the linear RoPE scaling implemented in Exllama. In my case I'm interested in applying the scaling to 65b and 70b models, and I haven't seen a lot of guidance for those. Would also appreciate some documentation on the usage of CFG. I've read the paper but I'm still a little unclear on how to make the most of its implementation in llama.cpp.
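For orientation, a minimal sketch of how the two approaches differ (an illustrative paraphrase, not llama.cpp's actual code): linear scaling shrinks the position index, while NTK-style scaling raises the frequency base.

```python
import numpy as np

def rope_freqs(dim: int, base: float = 10000.0) -> np.ndarray:
    # Per-pair rotation frequencies: theta_i = base**(-2i/dim), i = 0..dim/2-1.
    return base ** (-np.arange(0, dim, 2) / dim)

def rope_angles(pos: int, dim: int, base: float = 10000.0, scale: float = 1.0) -> np.ndarray:
    # Linear ("position interpolation") scaling multiplies the position by
    # scale < 1, squeezing more tokens into the trained position range.
    return (pos * scale) * rope_freqs(dim, base)

head_dim = 128
# 2x linear interpolation, as with --rope-freq-scale 0.5:
linear = rope_angles(pos=6000, dim=head_dim, scale=0.5)
# NTK-style stretch instead raises the base, as with --rope-freq-base 80000:
ntk = rope_angles(pos=6000, dim=head_dim, base=80000.0)
```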
CFG amplifies the prompt's output by subtracting the output of a similar prompt that you don't want. You can see some funny examples in #2217.
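The core formula, as a minimal sketch (the combination rule from the CFG paper; llama.cpp's implementation may differ in detail):

```python
import numpy as np

def cfg_logits(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    # Classifier-free guidance over logits: extrapolate away from the
    # unwanted ("negative") prompt toward the wanted one.
    # scale = 1.0 reproduces the conditional logits unchanged.
    return uncond + scale * (cond - uncond)
```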
Is there any guidance on how to set the RoPE parameters for 30b+ models?
Added a parameter
Is there any documentation on how to implement this, or an example? I am kind of new to the field, and I am fine-tuning Code Llama 2 and want to increase the context length. But between all these posts I am somewhat confused about how to actually implement it.
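For illustration, a hedged example using the llama-cpp-python bindings (the model path and parameter values are placeholders, not a recommendation):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./my-finetuned-model.gguf",  # placeholder path
    n_ctx=8192,               # the context length you actually want to run at
    rope_freq_base=10000.0,   # base the model was trained with (10000 for Llama 2)
    rope_freq_scale=0.5,      # linear scaling: 0.5 targets ~2x the trained context
)
```

The CLI equivalents are -c, --rope-freq-base and --rope-freq-scale.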
This issue was closed because it has been inactive for 14 days since being marked as stale.
Is this parameter rope_scale and the parameter rope_freq_scale, available in LangChain's extension of LlamaCpp, the same?
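As far as I can tell (a sketch; the import path varies by LangChain version), LangChain's LlamaCpp wrapper forwards rope_freq_base and rope_freq_scale to llama-cpp-python, which passes them on to llama.cpp, so rope_freq_scale there corresponds to --rope-freq-scale on the CLI:

```python
from langchain.llms import LlamaCpp  # newer releases: langchain_community.llms

llm = LlamaCpp(
    model_path="./llama-2-13b.Q4_K_M.gguf",  # placeholder path
    n_ctx=8192,
    rope_freq_base=10000.0,
    rope_freq_scale=0.5,  # forwarded through llama-cpp-python to llama.cpp
)
```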
There is a lack of documentation on the current state of development.
There is no documentation on the RoPE parameters except for two lines in the --help output, which say:
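(Quoted from memory; the exact wording may differ by version:)

```
--rope-freq-base N    RoPE base frequency (default: 10000.0)
--rope-freq-scale N   RoPE frequency scaling factor (default: 1)
```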
There is no mention of RoPE scaling in the primary README.md, no mention of the new parameters in the README pages of the "main" or "server" examples or in any of the pages now linked from the "docs" section of the primary README, and no mention in the new wiki pages.
So it is very hard for a normal user to even notice that they have missed something.
And when a user does go looking, the only explanation that seems to be available at the moment is PR #2054.
But what are actually reasonable values for scale and base? That requires a lot of reading; the first concrete suggestion is, explicitly:
For the bold, try adding the following command line parameters to your favorite model: -c 16384 --rope-freq-base 80000 --rope-freq-scale 0.5
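To unpack those numbers (a rough rule of thumb from the linked PRs, not official guidance): with linear scaling alone, the usable context is roughly the trained context divided by the scale factor.

```python
trained_ctx = 4096                    # Llama 2's training context
rope_freq_scale = 0.5
print(trained_ctx / rope_freq_scale)  # 8192.0 tokens: what scale alone buys
# Asking for -c 16384 on top of that is why the suggestion also raises
# --rope-freq-base to 80000, an NTK-style stretch of the frequencies.
```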
What about the not-so-bold?
In the course of the PR, numerous combinations of the base and scale parameters were posted, and I also experimented with the recommended combinations.
But reasonably clear descriptions of which values are recommended, how they depend on each other, and perhaps also how they apply to Llama 2, are not really to be found; where they exist at all, finding them requires considerable search effort.
RoPE scaling is a clear extension of the possibilities of Llama - shouldn't there be some form of documentation for it?