The RoPE scaling factor is hard-coded to 8, but the correct value for the 1B and 3B Llama 3.2-series models is 32, as updated on the Hugging Face repo in this post-release fix.
The current value of 8 is correct for the 3.1 models, for the 11B and 90B 3.2 models, and for the 70B 3.3 model, so this value should be taken from the model config rather than hard-coded. The problem is not apparent at short sequence lengths but causes a dramatic degradation in output quality around 80k tokens.
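A minimal sketch of the proposed fix, assuming a Llama 3.1-style frequency-scaling function like the one in `llama/model.py` (the function body mirrors the published 3.1 scaling algorithm, but the parameter plumbing and config field name here are illustrative assumptions, not the repo's exact API): thread the scale factor through as an argument sourced from the model config instead of the hard-coded literal 8.

```python
import math

import torch


def apply_scaling(freqs: torch.Tensor, scale_factor: float) -> torch.Tensor:
    """Llama 3.1-style RoPE frequency scaling, parameterized by scale_factor.

    scale_factor should come from the model config: 32 for the 1B/3B
    Llama 3.2 models, 8 for 3.1, for 11B/90B 3.2, and for 70B 3.3.
    """
    low_freq_factor = 1.0
    high_freq_factor = 4.0
    old_context_len = 8192  # original pre-training context length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    new_freqs = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency bands: left unchanged.
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency bands: fully scaled down.
            new_freqs.append(freq / scale_factor)
        else:
            # Mid bands: smooth interpolation between the two regimes.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)


# Hypothetical call site: read the factor from the checkpoint's config
# (e.g. rope_scaling["factor"] in the HF config.json) rather than assuming 8.
# "rope_scaling_factor" is an assumed field name for illustration.
# scale_factor = model_args.rope_scaling_factor
# freqs = apply_scaling(freqs, scale_factor)
```

With the factor read from the config, the same code path stays correct for all model sizes, instead of silently mis-scaling the 1B and 3B 3.2 models at long context.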
I notice #166 was closed without a resolution; @varunfb, the current implementation is simply wrong for the 1B and 3B models.