Bad output after 80k tokens: rope scaling_factor 8 is used for 1B and 3B 3.2 models #241

yieldthought · 2024-12-10T11:03:15Z

The rope scaling factor is hard-coded to 8, but the correct value for the 1B and 3B 3.2-series models is 32, as updated on the Hugging Face repo in a post-release fix.

The current value of 8 is correct for the 3.1 models, for the 11B and 90B 3.2 models, and for the 70B 3.3 model, so the factor should be read from the model config rather than hard-coded.
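A minimal sketch of what reading the factor from the config could look like, assuming a Hugging Face style `config.json` whose `rope_scaling` object carries a `factor` field. The helper name `rope_scaling_factor_from_config` and the fallback default are hypothetical and are not taken from this repo's code.

```python
import json

# Assumed fallback for configs that carry no rope_scaling entry (hypothetical).
DEFAULT_ROPE_SCALING_FACTOR = 8.0


def rope_scaling_factor_from_config(config: dict,
                                    default: float = DEFAULT_ROPE_SCALING_FACTOR) -> float:
    """Return the rope scaling factor stored in a parsed config dict.

    Assumes the Hugging Face layout: {"rope_scaling": {"factor": ...}}.
    Falls back to `default` when the entry is missing or null.
    """
    rope_scaling = config.get("rope_scaling") or {}
    return float(rope_scaling.get("factor", default))


def load_rope_scaling_factor(config_path: str) -> float:
    """Read config.json from disk and extract the rope scaling factor."""
    with open(config_path, "r", encoding="utf-8") as f:
        return rope_scaling_factor_from_config(json.load(f))
```

With this shape, a 1B or 3B 3.2 config would yield 32 and a 3.1 config would yield 8, with no per-model branching in the code.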

This is not apparent at short sequence lengths but causes a dramatic degradation around 80k tokens.
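To illustrate why the wrong factor would go unnoticed at short lengths, here is a sketch of Llama 3 style frequency scaling in plain Python. It follows the published reference scheme as I understand it. The parameter values (`head_dim=64`, `theta=500000.0`, `low_freq_factor=1`, `high_freq_factor=4`, `original_context_len=8192`) are assumptions for illustration and should be checked against the actual model config.

```python
import math


def base_frequencies(head_dim: int, theta: float) -> list[float]:
    """Unscaled rotary frequencies, one per pair of head dimensions."""
    return [1.0 / theta ** (i / head_dim) for i in range(0, head_dim, 2)]


def scale_frequencies(freqs: list[float], factor: float,
                      low_freq_factor: float = 1.0,
                      high_freq_factor: float = 4.0,
                      original_context_len: int = 8192) -> list[float]:
    """Apply Llama 3 style scaling: keep high frequencies, divide low
    frequencies by `factor`, and interpolate smoothly in between."""
    low_freq_wavelen = original_context_len / low_freq_factor
    high_freq_wavelen = original_context_len / high_freq_factor
    scaled = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            scaled.append(freq)            # short wavelengths: unchanged
        elif wavelen > low_freq_wavelen:
            scaled.append(freq / factor)   # long wavelengths: fully scaled
        else:
            smooth = ((original_context_len / wavelen - low_freq_factor)
                      / (high_freq_factor - low_freq_factor))
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled


freqs = base_frequencies(head_dim=64, theta=500000.0)
with_8 = scale_frequencies(freqs, factor=8.0)
with_32 = scale_frequencies(freqs, factor=32.0)
# Highest-frequency component is identical; lowest differs by 32 / 8 = 4x.
print(with_8[0] == with_32[0], with_8[-1] / with_32[-1])
```

Under these assumptions the high-frequency components are the same for both factors and only the low-frequency ones differ. The rotation angle is position times frequency, so the mismatch grows with position, which is consistent with the degradation only showing up at long sequence lengths.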

yieldthought · 2024-12-10T13:17:42Z

I notice #166 was closed without a resolution; @varunfb the current implementation is just wrong for the 1B and 3B models.

mtairum · 2024-12-13T14:37:45Z

Merged in PR tenstorrent/tt-metal#15909.

This can now be closed.

yieldthought mentioned this issue Dec 10, 2024

[LLama3] Seqlen 128k for Llama3.2 1B and 3B - Reproduce on reference GPU tenstorrent/tt-metal#15737

Closed

yieldthought changed the title ~~rope scaling_factor 8 is still used for 3.2+ models~~ Bad output after 80k tokens: rope scaling_factor 8 is still used for 3.2+ models Dec 10, 2024

yieldthought changed the title ~~Bad output after 80k tokens: rope scaling_factor 8 is still used for 3.2+ models~~ Bad output after 80k tokens: rope scaling_factor 8 is still used for 1B and 3B 3.2 models Dec 10, 2024

yieldthought changed the title ~~Bad output after 80k tokens: rope scaling_factor 8 is still used for 1B and 3B 3.2 models~~ Bad output after 80k tokens: rope scaling_factor 8 is used for 1B and 3B 3.2 models Dec 10, 2024
