
[LLama3] Seqlen 128k for Llama3.2 1B and 3B - Reproduce on reference GPU #15737

Closed
mtairum opened this issue Dec 5, 2024 · 8 comments · Fixed by #15909
mtairum (Contributor) commented Dec 5, 2024

For the two smallest Llama3 models, 1B and 3B, we see bad output when the prefill seqlen is very large, i.e. > 64K.

We've debugged extensively and cannot find a reason why this happens for these two small models when it works fine for the 8B up to 70B models.

Our best guess is that, due to the small size of these models, outputs at long context lengths become very sensitive to accuracy loss.

TODO

  • Test the reference 1B / 3B model on CPU or GPU, using the same prompt, and look at the output. If the reference is also weak at these lengths, we can close this issue; otherwise we'll revisit it. (A rough sketch of one way to run this check follows below.)
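
A minimal sketch of such a check, assuming the HuggingFace `meta-llama/Llama-3.2-1B-Instruct` checkpoint and a hypothetical `long_context_prompt.txt` of roughly 100k tokens (this is not the team's actual test harness, just an illustration):

```python
# Sketch only: load the HF reference model, prefill a very long prompt,
# and inspect the continuation quality at large sequence positions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical file holding a ~100k-token prompt (e.g. a long book excerpt).
long_prompt = open("long_context_prompt.txt").read()
inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
print(f"prefill length: {inputs.input_ids.shape[1]} tokens")

out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
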
yieldthought (Contributor) commented:

[Attachments: accuracy-vs-sequence-length plot, ref.log]

1B reference model loses accuracy sharply after around 80k tokens
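
For context, one way to produce this kind of curve is teacher-forced next-token top-1 accuracy per position bucket. The sketch below is an assumption about the measurement, not the actual script behind the plot, and the chunk and bucket sizes are arbitrary:

```python
# Sketch: next-token top-1 accuracy per position bucket over a long document,
# computed in chunks with a KV cache so the per-step logits stay small.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

text = open("long_document.txt").read()  # hypothetical ~128k-token document
ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

chunk = 2048
hits = []
past = None
with torch.no_grad():
    for start in range(0, ids.shape[1], chunk):
        piece = ids[:, start:start + chunk]
        out = model(piece, past_key_values=past, use_cache=True)
        past = out.past_key_values
        # logits[:, i] predicts piece[:, i + 1]; boundary tokens between chunks are skipped.
        preds = out.logits[:, :-1].argmax(-1)
        hits.append((preds == piece[:, 1:]).float()[0])
hits = torch.cat(hits)

bucket = 8192
for start in range(0, hits.numel(), bucket):
    acc = hits[start:start + bucket].mean().item()
    print(f"tokens {start:>6}-{start + bucket:>6}: top-1 acc {acc:.3f}")
```
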

yieldthought (Contributor) commented:

[Attachment: accuracy-vs-sequence-length plot for 3B]
3B follows a similar pattern

yieldthought (Contributor) commented:

I was a bit concerned: it doesn't really make sense that Meta would release the models in this state, so I wondered whether there's something wrong with the version of the reference we have or the way we're using it, e.g. incorrectly propagated RoPE settings. So I modified the 1B test to run with the HuggingFace model and tokenizer, and it is fine:

[Attachment: plot showing the HuggingFace 1B model maintains accuracy at long context]

yieldthought (Contributor) commented:

We're using the 3.1 reference codebase, which hard-codes a RoPE scaling factor of 8, but 3.2 (as seen in the HF config below) actually uses 32:
[Attachments: screenshots of the reference code with scale_factor = 8 and the HF 3.2 config with factor = 32]

yieldthought (Contributor) commented:

The official Meta Llama 3 repo uses a scale_factor of 8 for all the models, including 3.2:
https://github.com/meta-llama/llama-models/blob/fc1e70e7970bdf599a924f6fd06cedb6e3819224/models/llama3/reference_impl/model.py#L47
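
For reference, the scaling logic in question looks roughly like this (a paraphrased sketch of the linked reference implementation, not a verbatim copy); the hard-coded `scale_factor = 8` is the value that should be 32 for the 3.2 1B/3B checkpoints:

```python
import math
import torch

def apply_scaling(freqs: torch.Tensor) -> torch.Tensor:
    # Llama 3.x long-context RoPE frequency scaling (paraphrased sketch).
    scale_factor = 8          # hard-coded in the 3.1 reference; 3.2 1B/3B expect 32
    low_freq_factor = 1
    high_freq_factor = 4
    old_context_len = 8192    # original Llama 3 context length

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    new_freqs = []
    for freq in freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency (short-wavelength) components are left untouched.
            new_freqs.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are stretched by the scale factor.
            new_freqs.append(freq / scale_factor)
        else:
            # Smooth interpolation between the two regimes.
            smooth = (old_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            new_freqs.append((1 - smooth) * freq / scale_factor + smooth * freq)
    return torch.tensor(new_freqs, dtype=freqs.dtype, device=freqs.device)
```
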

yieldthought (Contributor) commented:

The config.json in the Meta Llama 3.2 HuggingFace repo was updated from 8 to 32 a couple of weeks after release:
https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/commit/c4219cc9e642e492fd0219283fa3c674804bb8ed
[Attachment: screenshot of the config.json change]

It appears this was a bugfix, and the fix was never applied to their official GitHub repository.
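
For reference, after that commit the `rope_scaling` block in the HF config has roughly this shape (shown here as a Python dict; the exact field values should be checked against the linked commit):

```python
# Sketch of the rope_scaling entry in the updated Llama-3.2-1B-Instruct config.json.
rope_scaling = {
    "rope_type": "llama3",
    "factor": 32.0,                            # was 8.0 before the fix
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
```
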

yieldthought (Contributor) commented:

Notified the Meta repo so they can fix this on their end too.

mtairum (Contributor, Author) commented Dec 13, 2024

Merged in PR #15909

Thanks @yieldthought
