
[Llama3] Debug accuracy drop on sequence lengths > 64k tokens #13192

Closed
mtairum opened this issue Sep 27, 2024 · 1 comment
Comments

mtairum (Contributor) commented Sep 27, 2024

Status:

Prefill sequence lengths up to 128K tokens are supported in the Llama3 codebase (1B, 3B, 8B, 11B, 70B), but accuracy degrades in some corner cases:

  • ✅ All models show good output for prefill sequence lengths up to 32K tokens.
  • ❌ All models show bad (repetitive) output at 128K tokens.
  • ✅ With top-p sampling, prefill up to 64K tokens is good in most models (see the sampling sketch after this list).
    • ❌ Exception: the 1B/3B models still show bad output at 64K.
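
For context, top-p (nucleus) sampling keeps only the smallest set of highest-probability tokens whose cumulative mass exceeds p and samples within that set, which masks the low-probability tail that greedy decoding tends to fall into when long-context accuracy degrades. A minimal PyTorch sketch of generic nucleus sampling (the function name and signature are illustrative, not the sampler used in this codebase):

```python
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9, temperature: float = 1.0) -> torch.Tensor:
    """Nucleus (top-p) sampling over a (batch, vocab) logits tensor."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens once the cumulative mass *before* them already exceeds p,
    # so the highest-probability token always survives.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    # Sample within the truncated distribution, then map back to vocab ids.
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return torch.gather(sorted_idx, -1, next_sorted)
```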

Things we've tried so far:

  • TODO
mtairum self-assigned this Sep 27, 2024
mtairum removed the community label Sep 30, 2024
mtairum changed the title from "[Llama3.1-8B] Support max prefill lengths on demos (within L1 capacity)" to "[Llama3] Debug accuracy drop on sequence lengths > 64k tokens" Dec 2, 2024
mtairum added P1 and removed P0 labels Dec 2, 2024
mtairum (Contributor, Author) commented Dec 6, 2024

This is now supported in main.

Added a P2 follow-up for small Llama models here: #15737

mtairum closed this as completed Dec 6, 2024