
[Llama3] Debug accuracy drop on sequence lengths > 64k tokens #13192

Closed
mtairum opened this issue Sep 27, 2024 · 1 comment
Comments

mtairum (Contributor) commented Sep 27, 2024

Status:

Prefill sequence lengths up to 128K tokens are supported in the Llama3 codebase (1B, 3B, 8B, 11B, 70B), but accuracy degrades in some corner cases:

  • ✅ All models show good output for prefill sequence lengths up to 32K tokens.
  • ❌ All models show bad (repetitive) output at 128K tokens.
  • ✅ With top-p sampling, prefill up to 64K tokens is good in most models (see the sampling sketch after this list).
    • ❌ Exception: the 1B/3B models still show bad output at 64K.
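
For context, top-p (nucleus) sampling keeps only the smallest set of highest-probability tokens whose cumulative mass exceeds p and samples within that set, which masks the low-probability tail that greedy decoding tends to fall into when long-context accuracy degrades. A minimal PyTorch sketch of generic nucleus sampling (the function name and signature are illustrative, not the sampler used in this codebase):

```python
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9, temperature: float = 1.0) -> torch.Tensor:
    """Nucleus (top-p) sampling over a (batch, vocab) logits tensor."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens once the cumulative mass *before* them already exceeds p,
    # so the highest-probability token always survives.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    # Sample within the truncated distribution, then map back to vocab ids.
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return torch.gather(sorted_idx, -1, next_sorted)
```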

Things we've tried so far:

  • TODO
mtairum self-assigned this Sep 27, 2024
mtairum removed the community label Sep 30, 2024
mtairum changed the title from "[Llama3.1-8B] Support max prefill lengths on demos (within L1 capacity)" to "[Llama3] Debug accuracy drop on sequence lengths > 64k tokens" Dec 2, 2024
mtairum added P1 and removed P0 labels Dec 2, 2024
mtairum (Contributor, Author) commented Dec 6, 2024

This is now supported in main.

Added a P2 follow-up for small Llama models here: #15737

mtairum closed this as completed Dec 6, 2024