
mlx_lm with llama-3.3-70b-instruct works like a base model in some cases. #1162

Open
chigkim opened this issue Dec 15, 2024 · 2 comments

chigkim commented Dec 15, 2024

My prompt looks like this:

Provide a summary as well as a detail analysis of the following:
Then the content to summarize follows.

However, if I run the following,

mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit --max-kv-size 30000 --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 --system 'You are a helpful assistant' --prompt -<./28000.txt

I only get this:

"I hope this information has been helpful. If you have any further questions or need more information, please don't hesitate to ask."

I'm attaching the full prompt below.

28000.txt

Thanks!

awni (Member) commented Dec 17, 2024

That's odd. Does it still fail if you don't specify --max-kv-size?

Is it just for that prompt or do you observe the same for shorter prompts? What about other Llama models or just the 70B?
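The suggested variations could be sketched like this (a hypothetical set of runs, reusing the flags from the original command; the 8B model is the one mentioned later in this thread):

```shell
# Re-run without --max-kv-size to rule out cache-size effects:
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
    --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
    --system 'You are a helpful assistant' --prompt - < ./28000.txt

# Try a smaller Llama model on the same prompt to see if only the 70B is affected:
mlx_lm.generate --model mlx-community/Llama-3.1-8B-Instruct-4bit \
    --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
    --system 'You are a helpful assistant' --prompt - < ./28000.txt
```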

chigkim (Author) commented Dec 17, 2024

I discovered this when I created a script to test speed with various prompt lengths.

What's interesting is that when feeding 28k, 30k, or 32k tokens, it has the same problem: it only generates the same 27-token phrase. With prompts of 26k tokens or fewer, the problem didn't occur.

I suspect something might be going on with long context? It's like the opposite of the issue I created for the looping problem with long context and llama-3.1-8b-instruct-4bit.

I'll test some more with what you suggested, and report back.
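A minimal way to bisect the failure threshold might look like the sketch below. It is hypothetical: whitespace-separated word counts stand in for real token counts (an actual run would use the model's tokenizer), and the mlx_lm invocation is shown only as a comment.

```python
def truncate_to_words(text: str, n_words: int) -> str:
    """Return roughly the first n_words whitespace-separated words of text.

    A crude stand-in for token-level truncation.
    """
    return " ".join(text.split()[:n_words])

def prompts_at_lengths(text, lengths):
    """Yield (length, truncated_prompt) pairs for each target length."""
    for n in lengths:
        yield n, truncate_to_words(text, n)

# Example use (not run here): write each truncated prompt to a file and feed
# it to the CLI from the original report, e.g.
#   for n, prompt in prompts_at_lengths(open("28000.txt").read(),
#                                       [24_000, 26_000, 28_000, 30_000]):
#       ...save prompt, then...
#       mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
#           --max-tokens 2000 --temp 0.0 --prompt - < tmp.txt
```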
