
mlx_lm with llama-3.3-70b-instruct works like a base model in some cases. #1162

Open
chigkim opened this issue Dec 15, 2024 · 2 comments

chigkim commented Dec 15, 2024

My prompt looks like this:

Provide a summary as well as a detail analysis of the following:
Then the content to summarize follows.

However, if I run the following,

mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit --max-kv-size 30000 --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 --system 'You are a helpful assistant' --prompt -<./28000.txt

I only get this:

"I hope this information has been helpful. If you have any further questions or need more information, please don't hesitate to ask."

I'm attaching the full prompt below.

28000.txt

Thanks!

awni (Member) commented Dec 17, 2024

That's odd. Does it still fail if you don't specify --max-kv-size?

Is it just for that prompt or do you observe the same for shorter prompts? What about other Llama models or just the 70B?
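The suggested variations could be sketched like this (a hypothetical set of runs, reusing the flags from the original command; the 8B model is the one mentioned later in this thread):

```shell
# Re-run without --max-kv-size to rule out cache-size effects:
mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
    --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
    --system 'You are a helpful assistant' --prompt - < ./28000.txt

# Try a smaller Llama model on the same prompt to see if only the 70B is affected:
mlx_lm.generate --model mlx-community/Llama-3.1-8B-Instruct-4bit \
    --max-tokens 2000 --temp 0.0 --top-p 0.9 --seed 1000 \
    --system 'You are a helpful assistant' --prompt - < ./28000.txt
```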

chigkim (Author) commented Dec 17, 2024

I discovered this when I created a script to test speed with various prompt lengths.

What's interesting is that when feeding 28k, 30k, or 32k tokens, it has the same problem: it only generates the same 27-token phrase. With prompts of 26k tokens or fewer, the problem didn't occur.

I suspect something might be going on with long context? It's like the opposite of the issue I created for the looping problem with long context and llama-3.1-8b-instruct-4bit.

I'll test some more with what you suggested, and report back.
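A minimal way to bisect the failure threshold might look like the sketch below. It is hypothetical: whitespace-separated word counts stand in for real token counts (an actual run would use the model's tokenizer), and the mlx_lm invocation is shown only as a comment.

```python
def truncate_to_words(text: str, n_words: int) -> str:
    """Return roughly the first n_words whitespace-separated words of text.

    A crude stand-in for token-level truncation.
    """
    return " ".join(text.split()[:n_words])

def prompts_at_lengths(text, lengths):
    """Yield (length, truncated_prompt) pairs for each target length."""
    for n in lengths:
        yield n, truncate_to_words(text, n)

# Example use (not run here): write each truncated prompt to a file and feed
# it to the CLI from the original report, e.g.
#   for n, prompt in prompts_at_lengths(open("28000.txt").read(),
#                                       [24_000, 26_000, 28_000, 30_000]):
#       ...save prompt, then...
#       mlx_lm.generate --model mlx-community/Llama-3.3-70B-Instruct-4bit \
#           --max-tokens 2000 --temp 0.0 --prompt - < tmp.txt
```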
