
Model request: Phi 3 mini 128K #432

Closed
flatsiedatsie opened this issue May 29, 2024 · 6 comments

Comments

@flatsiedatsie

Seems like a good match for WebLLM, as it was practically designed to run in the browser.

From this reddit thread:
https://www.reddit.com/r/LocalLLaMA/comments/1d2o445/comment/l63cvxk/

@CharlieFRuan
Contributor

Phi3-mini, StableLM 1.6B, Qwen 1.8B were just added to the prebuilt list here: #433

Will bump the version to 0.2.39 soon.

Note the phi3 we added was 4k instead of 128K.

If I understand correctly, supporting a 128K context length requires allocating a KV cache that is 128K on the sequence dimension, which comes to head_dim * num_layers * num_kv_heads * {k,v} * sizeof(f16) * 128K bytes, i.e. 96 * 32 * 32 * 2 * 2 * 128000 ≈ 46GB, as opposed to about 1.5GB for a 4K context length.

@CharlieFRuan
Contributor

Just published 0.2.39; those models are now included in the prebuilt app config!

@flatsiedatsie
Author

Very nice.

If you don't mind, I'll keep this open for now. I think the 128K context version would still offer something valuable to WebLLM.

@CharlieFRuan
Contributor

npm 0.2.62 now supports Phi3.5-mini: #556

Phi-3.5-mini supports up to 128K context (unlike Phi-3-mini, which only has 4K), thanks to RoPE scaling, which MLC-LLM supports. You can take advantage of this in WebLLM by increasing ModelRecord.overrides.context_window_size, or by specifying it in ChatOptions when loading a model, as long as there is enough VRAM.
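A minimal sketch of the ChatOptions route, assuming the current `@mlc-ai/web-llm` API shape; the model id here is illustrative, so check the prebuilt app config for the exact record names:

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Assumed model id; see the prebuilt app config for the real list.
const engine = await CreateMLCEngine(
  "Phi-3.5-mini-instruct-q4f16_1-MLC",
  {}, // engine config, e.g. an initProgressCallback
  {
    // Raise the context window beyond the model record's default.
    // Every extra token costs KV-cache VRAM, so size this to the
    // GPU you are targeting rather than jumping straight to 128K.
    context_window_size: 32768,
  },
);
```

The same override can instead be baked into a custom app config via `ModelRecord.overrides.context_window_size` if you want it applied for every load of that model.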

@CharlieFRuan
Contributor

Closing this issue for now, as Phi-3.5 should meet the need described. Feel free to open new ones if new issues arise!

@flatsiedatsie
Author

Brilliant, thank you!
