
Running on a MacBook M2 Pro in Metal mode is too slow, and it becomes incredibly slow when the prompt is slightly more complex. #774

Open
sopaco opened this issue Sep 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

sopaco commented Sep 15, 2024

## Describe the bug

Running on a MacBook M2 Pro in Metal mode is too slow, and it becomes incredibly slow when the prompt is slightly more complex, to the point of freezing: GPU usage is high, but no completion results are returned.

The model: https://huggingface.co/cleatherbury/Phi-3-mini-128k-instruct-Q4_K_M-GGUF

```
[2024-09-15 09:11:00.088] [info] Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
[2024-09-15 09:11:00.088] [info] Model kind is: quantized from gguf (no adapters)
general.architecture: phi3
general.basename: Phi-3
general.file_type: 7
general.finetune: 128k-instruct
general.languages: en
general.license: mit
general.license.link: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE
general.name: Phi 3 Mini 128k Instruct
general.quantization_version: 2
general.size_label: mini
general.tags: nlp, code, text-generation
general.type: model
phi3.attention.head_count: 32
phi3.attention.head_count_kv: 32
phi3.attention.layer_norm_rms_epsilon: 0.00001
phi3.attention.sliding_window: 262144
phi3.block_count: 32
phi3.context_length: 131072
phi3.embedding_length: 3072
phi3.feed_forward_length: 8192
phi3.rope.dimension_count: 96
phi3.rope.freq_base: 10000
phi3.rope.scaling.attn_factor: 1.1902381
phi3.rope.scaling.original_context_length: 4096
quantize.imatrix.chunks_count: 151
quantize.imatrix.dataset: /training_dir/calibration_datav3.txt
quantize.imatrix.entries_count: 128
quantize.imatrix.file: /models_out/Phi-3.1-mini-128k-instruct-GGUF/Phi-3.1-mini-128k-instruct.imatrix
[2024-09-15 09:11:01.563] [info] Model loaded.
[2024-09-15 09:11:01.564] [info] Serving on http://localhost:19161.
[2024-09-15 09:11:29.590] [info] completions request parse...{"model":"default","prompt":"who are you","best_of":1,"echo":false,"presence_penalty":null,"frequency_penalty":null,"logit_bias":null,"logprobs":null,"max_tokens":null,"n":1,"stop":null,"stream":null,"temperature":null,"top_p":null,"suffix":null,"user":null,"tools":null,"tool_choice":null,"top_k":null,"grammar":null,"adapters":null,"min_p":null,"dry_multiplier":null,"dry_base":null,"dry_allowed_length":null,"dry_sequence_breakers":null}
```
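For reference, the request seen in the log can be reproduced against the local server with a short script. This is a minimal sketch: the `/v1/completions` path is an assumption based on mistral.rs exposing an OpenAI-compatible HTTP API, and the server from the log must be running on port 19161 for the commented-out call to succeed.

```python
import json
import urllib.request

# Same payload as the "completions request parse" line in the log above
# (null fields omitted); max_tokens is bounded here as an assumption, to
# make slow generation easier to time.
payload = {
    "model": "default",
    "prompt": "who are you",
    "best_of": 1,
    "echo": False,
    "n": 1,
    "max_tokens": 64,
}
body = json.dumps(payload).encode()

# Assumed OpenAI-compatible endpoint; adjust the path if the server differs.
req = urllib.request.Request(
    "http://localhost:19161/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Uncomment to send the request (requires the server to be running):
# print(urllib.request.urlopen(req, timeout=120).read().decode())
```

Timing this call with and without Metal enabled would help quantify the slowdown.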

## Latest commit or version

```toml
mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git", features = ["metal"], version = "0.3.0", rev = "ae71578" }
```
sopaco added the bug label on Sep 15, 2024