
Running on a MacBook M2 Pro in Metal mode is too slow, and it becomes incredibly slow when the prompt is slightly more complex. #774

Open
sopaco opened this issue Sep 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

sopaco commented Sep 15, 2024

## Describe the bug

Running on a MacBook M2 Pro in Metal mode is too slow, and it becomes incredibly slow when the prompt is slightly more complex, to the point of freezing: GPU usage is high, but no completion results are returned.

The model: https://huggingface.co/cleatherbury/Phi-3-mini-128k-instruct-Q4_K_M-GGUF

```
[2024-09-15 09:11:00.088] [info] Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
[2024-09-15 09:11:00.088] [info] Model kind is: quantized from gguf (no adapters)
general.architecture: phi3
general.basename: Phi-3
general.file_type: 7
general.finetune: 128k-instruct
general.languages: en
general.license: mit
general.license.link: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE
general.name: Phi 3 Mini 128k Instruct
general.quantization_version: 2
general.size_label: mini
general.tags: nlp, code, text-generation
general.type: model
phi3.attention.head_count: 32
phi3.attention.head_count_kv: 32
phi3.attention.layer_norm_rms_epsilon: 0.00001
phi3.attention.sliding_window: 262144
phi3.block_count: 32
phi3.context_length: 131072
phi3.embedding_length: 3072
phi3.feed_forward_length: 8192
phi3.rope.dimension_count: 96
phi3.rope.freq_base: 10000
phi3.rope.scaling.attn_factor: 1.1902381
phi3.rope.scaling.original_context_length: 4096
quantize.imatrix.chunks_count: 151
quantize.imatrix.dataset: /training_dir/calibration_datav3.txt
quantize.imatrix.entries_count: 128
quantize.imatrix.file: /models_out/Phi-3.1-mini-128k-instruct-GGUF/Phi-3.1-mini-128k-instruct.imatrix
[2024-09-15 09:11:01.563] [info] Model loaded.
[2024-09-15 09:11:01.564] [info] Serving on http://localhost:19161.
[2024-09-15 09:11:29.590] [info] completions request parse...{"model":"default","prompt":"who are you","best_of":1,"echo":false,"presence_penalty":null,"frequency_penalty":null,"logit_bias":null,"logprobs":null,"max_tokens":null,"n":1,"stop":null,"stream":null,"temperature":null,"top_p":null,"suffix":null,"user":null,"tools":null,"tool_choice":null,"top_k":null,"grammar":null,"adapters":null,"min_p":null,"dry_multiplier":null,"dry_base":null,"dry_allowed_length":null,"dry_sequence_breakers":null}
```
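For reference, the request seen in the log can be reproduced against the local server with a short script. This is a minimal sketch: the `/v1/completions` path is an assumption based on mistral.rs exposing an OpenAI-compatible HTTP API, and the server from the log must be running on port 19161 for the commented-out call to succeed.

```python
import json
import urllib.request

# Same payload as the "completions request parse" line in the log above
# (null fields omitted); max_tokens is bounded here as an assumption, to
# make slow generation easier to time.
payload = {
    "model": "default",
    "prompt": "who are you",
    "best_of": 1,
    "echo": False,
    "n": 1,
    "max_tokens": 64,
}
body = json.dumps(payload).encode()

# Assumed OpenAI-compatible endpoint; adjust the path if the server differs.
req = urllib.request.Request(
    "http://localhost:19161/v1/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# Uncomment to send the request (requires the server to be running):
# print(urllib.request.urlopen(req, timeout=120).read().decode())
```

Timing this call with and without Metal enabled would help quantify the slowdown.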

## Latest commit or version

```toml
mistralrs = { git = "https://github.com/EricLBuehler/mistral.rs.git", features = ["metal"], version = "0.3.0", rev = "ae71578" }
```
sopaco added the bug label on Sep 15, 2024