I spent a lot of time today running benchmarks to optimize my settings, and I'm hoping other folks have suggestions in case I've missed anything.
I'm running:
Current
2019 16-inch Intel MacBook Pro
32 GB of RAM
AMD Radeon Pro 5500M GPU
Mixtral 8x7b 4-bit quantized model
I'm getting these results generally (2 runs):
llama_print_timings: load time = 153625.16 ms
llama_print_timings: sample time = 3.93 ms / 87 runs ( 0.05 ms per token, 22148.68 tokens per second)
llama_print_timings: prompt eval time = 9871.68 ms / 11 tokens ( 897.43 ms per token, 1.11 tokens per second)
llama_print_timings: eval time = 40815.97 ms / 86 runs ( 474.60 ms per token, 2.11 tokens per second)
llama_print_timings: total time = 50710.67 ms / 97 tokens
ggml_metal_free: deallocating
warning: failed to munlock buffer: Cannot allocate memory
Log end
llama_print_timings: load time = 156684.36 ms
llama_print_timings: sample time = 2.79 ms / 87 runs ( 0.03 ms per token, 31149.30 tokens per second)
llama_print_timings: prompt eval time = 12772.96 ms / 11 tokens ( 1161.18 ms per token, 0.86 tokens per second)
llama_print_timings: eval time = 41914.18 ms / 86 runs ( 487.37 ms per token, 2.05 tokens per second)
llama_print_timings: total time = 54710.37 ms / 97 tokens
ggml_metal_free: deallocating
warning: failed to munlock buffer: Cannot allocate memory
Log end
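To compare runs more easily, the timing lines above can be pooled into a single tokens-per-second figure per stage. This is only a rough sketch: the `LOG` string is copied from the two runs above, and the `throughput` helper and its regex are my own illustration, not part of llama.cpp.

```python
import re

# Timing lines copied from the two runs above (prompt eval and eval only).
LOG = """\
llama_print_timings: prompt eval time = 9871.68 ms / 11 tokens ( 897.43 ms per token, 1.11 tokens per second)
llama_print_timings: eval time = 40815.97 ms / 86 runs ( 474.60 ms per token, 2.11 tokens per second)
llama_print_timings: prompt eval time = 12772.96 ms / 11 tokens ( 1161.18 ms per token, 0.86 tokens per second)
llama_print_timings: eval time = 41914.18 ms / 86 runs ( 487.37 ms per token, 2.05 tokens per second)
"""

# Captures the stage name, elapsed milliseconds, and token/run count.
TIMING = re.compile(
    r"llama_print_timings:\s+(prompt eval|eval) time\s*=\s*([\d.]+) ms"
    r"\s*/\s*(\d+) (?:tokens|runs)"
)


def throughput(log: str) -> dict[str, float]:
    """Return pooled tokens per second for each stage across all runs in the log."""
    totals: dict[str, tuple[float, int]] = {}
    for match in TIMING.finditer(log):
        stage = match.group(1)
        elapsed_ms = float(match.group(2))
        count = int(match.group(3))
        total_ms, total_count = totals.get(stage, (0.0, 0))
        totals[stage] = (total_ms + elapsed_ms, total_count + count)
    # Total tokens divided by total seconds, rather than a mean of per-run rates.
    return {stage: n / (ms / 1000.0) for stage, (ms, n) in totals.items()}


if __name__ == "__main__":
    for stage, rate in throughput(LOG).items():
        print(f"{stage}: {rate:.2f} tokens per second")
```

Across the two runs this comes out to roughly 0.97 tokens per second for prompt eval and 2.08 tokens per second for eval, so generation speed is fairly stable while prompt processing varies more.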
Settings:
./main \
--ctx_size 2048 \
-t 6 \
-ngl 4 \
--mlock \
--temp 0 \
--prompt "What are the 3 most popular red flowers?" \
-m /Users/geuis/.cache/huggingface/hub/models--TheBloke--Mixtral-8x7B-Instruct-v0.1-GGUF/snapshots/fa1d3835c5d45a3a74c0b68805fcdc133dba2b6a/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
If I raise -ngl above 4, I get a segmentation fault. At -ngl 4, it offloads about 3.1 GB to the video card.
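For a back-of-the-envelope view of the offload numbers: 3.1 GB over 4 layers is about 0.78 GB per layer. The sketch below turns that into a layer estimate for a given memory budget. The `max_offload_layers` helper and the budget values are hypothetical; I haven't confirmed how much VRAM is actually usable on this card, and the estimate ignores the KV cache and scratch buffers, so it may not explain the segfault at all.

```python
def max_offload_layers(observed_gb: float, observed_layers: int, vram_budget_gb: float) -> int:
    """Estimate how many layers fit in a VRAM budget, assuming a uniform per-layer cost.

    observed_gb / observed_layers come from a known-good run (3.1 GB at -ngl 4 here).
    vram_budget_gb is an assumption supplied by the caller, not a measured value.
    """
    per_layer_gb = observed_gb / observed_layers
    return int(vram_budget_gb // per_layer_gb)


if __name__ == "__main__":
    # Hypothetical budgets; replace with whatever is really free on the GPU.
    for budget_gb in (3.5, 8.0):
        layers = max_offload_layers(3.1, 4, budget_gb)
        print(f"budget {budget_gb} GB -> about {layers} layers")
```

If the usable budget turns out to be only slightly above 3.1 GB, a fifth layer would not fit by this estimate, which would at least be consistent with things breaking past -ngl 4.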