I've tried several LLM utilities based on recent versions of llama.cpp (Kobold, Jan, etc.) on my machine with dual 12-core/24-thread CPUs (SSE4.2, no AVX), 128 GB of RAM, and GPUs.
It always crashes when I try to offload to the GPUs, and everything seems to be compiled so that you can either
use slow CPU-only inference with SSE3, or
offload to the GPU, but only if the CPU supports AVX-family instructions.
I can't file this in the bug category because the logs are usually uninformative, though they sometimes report something like an unsupported instruction set (e.g. SSE unsupported).
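For reference, here is the check I used to confirm the CPU really does lack AVX (this just reads /proc/cpuinfo, so it only applies on Linux):

```sh
# List the unique SIMD feature flags the kernel reports for this CPU.
# If AVX is absent, no avx/avx2/avx512* entries show up in the output.
grep -o 'sse4_[12]\|avx[a-z0-9_]*' /proc/cpuinfo | sort -u
```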
So, my questions are:
Is that conclusion correct, or can one offload to the GPU with llama.cpp on a machine that only supports SSE4.1/4.2?
If so, what do I need to do? Do I need to recompile llama.cpp with SSE4.1/4.2 support (something like the build sketch below)?
Or do I need to add or rewrite some routines/functions in llama.cpp to support SSE4.1/4.2?
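For concreteness, a from-source build along these lines is what I have in mind. This is only a sketch based on the CMake options in recent llama.cpp checkouts (GGML_NATIVE, GGML_CUDA, and the per-ISA GGML_AVX* toggles; older versions used LLAMA_-prefixed names such as LLAMA_CUBLAS), so the exact flag names may differ for your version:

```sh
# Sketch: build llama.cpp so the CPU code targets the host CPU
# (SSE4.2-only here) while GPU offload is still enabled via CUDA.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# GGML_NATIVE=ON compiles with -march=native, so no AVX instructions are
# emitted on a CPU that lacks them; GGML_CUDA=ON enables GPU offload.
cmake -B build -DGGML_NATIVE=ON -DGGML_CUDA=ON
cmake --build build --config Release

# Alternatively, turn native detection off and disable the AVX paths
# explicitly (assuming these per-ISA toggles exist in your checkout):
# cmake -B build -DGGML_NATIVE=OFF \
#       -DGGML_AVX=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF -DGGML_F16C=OFF \
#       -DGGML_CUDA=ON
```

If a build like that works, it would suggest the prebuilt binaries I tried were simply compiled with AVX enabled, and that no source changes are needed.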