System: Windows 10
I compiled my version using CMake:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
and the downloaded release is:
llama-b3651-bin-win-cuda-cu12.2.0-x64
When I run llama-bench, I see the following.
Downloaded release:
llama-bench.exe -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -ngl 81
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
build: 8f1d81a (3651)
My compiled build:
llama-bench.exe -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -ngl 81
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
build: a47667c (3650)
I have run the benchmark several times, and my compiled version is consistently slower than the downloaded one.
Does anyone have a suggestion for what might cause this?
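For what it's worth, one variant of the build I could try is pinning the CUDA architecture explicitly rather than relying on auto-detection. This is an assumption on my part: `89` is my guess to match the 4070 Ti's compute capability 8.9, and I don't know which architectures the prebuilt release binaries actually target.

```shell
# Hypothetical variant: target compute capability 8.9 (RTX 4070 Ti)
# explicitly via CMake's standard CMAKE_CUDA_ARCHITECTURES variable,
# instead of whatever the default configuration picks.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89
cmake --build build --config Release
```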