System: Windows 10
I compiled my version using CMake:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
and the downloaded release is:
llama-b3651-bin-win-cuda-cu12.2.0-x64
When I run llama-bench, I see the following.
Downloaded release:
llama-bench.exe -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -ngl 81
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
build: 8f1d81a (3651)
My compiled build:
llama-bench.exe -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -ngl 81
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Ti, compute capability 8.9, VMM: yes
build: a47667c (3650)
I have run the benchmark several times, and my compiled version is consistently slower than the downloaded one.
Does anyone have a suggestion for what might cause this?
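For what it's worth, one variant of the build I could try is pinning the CUDA architecture explicitly rather than relying on auto-detection. This is an assumption on my part: `89` is my guess to match the 4070 Ti's compute capability 8.9, and I don't know which architectures the prebuilt release binaries actually target.

```shell
# Hypothetical variant: target compute capability 8.9 (RTX 4070 Ti)
# explicitly via CMake's standard CMAKE_CUDA_ARCHITECTURES variable,
# instead of whatever the default configuration picks.
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=89
cmake --build build --config Release
```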