
getting SLOW imatrix completion on NVIDIA #9919

Answered by slaren
robbiemu asked this question in Q&A

The CUDA backend also does not support BF16, so most of the model is running on the CPU. Try an F16 model instead.
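
To illustrate why an unsupported weight type pushes work back to the CPU, here is a toy placement model. It is a simplified sketch, not llama.cpp's scheduler: the helper names, the supported-type set (which simply omits BF16, as described above), and the tensor mixes are all hypothetical.

```python
# Toy model of per-tensor backend placement (hypothetical, for illustration only).
# Assumption: a weight can be used on the GPU only if its data type is in the
# GPU backend's supported set; otherwise the work falls back to the CPU.
# The set below is assumed and simply omits BF16, as described in the answer.
CUDA_SUPPORTED_TYPES = {"F32", "F16", "Q8_0", "Q4_K"}


def place_tensors(weight_types: list[str]) -> list[str]:
    """Return 'CUDA' or 'CPU' for each weight tensor type."""
    return ["CUDA" if t in CUDA_SUPPORTED_TYPES else "CPU" for t in weight_types]


def cpu_fraction(weight_types: list[str]) -> float:
    """Fraction of weight tensors that end up on the CPU."""
    placements = place_tensors(weight_types)
    return placements.count("CPU") / len(placements)


# Hypothetical tensor mixes: most weights in one type, a few small F32 tensors (e.g. norms).
bf16_model = ["BF16"] * 10 + ["F32"] * 2
f16_model = ["F16"] * 10 + ["F32"] * 2

print(f"BF16 model: {cpu_fraction(bf16_model):.0%} of weight tensors on CPU")
print(f"F16 model:  {cpu_fraction(f16_model):.0%} of weight tensors on CPU")
```

In this toy model only the small F32 tensors reach the GPU when the bulk of the weights are BF16, while an F16 model places everything on the GPU.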

Also note that `-ngl -1` does not work the way you might expect: no layers will be offloaded that way. To offload the entire model, use a large number instead, e.g. `-ngl 99`.
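
The arithmetic behind this can be sketched as follows. This is a simplified, hypothetical model: the function name `offloaded_layers` and the exact formulas are assumptions loosely patterned on how llama.cpp counts offloaded layers, not the actual loader code. It only shows why a negative value selects no layers while any value larger than the layer count selects all of them.

```python
def offloaded_layers(n_layer: int, n_gpu_layers: int) -> tuple[int, bool]:
    """Hypothetical simplification of how an -ngl value could map to layers.

    Returns (number of repeating layers on the GPU, whether the output layer
    is offloaded). Assumed rules, for illustration only:
      - the first GPU layer index is max(n_layer - n_gpu_layers, 0)
      - the output layer is offloaded only when n_gpu_layers > n_layer
    """
    i_gpu_start = max(n_layer - n_gpu_layers, 0)
    # Layers with index >= i_gpu_start go to the GPU; clamp at 0 so that a
    # negative n_gpu_layers (start index past the last layer) offloads nothing.
    n_offloaded = max(n_layer - i_gpu_start, 0)
    output_on_gpu = n_gpu_layers > n_layer
    return n_offloaded, output_on_gpu


# Hypothetical model with 32 repeating layers.
for ngl in (-1, 20, 99):
    n, out = offloaded_layers(32, ngl)
    print(f"-ngl {ngl}: {n}/32 layers on GPU, output layer on GPU: {out}")
```

Under these assumptions, `-ngl -1` puts the first GPU layer past the end of the model, so nothing is offloaded, whereas `-ngl 99` starts offloading at layer 0 and also covers the output layer for any model with fewer than 99 layers.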

Answer selected by robbiemu