-
If a layer is not offloaded to the GPU, it will still use cuBLAS; the only difference is that its data has to be copied to the device before the calculation.
-
I only have a 6GB NVIDIA GPU, so most of the time I will be offloading just some of the model layers to the GPU.
Does it make sense to compile with both LLAMA_OPENBLAS=1 and LLAMA_CUBLAS=1 enabled?
Will that give any overall performance improvement?