Describe the bug
Building mistral.rs with the cuda feature and testing it with mistralrs-bench and a local GGUF, I observed via nvidia-smi that layers were allocated to vRAM, but GPU activity was 0% after warmup.

Despite this, within the same environment (the official llama-cpp Dockerfile, full-cuda variant), the equivalent llama-cpp bench tool used the GPU at 100%. I built both projects myself within the same container environment, so something seems off? More details here: #329 (comment)

I can look at running the Dockerfile from this project, but besides cudnn there shouldn't be much difference AFAIK. I've not tried other commands, or non-GGUF models, but I assume that shouldn't affect this?

Latest commit
v0.1.8: ca9bf7d

Additional context
There is a modification I've applied to be able to load local models without an HF token provided (I don't have an account yet and just wanted to try some projects with models); my workaround was to ignore 401 (unauthorized) the same way 404 is already ignored.

AFAIK this shouldn't negatively affect using the GGUF model? Additional files had to be provided despite llama-cpp not requiring them; from what I understand, all the relevant metadata is already available within the GGUF file itself?
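Roughly, the workaround boils down to this decision (a minimal sketch of the logic only, with my own function name; it is not the actual mistral.rs code):

```rust
// Minimal sketch of the idea behind the workaround, not the actual mistral.rs code:
// treat an unauthenticated 401 from the Hugging Face API the same as a 404,
// i.e. fall back to purely local files instead of bailing out.
fn should_fall_back_to_local(status: u16) -> bool {
    // 404: file not on the Hub; 401: no/invalid HF token provided.
    matches!(status, 401 | 404)
}

fn main() {
    for status in [200u16, 401, 404, 500] {
        println!(
            "HTTP {status}: fall back to local files = {}",
            should_fall_back_to_local(status)
        );
    }
}
```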
This seems very strange. I'll do some digging, but my suspicion is that they do device mapping differently. Please see my comment in #329.
As mentioned, the modification I've applied (#326 (comment)) just ignores 401 (unauthorized) the same way 404 is ignored, so local models load without an HF token. AFAIK that shouldn't negatively affect using the GGUF model itself, since from what I understand all the relevant metadata is already available within the GGUF file?
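For context on why I expected the GGUF alone to be enough: the metadata counts sit right in the file header. A minimal sketch that reads just the fixed-size header (the path is a placeholder):

```rust
use std::fs::File;
use std::io::Read;

// Minimal sketch: read the fixed-size GGUF header to show that the tensor count
// and metadata key/value count ship inside the file itself.
fn main() -> std::io::Result<()> {
    let mut f = File::open("model.gguf")?; // placeholder path
    let mut buf = [0u8; 24];
    f.read_exact(&mut buf)?;

    assert_eq!(&buf[0..4], b"GGUF", "not a GGUF file");
    let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
    let tensors = u64::from_le_bytes(buf[8..16].try_into().unwrap());
    let metadata_kvs = u64::from_le_bytes(buf[16..24].try_into().unwrap());

    println!("GGUF v{version}: {tensors} tensors, {metadata_kvs} metadata entries");
    Ok(())
}
```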
Although the test finishes rather quickly, it's a bit tricky to monitor the load; if you have a command that would take a little longer I could give that a go 👍

EDIT: With the advice below to increase -r, I can confirm 100% GPU load.
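For reference, this is roughly how I watched the load while the bench ran in another shell (a minimal sketch; it only assumes nvidia-smi is on PATH and uses its standard query flags):

```rust
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

// Minimal sketch: poll GPU utilisation and memory once per second for 30 samples
// while the benchmark runs in another terminal.
fn main() {
    for _ in 0..30 {
        let out = Command::new("nvidia-smi")
            .args([
                "--query-gpu=utilization.gpu,memory.used",
                "--format=csv,noheader,nounits",
            ])
            .output()
            .expect("failed to run nvidia-smi");
        print!("{}", String::from_utf8_lossy(&out.stdout));
        sleep(Duration::from_secs(1));
    }
}
```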