Describe the bug

I'm trying to run mistralrs on a VRAM-constrained system (16 GB VRAM, 64 GB RAM) via the docker image.

The arguments for the server are:

As you can see, I'm trying to set everything up so that the GPU is not used; however, it is still being used very actively and CUDA fails with an OOM.
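To be concrete about what I mean by "not use GPU": in candle terms I expect every tensor to stay on Device::Cpu and nothing to touch a CUDA device. The snippet below is only a minimal sketch of that expectation in plain candle, not the mistralrs-server code path; the shapes and device calls are my own illustration:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // What I expect with a CPU-only configuration: allocations stay in system RAM.
    let cpu = Device::Cpu;
    let t = Tensor::zeros((1024, 1024), DType::F32, &cpu)?;
    assert!(t.device().is_cpu());

    // What actually seems to happen during loading: a CUDA device is still created
    // and tensors end up there, consuming VRAM until the OOM.
    // (Requires a CUDA-enabled build; this call errors out otherwise.)
    let gpu = Device::new_cuda(0)?;
    let _on_gpu = t.to_device(&gpu)?;
    Ok(())
}
```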
I'm not providing the full logs as plain text because they are completely broken when the CLI renders a "loader" in this setup (see below, just for reference); I doubt they would be comprehensible.
Broken log output
Here's the output before it gets corrupted; unfortunately, there is likely nothing useful there.
Head of the OOM log
harbor.mistralrs | 2024-09-04T13:38:41.252088Z INFO mistralrs_server: avx: false, neon: false, simd128: false, f16c: false
harbor.mistralrs | 2024-09-04T13:38:41.252103Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
harbor.mistralrs | 2024-09-04T13:38:41.252109Z INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
harbor.mistralrs | 2024-09-04T13:38:41.252123Z INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
harbor.mistralrs | 2024-09-04T13:38:41.252164Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer.json` at `microsoft/Phi-3.5-MoE-instruct`
harbor.mistralrs | 2024-09-04T13:38:41.252188Z INFO mistralrs_core::pipeline::normal: Loading `config.json` at `microsoft/Phi-3.5-MoE-instruct`
harbor.mistralrs | 2024-09-04T13:38:41.499832Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00001-of-00017.safetensors", "model-00002-of-00017.safetensors", "model-00003-of-00017.safetensors", "model-00004-of-00017.safetensors", "model-00005-of-00017.safetensors", "model-00006-of-00017.safetensors", "model-00007-of-00017.safetensors", "model-00008-of-00017.safetensors", "model-00009-of-00017.safetensors", "model-00010-of-00017.safetensors", "model-00011-of-00017.safetensors", "model-00012-of-00017.safetensors", "model-00013-of-00017.safetensors", "model-00014-of-00017.safetensors", "model-00015-of-00017.safetensors", "model-00016-of-00017.safetensors", "model-00017-of-00017.safetensors"]
harbor.mistralrs | 2024-09-04T13:38:41.640566Z INFO mistralrs_core::pipeline::normal: Loading `generation_config.json` at `microsoft/Phi-3.5-MoE-instruct`
harbor.mistralrs | 2024-09-04T13:38:41.933136Z INFO mistralrs_core::pipeline::normal: Loading `tokenizer_config.json` at `microsoft/Phi-3.5-MoE-instruct`
harbor.mistralrs | 2024-09-04T13:38:41.933252Z INFO mistralrs_core::device_map: Model has 32 repeating layers.
harbor.mistralrs | 2024-09-04T13:38:41.933255Z INFO mistralrs_core::device_map: Loading model according to the following repeating layer mappings:
harbor.mistralrs | 2024-09-04T13:38:41.933259Z INFO mistralrs_core::device_map: Layer 0: cuda[0]
harbor.mistralrs | 2024-09-04T13:38:41.933260Z INFO mistralrs_core::device_map: Layer 1: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933261Z INFO mistralrs_core::device_map: Layer 2: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933261Z INFO mistralrs_core::device_map: Layer 3: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933262Z INFO mistralrs_core::device_map: Layer 4: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933263Z INFO mistralrs_core::device_map: Layer 5: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933263Z INFO mistralrs_core::device_map: Layer 6: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933264Z INFO mistralrs_core::device_map: Layer 7: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933265Z INFO mistralrs_core::device_map: Layer 8: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933265Z INFO mistralrs_core::device_map: Layer 9: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933266Z INFO mistralrs_core::device_map: Layer 10: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933267Z INFO mistralrs_core::device_map: Layer 11: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933267Z INFO mistralrs_core::device_map: Layer 12: cpu
harbor.mistralrs | 2024-09-04T13:38:41.933268Z INFO mistralrs_core::device_map: Layer 13: cpu
Plucking some output from the corrupted stack trace (see the screenshot for the full trace):
<candle_core::cuda_backend::CudaStorage as candle_core::backend::BackendStorage>::to_dtype
<mistralrs_core::utils::varbuilder_utils::SafetensorBackend as mistralrs_core::utils::varbuilder_utils::TensorLoaderBackend>::load_name
core::ops::function::FnOnce::call_once{{vtable.shim}}
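If I'm reading these frames right, the weights are being read from the safetensors shards and dtype-converted on the CUDA backend during loading, which is where the VRAM goes. The sketch below is just my assumption of that mechanism in plain candle (the shard path and target dtype are placeholders; this is not mistral.rs's actual loader):

```rust
use candle_core::{DType, Device, Result};

fn main() -> Result<()> {
    // Falls back to the CPU when no CUDA device is available.
    let device = Device::cuda_if_available(0)?;

    // Loading a shard directly onto `device`; with a CUDA device this alone uses VRAM.
    // "model-00001-of-00017.safetensors" is a placeholder path for illustration.
    let tensors = candle_core::safetensors::load("model-00001-of-00017.safetensors", &device)?;

    for (name, tensor) in tensors {
        // The corrupted trace points at CudaStorage::to_dtype, i.e. a conversion like
        // this one running on the GPU for each loaded tensor.
        let converted = tensor.to_dtype(DType::BF16)?;
        println!("{name}: {:?} (cuda: {})", converted.shape(), converted.device().is_cuda());
    }
    Ok(())
}
```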
Latest commit or version
Docker image: ghcr.io/ericlbuehler/mistral.rs:cuda-80-0.3