When running the command RUST_BACKTRACE=full CUDA_LAUNCH_BLOCKING=1 target/release/mistralrs-server -i --isq Q4K -n "1:16;2:16;3:10" --no-paged-attn plain -m google/gemma-2-9b-it -a gemma2, I get the error below:
hello
thread '' panicked at /home/wosai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0: 0x561eb8a72738 - <std::sys::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h72ae2693fe3679c0
   1: 0x561eb7f3a96b - core::fmt::write::heb2112eefb3480d2
   2: 0x561eb8a3512e - std::io::Write::write_fmt::hb21f507e35bcfafb
   3: 0x561eb8a74629 - std::sys::backtrace::print::h8c7c13a068389915
   4: 0x561eb8a73959 - std::panicking::default_hook::{{closure}}::h165d0776fc5b2025
   5: 0x561eb8a750e5 - std::panicking::rust_panic_with_hook::h56be292a19683b4c
   6: 0x561eb8a74a15 - std::panicking::begin_panic_handler::{{closure}}::h3deeb56cba176ab2
   7: 0x561eb8a74979 - std::sys::backtrace::__rust_end_short_backtrace::hc30be025c447bdc1
   8: 0x561eb8a74963 - rust_begin_unwind
   9: 0x561eb7f38b51 - core::panicking::panic_fmt::h419642f979996b15
  10: 0x561eb7f41185 - core::result::unwrap_failed::h0529da0e6cd0d54f
  11: 0x561eb825f509 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice>::h079552940b2be28e
  12: 0x561eb825ec41 - alloc::sync::Arc<T,A>::drop_slow::hc72120ce55330bbd
  13: 0x561eb825eb70 - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h219980205baf0911
  14: 0x561eb825c8ad - alloc::sync::Arc<T,A>::drop_slow::h85386a33c23e275d
  15: 0x561eb8693334 - mistralrs_core::models::gemma2::Model::forward::h589ae9cc57b30a4d
  16: 0x561eb869198d - <mistralrs_core::models::gemma2::Model as mistralrs_core::pipeline::NormalModel>::forward::hf8049a3b5a676b8c
  17: 0x561eb82c42d5 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h9c3c7b255d6700fb
  18: 0x561eb82f5494 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hba58536883e3c927
  19: 0x561eb8788fca - mistralrs_core::engine::Engine::run::{{closure}}::h2fc8bac59768ce2d
  20: 0x561eb87840ea - std::sys::backtrace::__rust_begin_short_backtrace::h86ca94e37e66cb93
  21: 0x561eb878338a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdc48886c0e3332a8
  22: 0x561eb8a772fb - std::sys::pal::unix::thread::Thread::new::thread_start::h4aa16783dfb29b35
  23: 0x7f75b88f4ac3 -
  24: 0x7f75b8986850 -
  25: 0x0 -
thread '' panicked at /home/wosai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0: 0x561eb8a72738 - <std::sys::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h72ae2693fe3679c0
   1: 0x561eb7f3a96b - core::fmt::write::heb2112eefb3480d2
   2: 0x561eb8a3512e - std::io::Write::write_fmt::hb21f507e35bcfafb
   3: 0x561eb8a74629 - std::sys::backtrace::print::h8c7c13a068389915
   4: 0x561eb8a73959 - std::panicking::default_hook::{{closure}}::h165d0776fc5b2025
   5: 0x561eb8a750e5 - std::panicking::rust_panic_with_hook::h56be292a19683b4c
   6: 0x561eb8a74a15 - std::panicking::begin_panic_handler::{{closure}}::h3deeb56cba176ab2
   7: 0x561eb8a74979 - std::sys::backtrace::__rust_end_short_backtrace::hc30be025c447bdc1
   8: 0x561eb8a74963 - rust_begin_unwind
   9: 0x561eb7f38b51 - core::panicking::panic_fmt::h419642f979996b15
  10: 0x561eb7f41185 - core::result::unwrap_failed::h0529da0e6cd0d54f
  11: 0x561eb825f509 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice>::h079552940b2be28e
  12: 0x561eb825ec41 - alloc::sync::Arc<T,A>::drop_slow::hc72120ce55330bbd
  13: 0x561eb825eb70 - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h219980205baf0911
  14: 0x561eb825c8ad - alloc::sync::Arc<T,A>::drop_slow::h85386a33c23e275d
  15: 0x561eb869547c - mistralrs_core::models::gemma2::Model::forward::h589ae9cc57b30a4d
  16: 0x561eb869198d - <mistralrs_core::models::gemma2::Model as mistralrs_core::pipeline::NormalModel>::forward::hf8049a3b5a676b8c
  17: 0x561eb82c42d5 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h9c3c7b255d6700fb
  18: 0x561eb82f5494 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hba58536883e3c927
  19: 0x561eb8788fca - mistralrs_core::engine::Engine::run::{{closure}}::h2fc8bac59768ce2d
  20: 0x561eb87840ea - std::sys::backtrace::__rust_begin_short_backtrace::h86ca94e37e66cb93
  21: 0x561eb878338a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdc48886c0e3332a8
  22: 0x561eb8a772fb - std::sys::pal::unix::thread::Thread::new::thread_start::h4aa16783dfb29b35
  23: 0x7f75b88f4ac3 -
  24: 0x7f75b8986850 -
  25: 0x0 -
thread '' panicked at library/core/src/panicking.rs:229:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted (core dumped)
Describe the bug
When running the following command, I get this error:

RUST_BACKTRACE=full CUDA_LAUNCH_BLOCKING=1 target/release/mistralrs-server -i --isq Q4K -n "1:16;2:16;3:10" --no-paged-attn plain -m google/gemma-2-9b-it -a gemma2
2024-08-27T05:53:27.880023Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-27T05:53:27.880068Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-27T05:53:27.880088Z INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
2024-08-27T05:53:27.880235Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json at google/gemma-2-9b-it
2024-08-27T05:53:27.880270Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json locally at google/gemma-2-9b-it/tokenizer.json
2024-08-27T05:53:27.880279Z INFO mistralrs_core::pipeline::normal: Loading config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.880298Z INFO mistralrs_core::pipeline::normal: Loading config.json locally at google/gemma-2-9b-it/config.json
2024-08-27T05:53:27.882701Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00002-of-00004.safetensors", "model-00004-of-00004.safetensors", "model-00001-of-00004.safetensors", "model-00003-of-00004.safetensors"]
2024-08-27T05:53:27.882733Z INFO mistralrs_core::pipeline::paths: Loading model-00002-of-00004.safetensors locally at google/gemma-2-9b-it/model-00002-of-00004.safetensors
2024-08-27T05:53:27.882753Z INFO mistralrs_core::pipeline::paths: Loading model-00004-of-00004.safetensors locally at google/gemma-2-9b-it/model-00004-of-00004.safetensors
2024-08-27T05:53:27.882771Z INFO mistralrs_core::pipeline::paths: Loading model-00001-of-00004.safetensors locally at google/gemma-2-9b-it/model-00001-of-00004.safetensors
2024-08-27T05:53:27.882792Z INFO mistralrs_core::pipeline::paths: Loading model-00003-of-00004.safetensors locally at google/gemma-2-9b-it/model-00003-of-00004.safetensors
2024-08-27T05:53:27.882890Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.882910Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json locally at google/gemma-2-9b-it/generation_config.json
2024-08-27T05:53:27.882998Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.883017Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json locally at google/gemma-2-9b-it/tokenizer_config.json
2024-08-27T05:53:27.945489Z INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 6.1
2024-08-27T05:53:27.945511Z INFO mistralrs_core::utils::normal: Skipping BF16 because CC < 8.0
2024-08-27T05:53:27.967209Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-27T05:53:27.967274Z INFO mistralrs_core::pipeline::normal: Model config: Config { attention_bias: false, head_dim: 256, hidden_act: Some(GeluPytorchTanh), hidden_activation: Some(GeluPytorchTanh), hidden_size: 3584, intermediate_size: 14336, num_attention_heads: 16, num_hidden_layers: 42, num_key_value_heads: 8, rms_norm_eps: 1e-6, rope_theta: 10000.0, vocab_size: 256000, sliding_window: 4096, attn_logit_softcapping: Some(50.0), final_logit_softcapping: Some(30.0), query_pre_attn_scalar: 256, max_position_embeddings: 8192 }
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 105/105 [00:09<00:00, 10.73it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 84/84 [00:11<00:00, 13.74it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 141/141 [00:11<00:00, 2568.93it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 134/134 [00:11<00:00, 11.24it/s]
2024-08-27T05:53:39.957659Z INFO mistralrs_core::device_map: Model has 42 repeating layers.
2024-08-27T05:53:40.313270Z INFO mistralrs_core::device_map: Loading model according to the following repeating layer mappings:
2024-08-27T05:53:40.313292Z INFO mistralrs_core::device_map: Layer 0: cuda[1]
2024-08-27T05:53:40.313298Z INFO mistralrs_core::device_map: Layer 1: cuda[1]
2024-08-27T05:53:40.313304Z INFO mistralrs_core::device_map: Layer 2: cuda[1]
2024-08-27T05:53:40.313310Z INFO mistralrs_core::device_map: Layer 3: cuda[1]
2024-08-27T05:53:40.313317Z INFO mistralrs_core::device_map: Layer 4: cuda[1]
2024-08-27T05:53:40.313323Z INFO mistralrs_core::device_map: Layer 5: cuda[1]
2024-08-27T05:53:40.313329Z INFO mistralrs_core::device_map: Layer 6: cuda[1]
2024-08-27T05:53:40.313337Z INFO mistralrs_core::device_map: Layer 7: cuda[1]
2024-08-27T05:53:40.313343Z INFO mistralrs_core::device_map: Layer 8: cuda[1]
2024-08-27T05:53:40.313349Z INFO mistralrs_core::device_map: Layer 9: cuda[1]
2024-08-27T05:53:40.313355Z INFO mistralrs_core::device_map: Layer 10: cuda[1]
2024-08-27T05:53:40.313361Z INFO mistralrs_core::device_map: Layer 11: cuda[1]
2024-08-27T05:53:40.313367Z INFO mistralrs_core::device_map: Layer 12: cuda[1]
2024-08-27T05:53:40.313373Z INFO mistralrs_core::device_map: Layer 13: cuda[1]
2024-08-27T05:53:40.313379Z INFO mistralrs_core::device_map: Layer 14: cuda[1]
2024-08-27T05:53:40.313385Z INFO mistralrs_core::device_map: Layer 15: cuda[1]
2024-08-27T05:53:40.313391Z INFO mistralrs_core::device_map: Layer 16: cuda[2]
2024-08-27T05:53:40.313396Z INFO mistralrs_core::device_map: Layer 17: cuda[2]
2024-08-27T05:53:40.313402Z INFO mistralrs_core::device_map: Layer 18: cuda[2]
2024-08-27T05:53:40.313408Z INFO mistralrs_core::device_map: Layer 19: cuda[2]
2024-08-27T05:53:40.313414Z INFO mistralrs_core::device_map: Layer 20: cuda[2]
2024-08-27T05:53:40.313420Z INFO mistralrs_core::device_map: Layer 21: cuda[2]
2024-08-27T05:53:40.313426Z INFO mistralrs_core::device_map: Layer 22: cuda[2]
2024-08-27T05:53:40.313432Z INFO mistralrs_core::device_map: Layer 23: cuda[2]
2024-08-27T05:53:40.313438Z INFO mistralrs_core::device_map: Layer 24: cuda[2]
2024-08-27T05:53:40.313443Z INFO mistralrs_core::device_map: Layer 25: cuda[2]
2024-08-27T05:53:40.313449Z INFO mistralrs_core::device_map: Layer 26: cuda[2]
2024-08-27T05:53:40.313455Z INFO mistralrs_core::device_map: Layer 27: cuda[2]
2024-08-27T05:53:40.313461Z INFO mistralrs_core::device_map: Layer 28: cuda[2]
2024-08-27T05:53:40.313467Z INFO mistralrs_core::device_map: Layer 29: cuda[2]
2024-08-27T05:53:40.313474Z INFO mistralrs_core::device_map: Layer 30: cuda[2]
2024-08-27T05:53:40.313480Z INFO mistralrs_core::device_map: Layer 31: cuda[2]
2024-08-27T05:53:40.313486Z INFO mistralrs_core::device_map: Layer 32: cuda[3]
2024-08-27T05:53:40.313491Z INFO mistralrs_core::device_map: Layer 33: cuda[3]
2024-08-27T05:53:40.313497Z INFO mistralrs_core::device_map: Layer 34: cuda[3]
2024-08-27T05:53:40.313503Z INFO mistralrs_core::device_map: Layer 35: cuda[3]
2024-08-27T05:53:40.313509Z INFO mistralrs_core::device_map: Layer 36: cuda[3]
2024-08-27T05:53:40.313516Z INFO mistralrs_core::device_map: Layer 37: cuda[3]
2024-08-27T05:53:40.313522Z INFO mistralrs_core::device_map: Layer 38: cuda[3]
2024-08-27T05:53:40.313528Z INFO mistralrs_core::device_map: Layer 39: cuda[3]
2024-08-27T05:53:40.313533Z INFO mistralrs_core::device_map: Layer 40: cuda[3]
2024-08-27T05:53:40.313539Z INFO mistralrs_core::device_map: Layer 41: cuda[3]
2024-08-27T05:53:40.372062Z INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 6.1
2024-08-27T05:53:40.372073Z INFO mistralrs_core::utils::normal: Skipping BF16 because CC < 8.0
2024-08-27T05:53:40.377781Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-27T05:53:45.119030Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization into Q4K to 295 tensors.
2024-08-27T05:53:45.124278Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 80 threads.
2024-08-27T05:54:33.977956Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Q4K to 295 tensors out of 295 total tensors. Took 48.86s
2024-08-27T05:54:33.978167Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization bias device mapping to 294 biases.
2024-08-27T05:54:33.978461Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 80 threads.
2024-08-27T05:54:33.980828Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization device mapping. Took 0.00s
2024-08-27T05:54:34.841869Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "", eos_toks = "", "<end_of_turn>", unk_tok =
2024-08-27T05:54:34.912589Z INFO mistralrs_server: Model loaded.
2024-08-27T05:54:34.918294Z INFO mistralrs_core: Enabling GEMM reduced precision in BF16.
2024-08-27T05:54:34.927413Z INFO mistralrs_core: Enabling GEMM reduced precision in F16.
2024-08-27T05:54:34.928631Z INFO mistralrs_core::cublaslt: Initialized cuBLASlt handle
2024-08-27T05:54:34.928750Z INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), min_p: Some(0.05), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }
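For reference, the -n "1:16;2:16;3:10" device-mapping argument expands to exactly the layer assignment logged above: 16 layers on cuda[1], 16 on cuda[2], 10 on cuda[3], covering all 42 repeating layers. A minimal sketch of the observed semantics (illustrative only; `expand_device_map` is a hypothetical helper, not mistral.rs's actual parser):

```rust
// Hypothetical helper: expand a device-map spec like "1:16;2:16;3:10"
// (ORDINAL:NUM_LAYERS pairs separated by ';') into a per-layer device
// list matching the "Layer N: cuda[D]" lines in the log.
fn expand_device_map(spec: &str) -> Vec<usize> {
    let mut layers = Vec::new();
    for part in spec.split(';') {
        let (ordinal, count) = part
            .split_once(':')
            .expect("expected ORDINAL:NUM_LAYERS");
        let ordinal: usize = ordinal.parse().unwrap();
        let count: usize = count.parse().unwrap();
        // assign `count` consecutive layers to this device ordinal
        layers.extend(std::iter::repeat(ordinal).take(count));
    }
    layers
}

fn main() {
    let map = expand_device_map("1:16;2:16;3:10");
    assert_eq!(map.len(), 42); // covers all 42 repeating layers
    assert_eq!(map[0], 1);     // Layer 0: cuda[1]
    assert_eq!(map[16], 2);    // Layer 16: cuda[2]
    assert_eq!(map[41], 3);    // Layer 41: cuda[3]
    println!("ok");
}
```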
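On the sampler chain logged above (penalties -> temperature -> topk -> topp -> minp -> multinomial): the three truncation stages compose as sketched below. This follows the standard definitions of top-k, top-p (nucleus), and min-p filtering and is not mistral.rs's actual sampler code (`truncate` is a hypothetical function). Note that with the logged parameters (top_p = 0.1, min_p = 0.05), usually only a single candidate token survives, so decoding is near-greedy.

```rust
// Illustrative sketch of top-k -> top-p -> min-p truncation over a
// softmaxed probability distribution. The returned token ids are the
// candidates multinomial sampling would draw from.
fn truncate(probs: &[f32], top_k: usize, top_p: f32, min_p: f32) -> Vec<usize> {
    // sort token ids by probability, descending
    let mut ids: Vec<usize> = (0..probs.len()).collect();
    ids.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    // top-k: keep at most the k most probable tokens
    ids.truncate(top_k);
    // top-p (nucleus): keep the smallest prefix whose cumulative mass
    // reaches top_p
    let mut cum = 0.0;
    let mut keep = 0;
    for &i in &ids {
        cum += probs[i];
        keep += 1;
        if cum >= top_p {
            break;
        }
    }
    ids.truncate(keep);
    // min-p: drop tokens whose probability falls below min_p times the
    // maximum probability among the remaining candidates
    let max_p = probs[ids[0]];
    ids.retain(|&i| probs[i] >= min_p * max_p);
    ids
}

fn main() {
    let probs = [0.5, 0.3, 0.1, 0.05, 0.05];
    // with the logged params (top_k=32, top_p=0.1, min_p=0.05), only the
    // single most probable token survives
    assert_eq!(truncate(&probs, 32, 0.1, 0.05), vec![0]);
    // a looser top_p keeps a wider nucleus
    assert_eq!(truncate(&probs, 32, 0.9, 0.05), vec![0, 1, 2]);
    println!("ok");
}
```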