Cuda error when running gemma2 #715

Open
gqf2008 opened this issue Aug 27, 2024 · 0 comments
Labels
bug Something isn't working

Comments


gqf2008 commented Aug 27, 2024

Describe the bug

When running the command RUST_BACKTRACE=full CUDA_LAUNCH_BLOCKING=1 target/release/mistralrs-server -i --isq Q4K -n "1:16;2:16;3:10" --no-paged-attn plain -m google/gemma-2-9b-it -a gemma2, I get the following error:

2024-08-27T05:53:27.880023Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-27T05:53:27.880068Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-27T05:53:27.880088Z INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
2024-08-27T05:53:27.880235Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json at google/gemma-2-9b-it
2024-08-27T05:53:27.880270Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json locally at google/gemma-2-9b-it/tokenizer.json
2024-08-27T05:53:27.880279Z INFO mistralrs_core::pipeline::normal: Loading config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.880298Z INFO mistralrs_core::pipeline::normal: Loading config.json locally at google/gemma-2-9b-it/config.json
2024-08-27T05:53:27.882701Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00002-of-00004.safetensors", "model-00004-of-00004.safetensors", "model-00001-of-00004.safetensors", "model-00003-of-00004.safetensors"]
2024-08-27T05:53:27.882733Z INFO mistralrs_core::pipeline::paths: Loading model-00002-of-00004.safetensors locally at google/gemma-2-9b-it/model-00002-of-00004.safetensors
2024-08-27T05:53:27.882753Z INFO mistralrs_core::pipeline::paths: Loading model-00004-of-00004.safetensors locally at google/gemma-2-9b-it/model-00004-of-00004.safetensors
2024-08-27T05:53:27.882771Z INFO mistralrs_core::pipeline::paths: Loading model-00001-of-00004.safetensors locally at google/gemma-2-9b-it/model-00001-of-00004.safetensors
2024-08-27T05:53:27.882792Z INFO mistralrs_core::pipeline::paths: Loading model-00003-of-00004.safetensors locally at google/gemma-2-9b-it/model-00003-of-00004.safetensors
2024-08-27T05:53:27.882890Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.882910Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json locally at google/gemma-2-9b-it/generation_config.json
2024-08-27T05:53:27.882998Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.883017Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json locally at google/gemma-2-9b-it/tokenizer_config.json
2024-08-27T05:53:27.945489Z INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 6.1
2024-08-27T05:53:27.945511Z INFO mistralrs_core::utils::normal: Skipping BF16 because CC < 8.0
2024-08-27T05:53:27.967209Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-27T05:53:27.967274Z INFO mistralrs_core::pipeline::normal: Model config: Config { attention_bias: false, head_dim: 256, hidden_act: Some(GeluPytorchTanh), hidden_activation: Some(GeluPytorchTanh), hidden_size: 3584, intermediate_size: 14336, num_attention_heads: 16, num_hidden_layers: 42, num_key_value_heads: 8, rms_norm_eps: 1e-6, rope_theta: 10000.0, vocab_size: 256000, sliding_window: 4096, attn_logit_softcapping: Some(50.0), final_logit_softcapping: Some(30.0), query_pre_attn_scalar: 256, max_position_embeddings: 8192 }
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 105/105 [00:09<00:00, 10.73it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 84/84 [00:11<00:00, 13.74it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 141/141 [00:11<00:00, 2568.93it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 134/134 [00:11<00:00, 11.24it/s]
2024-08-27T05:53:39.957659Z INFO mistralrs_core::device_map: Model has 42 repeating layers.
2024-08-27T05:53:40.313270Z INFO mistralrs_core::device_map: Loading model according to the following repeating layer mappings:
2024-08-27T05:53:40.313292Z INFO mistralrs_core::device_map: Layer 0: cuda[1]
2024-08-27T05:53:40.313298Z INFO mistralrs_core::device_map: Layer 1: cuda[1]
2024-08-27T05:53:40.313304Z INFO mistralrs_core::device_map: Layer 2: cuda[1]
2024-08-27T05:53:40.313310Z INFO mistralrs_core::device_map: Layer 3: cuda[1]
2024-08-27T05:53:40.313317Z INFO mistralrs_core::device_map: Layer 4: cuda[1]
2024-08-27T05:53:40.313323Z INFO mistralrs_core::device_map: Layer 5: cuda[1]
2024-08-27T05:53:40.313329Z INFO mistralrs_core::device_map: Layer 6: cuda[1]
2024-08-27T05:53:40.313337Z INFO mistralrs_core::device_map: Layer 7: cuda[1]
2024-08-27T05:53:40.313343Z INFO mistralrs_core::device_map: Layer 8: cuda[1]
2024-08-27T05:53:40.313349Z INFO mistralrs_core::device_map: Layer 9: cuda[1]
2024-08-27T05:53:40.313355Z INFO mistralrs_core::device_map: Layer 10: cuda[1]
2024-08-27T05:53:40.313361Z INFO mistralrs_core::device_map: Layer 11: cuda[1]
2024-08-27T05:53:40.313367Z INFO mistralrs_core::device_map: Layer 12: cuda[1]
2024-08-27T05:53:40.313373Z INFO mistralrs_core::device_map: Layer 13: cuda[1]
2024-08-27T05:53:40.313379Z INFO mistralrs_core::device_map: Layer 14: cuda[1]
2024-08-27T05:53:40.313385Z INFO mistralrs_core::device_map: Layer 15: cuda[1]
2024-08-27T05:53:40.313391Z INFO mistralrs_core::device_map: Layer 16: cuda[2]
2024-08-27T05:53:40.313396Z INFO mistralrs_core::device_map: Layer 17: cuda[2]
2024-08-27T05:53:40.313402Z INFO mistralrs_core::device_map: Layer 18: cuda[2]
2024-08-27T05:53:40.313408Z INFO mistralrs_core::device_map: Layer 19: cuda[2]
2024-08-27T05:53:40.313414Z INFO mistralrs_core::device_map: Layer 20: cuda[2]
2024-08-27T05:53:40.313420Z INFO mistralrs_core::device_map: Layer 21: cuda[2]
2024-08-27T05:53:40.313426Z INFO mistralrs_core::device_map: Layer 22: cuda[2]
2024-08-27T05:53:40.313432Z INFO mistralrs_core::device_map: Layer 23: cuda[2]
2024-08-27T05:53:40.313438Z INFO mistralrs_core::device_map: Layer 24: cuda[2]
2024-08-27T05:53:40.313443Z INFO mistralrs_core::device_map: Layer 25: cuda[2]
2024-08-27T05:53:40.313449Z INFO mistralrs_core::device_map: Layer 26: cuda[2]
2024-08-27T05:53:40.313455Z INFO mistralrs_core::device_map: Layer 27: cuda[2]
2024-08-27T05:53:40.313461Z INFO mistralrs_core::device_map: Layer 28: cuda[2]
2024-08-27T05:53:40.313467Z INFO mistralrs_core::device_map: Layer 29: cuda[2]
2024-08-27T05:53:40.313474Z INFO mistralrs_core::device_map: Layer 30: cuda[2]
2024-08-27T05:53:40.313480Z INFO mistralrs_core::device_map: Layer 31: cuda[2]
2024-08-27T05:53:40.313486Z INFO mistralrs_core::device_map: Layer 32: cuda[3]
2024-08-27T05:53:40.313491Z INFO mistralrs_core::device_map: Layer 33: cuda[3]
2024-08-27T05:53:40.313497Z INFO mistralrs_core::device_map: Layer 34: cuda[3]
2024-08-27T05:53:40.313503Z INFO mistralrs_core::device_map: Layer 35: cuda[3]
2024-08-27T05:53:40.313509Z INFO mistralrs_core::device_map: Layer 36: cuda[3]
2024-08-27T05:53:40.313516Z INFO mistralrs_core::device_map: Layer 37: cuda[3]
2024-08-27T05:53:40.313522Z INFO mistralrs_core::device_map: Layer 38: cuda[3]
2024-08-27T05:53:40.313528Z INFO mistralrs_core::device_map: Layer 39: cuda[3]
2024-08-27T05:53:40.313533Z INFO mistralrs_core::device_map: Layer 40: cuda[3]
2024-08-27T05:53:40.313539Z INFO mistralrs_core::device_map: Layer 41: cuda[3]
2024-08-27T05:53:40.372062Z INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 6.1
2024-08-27T05:53:40.372073Z INFO mistralrs_core::utils::normal: Skipping BF16 because CC < 8.0
2024-08-27T05:53:40.377781Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-27T05:53:45.119030Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization into Q4K to 295 tensors.
2024-08-27T05:53:45.124278Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 80 threads.
2024-08-27T05:54:33.977956Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Q4K to 295 tensors out of 295 total tensors. Took 48.86s
2024-08-27T05:54:33.978167Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization bias device mapping to 294 biases.
2024-08-27T05:54:33.978461Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 80 threads.
2024-08-27T05:54:33.980828Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization device mapping. Took 0.00s
2024-08-27T05:54:34.841869Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<bos>", eos_toks = "<eos>", "<end_of_turn>", unk_tok = <unk>
2024-08-27T05:54:34.912589Z INFO mistralrs_server: Model loaded.
2024-08-27T05:54:34.918294Z INFO mistralrs_core: Enabling GEMM reduced precision in BF16.
2024-08-27T05:54:34.927413Z INFO mistralrs_core: Enabling GEMM reduced precision in F16.
2024-08-27T05:54:34.928631Z INFO mistralrs_core::cublaslt: Initialized cuBLASlt handle
2024-08-27T05:54:34.928750Z INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), min_p: Some(0.05), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }

hello
thread '<unnamed>' panicked at /home/wosai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called Result::unwrap() on an Err value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
0: 0x561eb8a72738 - <std::sys::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h72ae2693fe3679c0
1: 0x561eb7f3a96b - core::fmt::write::heb2112eefb3480d2
2: 0x561eb8a3512e - std::io::Write::write_fmt::hb21f507e35bcfafb
3: 0x561eb8a74629 - std::sys::backtrace::print::h8c7c13a068389915
4: 0x561eb8a73959 - std::panicking::default_hook::{{closure}}::h165d0776fc5b2025
5: 0x561eb8a750e5 - std::panicking::rust_panic_with_hook::h56be292a19683b4c
6: 0x561eb8a74a15 - std::panicking::begin_panic_handler::{{closure}}::h3deeb56cba176ab2
7: 0x561eb8a74979 - std::sys::backtrace::_rust_end_short_backtrace::hc30be025c447bdc1
8: 0x561eb8a74963 - rust_begin_unwind
9: 0x561eb7f38b51 - core::panicking::panic_fmt::h419642f979996b15
10: 0x561eb7f41185 - core::result::unwrap_failed::h0529da0e6cd0d54f
11: 0x561eb825f509 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice>::h079552940b2be28e
12: 0x561eb825ec41 - alloc::sync::Arc<T,A>::drop_slow::hc72120ce55330bbd
13: 0x561eb825eb70 - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h219980205baf0911
14: 0x561eb825c8ad - alloc::sync::Arc<T,A>::drop_slow::h85386a33c23e275d
15: 0x561eb8693334 - mistralrs_core::models::gemma2::Model::forward::h589ae9cc57b30a4d
16: 0x561eb869198d - <mistralrs_core::models::gemma2::Model as mistralrs_core::pipeline::NormalModel>::forward::hf8049a3b5a676b8c
17: 0x561eb82c42d5 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h9c3c7b255d6700fb
18: 0x561eb82f5494 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hba58536883e3c927
19: 0x561eb8788fca - mistralrs_core::engine::Engine::run::{{closure}}::h2fc8bac59768ce2d
20: 0x561eb87840ea - std::sys::backtrace::__rust_begin_short_backtrace::h86ca94e37e66cb93
21: 0x561eb878338a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdc48886c0e3332a8
22: 0x561eb8a772fb - std::sys::pal::unix::thread::Thread::new::thread_start::h4aa16783dfb29b35
23: 0x7f75b88f4ac3 -
24: 0x7f75b8986850 -
25: 0x0 -
thread '<unnamed>' panicked at /home/wosai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called Result::unwrap() on an Err value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
0: 0x561eb8a72738 - <std::sys::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h72ae2693fe3679c0
1: 0x561eb7f3a96b - core::fmt::write::heb2112eefb3480d2
2: 0x561eb8a3512e - std::io::Write::write_fmt::hb21f507e35bcfafb
3: 0x561eb8a74629 - std::sys::backtrace::print::h8c7c13a068389915
4: 0x561eb8a73959 - std::panicking::default_hook::{{closure}}::h165d0776fc5b2025
5: 0x561eb8a750e5 - std::panicking::rust_panic_with_hook::h56be292a19683b4c
6: 0x561eb8a74a15 - std::panicking::begin_panic_handler::{{closure}}::h3deeb56cba176ab2
7: 0x561eb8a74979 - std::sys::backtrace::_rust_end_short_backtrace::hc30be025c447bdc1
8: 0x561eb8a74963 - rust_begin_unwind
9: 0x561eb7f38b51 - core::panicking::panic_fmt::h419642f979996b15
10: 0x561eb7f41185 - core::result::unwrap_failed::h0529da0e6cd0d54f
11: 0x561eb825f509 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice>::h079552940b2be28e
12: 0x561eb825ec41 - alloc::sync::Arc<T,A>::drop_slow::hc72120ce55330bbd
13: 0x561eb825eb70 - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h219980205baf0911
14: 0x561eb825c8ad - alloc::sync::Arc<T,A>::drop_slow::h85386a33c23e275d
15: 0x561eb869547c - mistralrs_core::models::gemma2::Model::forward::h589ae9cc57b30a4d
16: 0x561eb869198d - <mistralrs_core::models::gemma2::Model as mistralrs_core::pipeline::NormalModel>::forward::hf8049a3b5a676b8c
17: 0x561eb82c42d5 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h9c3c7b255d6700fb
18: 0x561eb82f5494 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hba58536883e3c927
19: 0x561eb8788fca - mistralrs_core::engine::Engine::run::{{closure}}::h2fc8bac59768ce2d
20: 0x561eb87840ea - std::sys::backtrace::__rust_begin_short_backtrace::h86ca94e37e66cb93
21: 0x561eb878338a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdc48886c0e3332a8
22: 0x561eb8a772fb - std::sys::pal::unix::thread::Thread::new::thread_start::h4aa16783dfb29b35
23: 0x7f75b88f4ac3 -
24: 0x7f75b8986850 -
25: 0x0 -
thread '<unnamed>' panicked at library/core/src/panicking.rs:229:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted (core dumped)

gqf2008 added the bug (Something isn't working) label Aug 27, 2024