Cuda error when running gemma2 #715

Open
gqf2008 opened this issue Aug 27, 2024 · 0 comments
Labels
bug Something isn't working

Comments


gqf2008 commented Aug 27, 2024

Describe the bug

When running the command RUST_BACKTRACE=full CUDA_LAUNCH_BLOCKING=1 target/release/mistralrs-server -i --isq Q4K -n "1:16;2:16;3:10" --no-paged-attn plain -m google/gemma-2-9b-it -a gemma2, I get the following error:

2024-08-27T05:53:27.880023Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-27T05:53:27.880068Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-27T05:53:27.880088Z INFO mistralrs_server: Model kind is: normal (no quant, no adapters)
2024-08-27T05:53:27.880235Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json at google/gemma-2-9b-it
2024-08-27T05:53:27.880270Z INFO mistralrs_core::pipeline::normal: Loading tokenizer.json locally at google/gemma-2-9b-it/tokenizer.json
2024-08-27T05:53:27.880279Z INFO mistralrs_core::pipeline::normal: Loading config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.880298Z INFO mistralrs_core::pipeline::normal: Loading config.json locally at google/gemma-2-9b-it/config.json
2024-08-27T05:53:27.882701Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["model-00002-of-00004.safetensors", "model-00004-of-00004.safetensors", "model-00001-of-00004.safetensors", "model-00003-of-00004.safetensors"]
2024-08-27T05:53:27.882733Z INFO mistralrs_core::pipeline::paths: Loading model-00002-of-00004.safetensors locally at google/gemma-2-9b-it/model-00002-of-00004.safetensors
2024-08-27T05:53:27.882753Z INFO mistralrs_core::pipeline::paths: Loading model-00004-of-00004.safetensors locally at google/gemma-2-9b-it/model-00004-of-00004.safetensors
2024-08-27T05:53:27.882771Z INFO mistralrs_core::pipeline::paths: Loading model-00001-of-00004.safetensors locally at google/gemma-2-9b-it/model-00001-of-00004.safetensors
2024-08-27T05:53:27.882792Z INFO mistralrs_core::pipeline::paths: Loading model-00003-of-00004.safetensors locally at google/gemma-2-9b-it/model-00003-of-00004.safetensors
2024-08-27T05:53:27.882890Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.882910Z INFO mistralrs_core::pipeline::normal: Loading generation_config.json locally at google/gemma-2-9b-it/generation_config.json
2024-08-27T05:53:27.882998Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json at google/gemma-2-9b-it
2024-08-27T05:53:27.883017Z INFO mistralrs_core::pipeline::normal: Loading tokenizer_config.json locally at google/gemma-2-9b-it/tokenizer_config.json
2024-08-27T05:53:27.945489Z INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 6.1
2024-08-27T05:53:27.945511Z INFO mistralrs_core::utils::normal: Skipping BF16 because CC < 8.0
2024-08-27T05:53:27.967209Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-27T05:53:27.967274Z INFO mistralrs_core::pipeline::normal: Model config: Config { attention_bias: false, head_dim: 256, hidden_act: Some(GeluPytorchTanh), hidden_activation: Some(GeluPytorchTanh), hidden_size: 3584, intermediate_size: 14336, num_attention_heads: 16, num_hidden_layers: 42, num_key_value_heads: 8, rms_norm_eps: 1e-6, rope_theta: 10000.0, vocab_size: 256000, sliding_window: 4096, attn_logit_softcapping: Some(50.0), final_logit_softcapping: Some(30.0), query_pre_attn_scalar: 256, max_position_embeddings: 8192 }
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 105/105 [00:09<00:00, 10.73it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 84/84 [00:11<00:00, 13.74it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████| 141/141 [00:11<00:00, 2568.93it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 134/134 [00:11<00:00, 11.24it/s]
2024-08-27T05:53:39.957659Z INFO mistralrs_core::device_map: Model has 42 repeating layers.
2024-08-27T05:53:40.313270Z INFO mistralrs_core::device_map: Loading model according to the following repeating layer mappings:
2024-08-27T05:53:40.313292Z INFO mistralrs_core::device_map: Layer 0: cuda[1]
2024-08-27T05:53:40.313298Z INFO mistralrs_core::device_map: Layer 1: cuda[1]
2024-08-27T05:53:40.313304Z INFO mistralrs_core::device_map: Layer 2: cuda[1]
2024-08-27T05:53:40.313310Z INFO mistralrs_core::device_map: Layer 3: cuda[1]
2024-08-27T05:53:40.313317Z INFO mistralrs_core::device_map: Layer 4: cuda[1]
2024-08-27T05:53:40.313323Z INFO mistralrs_core::device_map: Layer 5: cuda[1]
2024-08-27T05:53:40.313329Z INFO mistralrs_core::device_map: Layer 6: cuda[1]
2024-08-27T05:53:40.313337Z INFO mistralrs_core::device_map: Layer 7: cuda[1]
2024-08-27T05:53:40.313343Z INFO mistralrs_core::device_map: Layer 8: cuda[1]
2024-08-27T05:53:40.313349Z INFO mistralrs_core::device_map: Layer 9: cuda[1]
2024-08-27T05:53:40.313355Z INFO mistralrs_core::device_map: Layer 10: cuda[1]
2024-08-27T05:53:40.313361Z INFO mistralrs_core::device_map: Layer 11: cuda[1]
2024-08-27T05:53:40.313367Z INFO mistralrs_core::device_map: Layer 12: cuda[1]
2024-08-27T05:53:40.313373Z INFO mistralrs_core::device_map: Layer 13: cuda[1]
2024-08-27T05:53:40.313379Z INFO mistralrs_core::device_map: Layer 14: cuda[1]
2024-08-27T05:53:40.313385Z INFO mistralrs_core::device_map: Layer 15: cuda[1]
2024-08-27T05:53:40.313391Z INFO mistralrs_core::device_map: Layer 16: cuda[2]
2024-08-27T05:53:40.313396Z INFO mistralrs_core::device_map: Layer 17: cuda[2]
2024-08-27T05:53:40.313402Z INFO mistralrs_core::device_map: Layer 18: cuda[2]
2024-08-27T05:53:40.313408Z INFO mistralrs_core::device_map: Layer 19: cuda[2]
2024-08-27T05:53:40.313414Z INFO mistralrs_core::device_map: Layer 20: cuda[2]
2024-08-27T05:53:40.313420Z INFO mistralrs_core::device_map: Layer 21: cuda[2]
2024-08-27T05:53:40.313426Z INFO mistralrs_core::device_map: Layer 22: cuda[2]
2024-08-27T05:53:40.313432Z INFO mistralrs_core::device_map: Layer 23: cuda[2]
2024-08-27T05:53:40.313438Z INFO mistralrs_core::device_map: Layer 24: cuda[2]
2024-08-27T05:53:40.313443Z INFO mistralrs_core::device_map: Layer 25: cuda[2]
2024-08-27T05:53:40.313449Z INFO mistralrs_core::device_map: Layer 26: cuda[2]
2024-08-27T05:53:40.313455Z INFO mistralrs_core::device_map: Layer 27: cuda[2]
2024-08-27T05:53:40.313461Z INFO mistralrs_core::device_map: Layer 28: cuda[2]
2024-08-27T05:53:40.313467Z INFO mistralrs_core::device_map: Layer 29: cuda[2]
2024-08-27T05:53:40.313474Z INFO mistralrs_core::device_map: Layer 30: cuda[2]
2024-08-27T05:53:40.313480Z INFO mistralrs_core::device_map: Layer 31: cuda[2]
2024-08-27T05:53:40.313486Z INFO mistralrs_core::device_map: Layer 32: cuda[3]
2024-08-27T05:53:40.313491Z INFO mistralrs_core::device_map: Layer 33: cuda[3]
2024-08-27T05:53:40.313497Z INFO mistralrs_core::device_map: Layer 34: cuda[3]
2024-08-27T05:53:40.313503Z INFO mistralrs_core::device_map: Layer 35: cuda[3]
2024-08-27T05:53:40.313509Z INFO mistralrs_core::device_map: Layer 36: cuda[3]
2024-08-27T05:53:40.313516Z INFO mistralrs_core::device_map: Layer 37: cuda[3]
2024-08-27T05:53:40.313522Z INFO mistralrs_core::device_map: Layer 38: cuda[3]
2024-08-27T05:53:40.313528Z INFO mistralrs_core::device_map: Layer 39: cuda[3]
2024-08-27T05:53:40.313533Z INFO mistralrs_core::device_map: Layer 40: cuda[3]
2024-08-27T05:53:40.313539Z INFO mistralrs_core::device_map: Layer 41: cuda[3]
2024-08-27T05:53:40.372062Z INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 6.1
2024-08-27T05:53:40.372073Z INFO mistralrs_core::utils::normal: Skipping BF16 because CC < 8.0
2024-08-27T05:53:40.377781Z INFO mistralrs_core::utils::normal: DType selected is F16.
2024-08-27T05:53:45.119030Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization into Q4K to 295 tensors.
2024-08-27T05:53:45.124278Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 80 threads.
2024-08-27T05:54:33.977956Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization into Q4K to 295 tensors out of 295 total tensors. Took 48.86s
2024-08-27T05:54:33.978167Z INFO mistralrs_core::pipeline::isq: Applying in-situ quantization bias device mapping to 294 biases.
2024-08-27T05:54:33.978461Z INFO mistralrs_core::pipeline::isq: Applying ISQ on 80 threads.
2024-08-27T05:54:33.980828Z INFO mistralrs_core::pipeline::isq: Applied in-situ quantization device mapping. Took 0.00s
2024-08-27T05:54:34.841869Z INFO mistralrs_core::pipeline::chat_template: bos_toks = "<bos>", eos_toks = "<eos>", "<end_of_turn>", unk_tok = <unk>
2024-08-27T05:54:34.912589Z INFO mistralrs_server: Model loaded.
2024-08-27T05:54:34.918294Z INFO mistralrs_core: Enabling GEMM reduced precision in BF16.
2024-08-27T05:54:34.927413Z INFO mistralrs_core: Enabling GEMM reduced precision in F16.
2024-08-27T05:54:34.928631Z INFO mistralrs_core::cublaslt: Initialized cuBLASlt handle
2024-08-27T05:54:34.928750Z INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), min_p: Some(0.05), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }

hello
thread '<unnamed>' panicked at /home/wosai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called Result::unwrap() on an Err value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
0: 0x561eb8a72738 - <std::sys::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h72ae2693fe3679c0
1: 0x561eb7f3a96b - core::fmt::write::heb2112eefb3480d2
2: 0x561eb8a3512e - std::io::Write::write_fmt::hb21f507e35bcfafb
3: 0x561eb8a74629 - std::sys::backtrace::print::h8c7c13a068389915
4: 0x561eb8a73959 - std::panicking::default_hook::{{closure}}::h165d0776fc5b2025
5: 0x561eb8a750e5 - std::panicking::rust_panic_with_hook::h56be292a19683b4c
6: 0x561eb8a74a15 - std::panicking::begin_panic_handler::{{closure}}::h3deeb56cba176ab2
7: 0x561eb8a74979 - std::sys::backtrace::_rust_end_short_backtrace::hc30be025c447bdc1
8: 0x561eb8a74963 - rust_begin_unwind
9: 0x561eb7f38b51 - core::panicking::panic_fmt::h419642f979996b15
10: 0x561eb7f41185 - core::result::unwrap_failed::h0529da0e6cd0d54f
11: 0x561eb825f509 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice>::h079552940b2be28e
12: 0x561eb825ec41 - alloc::sync::Arc<T,A>::drop_slow::hc72120ce55330bbd
13: 0x561eb825eb70 - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h219980205baf0911
14: 0x561eb825c8ad - alloc::sync::Arc<T,A>::drop_slow::h85386a33c23e275d
15: 0x561eb8693334 - mistralrs_core::models::gemma2::Model::forward::h589ae9cc57b30a4d
16: 0x561eb869198d - <mistralrs_core::models::gemma2::Model as mistralrs_core::pipeline::NormalModel>::forward::hf8049a3b5a676b8c
17: 0x561eb82c42d5 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h9c3c7b255d6700fb
18: 0x561eb82f5494 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hba58536883e3c927
19: 0x561eb8788fca - mistralrs_core::engine::Engine::run::{{closure}}::h2fc8bac59768ce2d
20: 0x561eb87840ea - std::sys::backtrace::__rust_begin_short_backtrace::h86ca94e37e66cb93
21: 0x561eb878338a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdc48886c0e3332a8
22: 0x561eb8a772fb - std::sys::pal::unix::thread::Thread::new::thread_start::h4aa16783dfb29b35
23: 0x7f75b88f4ac3 -
24: 0x7f75b8986850 -
25: 0x0 -
thread '<unnamed>' panicked at /home/wosai/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called Result::unwrap() on an Err value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
0: 0x561eb8a72738 - <std::sys::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h72ae2693fe3679c0
1: 0x561eb7f3a96b - core::fmt::write::heb2112eefb3480d2
2: 0x561eb8a3512e - std::io::Write::write_fmt::hb21f507e35bcfafb
3: 0x561eb8a74629 - std::sys::backtrace::print::h8c7c13a068389915
4: 0x561eb8a73959 - std::panicking::default_hook::{{closure}}::h165d0776fc5b2025
5: 0x561eb8a750e5 - std::panicking::rust_panic_with_hook::h56be292a19683b4c
6: 0x561eb8a74a15 - std::panicking::begin_panic_handler::{{closure}}::h3deeb56cba176ab2
7: 0x561eb8a74979 - std::sys::backtrace::_rust_end_short_backtrace::hc30be025c447bdc1
8: 0x561eb8a74963 - rust_begin_unwind
9: 0x561eb7f38b51 - core::panicking::panic_fmt::h419642f979996b15
10: 0x561eb7f41185 - core::result::unwrap_failed::h0529da0e6cd0d54f
11: 0x561eb825f509 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice>::h079552940b2be28e
12: 0x561eb825ec41 - alloc::sync::Arc<T,A>::drop_slow::hc72120ce55330bbd
13: 0x561eb825eb70 - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h219980205baf0911
14: 0x561eb825c8ad - alloc::sync::Arc<T,A>::drop_slow::h85386a33c23e275d
15: 0x561eb869547c - mistralrs_core::models::gemma2::Model::forward::h589ae9cc57b30a4d
16: 0x561eb869198d - <mistralrs_core::models::gemma2::Model as mistralrs_core::pipeline::NormalModel>::forward::hf8049a3b5a676b8c
17: 0x561eb82c42d5 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h9c3c7b255d6700fb
18: 0x561eb82f5494 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hba58536883e3c927
19: 0x561eb8788fca - mistralrs_core::engine::Engine::run::{{closure}}::h2fc8bac59768ce2d
20: 0x561eb87840ea - std::sys::backtrace::__rust_begin_short_backtrace::h86ca94e37e66cb93
21: 0x561eb878338a - core::ops::function::FnOnce::call_once{{vtable.shim}}::hdc48886c0e3332a8
22: 0x561eb8a772fb - std::sys::pal::unix::thread::Thread::new::thread_start::h4aa16783dfb29b35
23: 0x7f75b88f4ac3 -
24: 0x7f75b8986850 -
25: 0x0 -
thread '<unnamed>' panicked at library/core/src/panicking.rs:229:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted (core dumped)

gqf2008 added the bug (Something isn't working) label Aug 27, 2024