Hi! Has anyone here ever used llama.cpp's llama-cli command? If yes, does anyone know what this error is?
llama_init_from_gpt_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\norm.cu:212: GGML_ASSERT(ggml_is_contiguous(src0)) failed
The only command I use is:
./ext/llama-cli -m .\llm\falcon-mamba\falcon-mamba-7b-instruct.Q2_K.gguf -p "Hi!" -ngl 1
If I don't use -ngl 1, it works fine. But if my prompt is longer (not even that long, arguably short, about 30 words), it also gives me the same error.
My GPU is NVIDIA. So far I can use llamafile with the GPU, and ollama too. The main problem is that I don't know why I'm getting this error.
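One thing I figure I can try to narrow it down (the log itself mentions the flag) is disabling the warmup run, so the assert can't be coming from the empty warmup pass:

./ext/llama-cli -m .\llm\falcon-mamba\falcon-mamba-7b-instruct.Q2_K.gguf -p "Hi!" -ngl 1 --no-warmup

I haven't confirmed whether that helps; given the -cnv crash below, it's probably not only the warmup.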
Some more information, from when I use "-cnv":
system
Hi!
Hello! How are you?
I'm good! Thank you for asking. How about you? How was your day?
I'm good. I want to give you an order. can I?
Of course! Go ahead and share your order with me, and I'll be happy to help. Let me know how I can assist you!
Please write a super long article about Overall Immeasurable Prowess of Michael Jordan vs Super Strength and Power Lebron James vs Stephen Curry Unrealistic 3Pt Shoot Accuracy.
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\norm.cu:212: GGML_ASSERT(ggml_is_contiguous(src0)) failed
And it just dies like that whenever I write a longer prompt (this is without -ngl).
Again, another piece of info: somehow I can run it when using the AVX2 build of llama-cli. It can handle long prompts with no errors whatsoever, but it can't use the GPU TT__TT... So... how? Do I need to compile llama.cpp myself to get CUDA + AVX2? (I'm really new to llama.cpp, so I'm not even sure I can do that...)
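From what I can tell from the llama.cpp build docs, compiling it myself would look roughly like this (untested on my side; GGML_CUDA is the CMake option name I see around this release, and AVX2 should be enabled automatically when building natively on a CPU that supports it):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

This assumes the CUDA toolkit and a C++ compiler (on Windows, e.g. Visual Studio) are already installed.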
Build used: https://github.com/ggerganov/llama.cpp/releases/tag/b3827
When trying CUDA, I use the cu117 build, since that's the CUDA version already on my PC, which I've used for many other CUDA-related things.
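(In case it's relevant: the CUDA version supported by the driver can be double-checked with nvidia-smi, which prints it in the header of its output.)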