-
I use LLMs for storytelling and role-playing adventures, but I'm curious why I keep getting different outputs even when I set a specific seed. It only happens the first two times with the same prompt; from the third time onwards, the output is exactly the same, as expected. I've tried Koboldcpp, llama.cpp, and most recently OobaBooga, but I always get the same issue. After reading about seeds on this https://github.com/oobabooga/text-generation-webui/wiki/03-%E2%80%90-Parameters-Tab , I finally understood why. I had been using Koboldcpp, llama.cpp, and OobaBooga to run GGUF models, and also EXL2 on OobaBooga. With Koboldcpp, though, I once got the same output using CLBLAST instead of CuBLAS. So, is there a way to run a quantized model and still get the same output with a set seed? Right now I use AWQ, which works with the Transformers loader in OobaBooga, but there aren't as many AWQ models on Huggingface compared to GGUF and EXL2. I run LLMs on Kaggle using their 2xT4 GPUs.
-
Instead of using AWQ, I discovered that I can load the model without quantization using the Transformers loader. This actually works out better, since I can choose to run the model in full precision, 8-bit, or 4-bit. For 8-bit or 4-bit, you just need to add the `--load-in-8bit` or `--load-in-4bit` flag, respectively. So yeah, problem solved 🤍
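For anyone scripting this outside the web UI, here is a minimal sketch of the same idea using the `transformers` library directly: load the unquantized weights in 4-bit (which is what the `--load-in-4bit` flag does under the hood, via bitsandbytes) and call `set_seed` before each generation so sampling is reproducible. The model ID, function name, and generation parameters below are illustrative, not from the original posts.

```python
# Sketch: reproducible sampling from a 4-bit-loaded Hugging Face model.
# Assumes `transformers`, `torch`, `accelerate`, and `bitsandbytes` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed


def generate_reproducibly(model_id: str, prompt: str, seed: int = 42) -> str:
    """Load `model_id` in 4-bit and sample with a fixed seed.

    Calling this twice with the same arguments on the same hardware
    should produce the same text, because set_seed() resets the
    Python, NumPy, and Torch RNGs before sampling.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_4bit=True,   # same effect as OobaBooga's --load-in-4bit flag
        device_map="auto",
    )
    set_seed(seed)  # seed all RNGs immediately before generation
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=50, do_sample=True)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Note that newer `transformers` versions prefer passing a `BitsAndBytesConfig` via `quantization_config` instead of the bare `load_in_4bit=True` kwarg, but both route through the same bitsandbytes 4-bit path.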