Gemma.cpp hangs on a Gemma 7B model that was finetuned using huggingface peft(QLoRA) #198
Comments
Hi, thanks for reaching out :) One possible cause is the prompt format, i.e. whether the base model is PT or IT and whether the template matches what the model was tuned for. A second possible cause is that the finetune may generate weights with magnitude above 1.875, which may require a bit of extra work to support (setting the tensor's scaling factor). There is a check for this in compress_weights, but it is only enabled in debug builds. The simplest way to test this is to build with msan or, preferably, asan enabled, if that is an option?
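For reference, here is a minimal sketch (not the actual compress_weights code) of the kind of check described here: scan a tensor for magnitudes above 1.875 and derive a per-tensor scale; the scale formula is an assumption for illustration.

```cpp
// Toy sketch, not gemma.cpp internals: find the maximum weight magnitude and
// flag tensors that would need a scaling factor before compression.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // Stand-in for a finetuned weight tensor.
  std::vector<float> tensor = {0.25f, -2.3f, 0.8f};
  float max_abs = 0.0f;
  for (float v : tensor) max_abs = std::max(max_abs, std::fabs(v));
  if (max_abs > 1.875f) {
    // Such a tensor would need its scaling factor set (roughly max_abs / 1.875)
    // so the compressed representation stays within range.
    std::printf("needs scaling: max |w| = %.3f, scale ~ %.3f\n",
                max_abs, max_abs / 1.875f);
  } else {
    std::printf("within range\n");
  }
  return 0;
}
```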
Hi, thank you for your reply. The base model is the PT variant, i.e. it is a PT model. As far as I understand, when using an original prompt template, the PT model is usually used instead of the IT model. I have also created a llama.cpp (gguf) version of this model, and it works with llama.cpp without any template problems. I recompiled it using the following procedure and no errors occurred, but the situation is the same.
logs
Thanks!
Thank you for sharing. So we indeed want PT, and your model is trained for the prompt format being used. @KumarGitesh2024, can you help debug this? Perhaps we can insert printfs to understand which function in gemma.cc is the one that freezes?
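A minimal sketch of the kind of instrumentation meant here; the function names are placeholders rather than the actual gemma.cc signatures.

```cpp
// Sketch only: a scope logger that prints unbuffered enter/exit markers, so the
// last "enter" line in the log identifies the function that never returns.
#include <cstdio>

struct ScopeLog {
  const char* name;
  explicit ScopeLog(const char* n) : name(n) {
    std::fprintf(stderr, "enter %s\n", name);
    std::fflush(stderr);  // flush immediately so the message survives a hang
  }
  ~ScopeLog() {
    std::fprintf(stderr, "exit  %s\n", name);
    std::fflush(stderr);
  }
};

int main() {
  // In gemma.cc one would drop `ScopeLog log("Attention");` (or FFW, Transformer)
  // at the top of each suspect function; this main() is just a demonstration.
  ScopeLog log("Transformer");
  return 0;
}
```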
Hi @webbigdata-jp, can you share logs after printing the function names using printf?
In my local time, from 21:40 to 22:00 the same log repeats.
Thanks for sharing the log. Looks like 28x Attention/FFW (one per layer). If we just end up calling Transformer without end, then it seems like the model never generates the end-of-turn token.
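To illustrate the failure mode (a generic sketch, not the actual gemma.cpp decode loop): generation only stops on the end token or on a token limit, so a model that never emits the end token looks like a hang.

```cpp
// Sketch of a generic token-generation loop: if the sampler never returns the
// end-of-turn/EOS token, the only thing that stops it is the max-token limit.
#include <cstdio>

int SampleNextToken() { return 42; }  // placeholder for the real model + sampler

int main() {
  const int kEndToken = 1;      // assumed end-of-sequence id, for illustration
  const int kMaxTokens = 2048;  // safety limit
  for (int i = 0; i < kMaxTokens; ++i) {
    const int token = SampleNextToken();
    if (token == kEndToken) break;  // never true if the model never emits EOS
    std::printf("token %d\n", token);
  }
  return 0;
}
```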
I found that not only was EOT not being output, but neither Japanese nor English was being output either. Google's original 2b-it-sfp.sbs looks good.
log
My model looks bad.
My model outputs EOT without any problems before it is converted to the gemma.cpp format.
Interesting. It sounds like our fp8 compression might be causing the trouble.
Hello.
ls -lrth util/*sbs
I didn't see any error or warning messages during the conversion process. |
Thank you for sharing the command line. Unfortunately I see that our documentation was incorrect, sorry about that :( We had added a GEMMA_ prefix to the C++ macro name, and CMake has a different name for this: WEIGHT_TYPE. Meanwhile, this weight typedef has been troublesome enough that we are now looking into compiling for all weight types, so that the typedef is no longer required.
GEMMA_WEIGHT_T is indeed the correct flag for the C++ compiler, but the readme references CMake, and there the correct flag name is WEIGHT_TYPE. PiperOrigin-RevId: 641170380
I recompiled it, but the situation did not change. The print around line 133 (the } else { branch) shows Token ID: [0]; only zeros are output in both the 16-bit and 8-bit weight settings.
@webbigdata-jp thank you for trying. This sounds like a serious bug. We do not run compress_weights often and it recently changed, so this is possible. We are very busy this week but I have made a note to investigate, thanks for letting us know :)
Thank you. I'm not in a hurry, so if you find out the cause, please let me know. If I can get gemma.cpp to work, it will open the door to running my Gemma-based models on multiple platforms without compiling.
Hi, thanks for the interesting project!
I created a Gemma 7B based model, webbigdata/C3TR-Adapter.
This model is in Hugging Face Transformers format; it is a translation-only model with an original prompt template, fine-tuned with QLoRA.
So I converted it to PyTorch (.ckpt) format, and the result is f32_merge_model.ckpt.
I have confirmed that f32_merge_model.ckpt works.
Then I ran this command, with no error message:
python3 convert_weights.py --tokenizer tokenizer.model --weight f32_merge_model.ckpt --output_file gemma_cpp_merge.bin --model_type 7b
Then I ran this command, with no error message:
./build/compress_weights --weights util/gemma_cpp_merge.bin --model 7b-pt --compressed_weights util/gemma_cpp_merge.sbs
Then I ran gemma.cpp, with no error message:
./build/gemma --tokenizer util/tokenizer.model --compressed_weights util/gemma_cpp_merge.sbs --model 7b-pt
and entered my prompt:
[### Instruction:\nTranslate English to Japanese.\n\n### Input:\nThis is a test input.\n\n### Response:\n]
but the model does not output anything.
Is there something wrong with the procedure?