
When testing a QLoRA-finetuned Gemma2 2B on Linux, it just generates a repeated char #146

Open
chenminjun-web opened this issue Sep 21, 2024 · 4 comments

@chenminjun-web

I fine-tuned Gemma2 2B Instruct with BitsAndBytes (int4). It works when tested with transformers.
Then I followed the guide to build mllm and quantize the model for Linux.
But when I test the fine-tuned model with the example demo_gemma, it always outputs a repeated char (a Korean char).
Has anyone tried this?
Or is something wrong on my end?

@chenghuaWang
Contributor

chenghuaWang commented Sep 21, 2024

The outputs from the gemma-q4_k and gemma-q4_0 models provided by the mllm team are correct. You can download the gemma model params from the repository at https://huggingface.co/mllmTeam/gemma-2b-mllm/tree/main to test whether your mllm has been compiled correctly.

Additionally, did you modify the vocabulary of Gemma during fine-tuning? If so, you will need to provide the correct vocabulary file to demo_gemma.
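For reference, a quick way to check whether fine-tuning changed the vocabulary is to compare tokenizer sizes. A minimal sketch; the fine-tuned model path is a placeholder:

```python
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
tuned = AutoTokenizer.from_pretrained("path/to/finetuned-model")  # placeholder path

# If the counts differ, the vocabulary was extended and demo_gemma
# needs a vocabulary file exported from the fine-tuned tokenizer.
print(len(base), len(tuned))
```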

@chenghuaWang
Contributor

Have you combined $\Delta W$ and $W_{\text{original}}$ into a new weight matrix $W$? As far as I know, the mllm model does not yet implement a low-rank branch. Therefore, you will need to merge the weights from the low-rank component into those of the original model to create a unified weight matrix $W = W_{\text{original}} + \Delta W$.

IIRC, some LoRA fine-tuning frameworks offer utilities to facilitate this process. For instance, the alpaca-lora framework might have relevant functions. It's also possible that BitsAndBytes provides similar functionality. You can find more information in the alpaca-lora repository, specifically in the file export_hf_checkpoint.py, around line 39: Link to GitHub.
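For instance, the PEFT library used by most QLoRA setups can fold the adapter into the base weights with merge_and_unload(). A minimal sketch, assuming a standard PEFT adapter checkpoint; the adapter and output paths are placeholders, and the base model is loaded in fp16 because the adapter cannot be merged directly into 4-bit quantized weights:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in fp16; merging into a bnb-int4 base is not supported.
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b-it", torch_dtype=torch.float16
)

# Attach the LoRA adapter, then fold delta_W = (alpha / r) * B @ A into W.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder
merged = model.merge_and_unload()

# Save a plain HF checkpoint that the mllm converter can consume.
merged.save_pretrained("path/to/merged-model")  # placeholder
```

The merged checkpoint can then be converted and quantized with the usual mllm tooling.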

@chenminjun-web
Author

Thanks for your help.
I downloaded gemma-q4_k from https://huggingface.co/mllmTeam/gemma-2b-mllm/tree/main and tested it with the mllm build on my Ubuntu 20.04; it works well.
But when I convert the Gemma2 model to mllm myself, it always outputs a single char (like ? or .) repeatedly.
I also tested the Gemma2-2B-IT model without QLoRA fine-tuning, and it has the same problem.
It seems there may be some problem with the convert process, but I did not see any error log.

@chenghuaWang
Contributor

chenghuaWang commented Sep 24, 2024

The Gemma implementation in mllm is v1.1. Gemma 2 shares a similar architectural foundation with the original Gemma models, but it introduces features such as logit soft-capping. You will need to modify the modeling_gemma.hpp file accordingly. Link to the file
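For reference, logit soft-capping smoothly bounds logits to $(-\text{cap}, \text{cap})$ via $\text{cap} \cdot \tanh(\text{logits} / \text{cap})$. A minimal PyTorch sketch of the operation a Gemma 2 port would need to add (the cap values below are taken from the released Gemma 2 config):

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bounds logits to (-cap, cap): cap * tanh(logits / cap).
    return cap * torch.tanh(logits / cap)

# Gemma 2 applies this with cap = 50.0 to the attention logits (before softmax)
# and cap = 30.0 to the final LM-head logits, per the published config.
```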

cc @yirongjie pls add Gemma2 to our todo list
