[Bug] InternVL2-1B performance of lmdeploy is much worse compared to the original Hugging Face PyTorch model. #2705
Comments
Thanks for your feedback. Could you provide sample code that reproduces the results you mentioned using lmdeploy?
Thanks for your response. Here is my code:
Note: my transformers version is 4.45.0 (4.46 will have …)
@henry16lin hi, thanks for your sample code. There are two differences between the HF and lmdeploy inference in your script. Please make the inputs identical, then compare the inference outputs.
Yes, there are a few differences, but I don't think they are the key factor behind the poor responses...
The responses are as follows:
Thank you for your response, and please help me get InternVL2-1B working with lmdeploy 🙏
@henry16lin hi, could you try InternVL2-2B? It seems the smaller LLMs are less tolerant of the slight implementation differences between transformers and lmdeploy.
Yes, I tried InternVL2-2B in lmdeploy, but it needs around 6 GB of memory and returns an empty string...
In that case, could you try quantization to reduce runtime memory? https://lmdeploy.readthedocs.io/en/latest/quantization/w4a16.html#
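Per the linked guide, the W4A16 flow is roughly the following (a sketch based on the docs; check the guide for the exact flags supported by your lmdeploy version):

```shell
# Quantize the HF checkpoint to 4-bit AWQ (W4A16); the quantized
# model is written to ./InternVL2-2B-4bit
lmdeploy lite auto_awq OpenGVLab/InternVL2-2B --work-dir ./InternVL2-2B-4bit

# Load the quantized model with the awq model format, e.g. when serving
lmdeploy serve api_server ./InternVL2-2B-4bit --model-format awq
```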
Describe the bug
InternVL2-1B performance of lmdeploy is much worse compared to the original Hugging Face PyTorch model.
Reproduction
Thanks for your great work.
My task is image captioning. I tried OpenGVLab/InternVL2-1B from here and used the sample code from the quick start to run the model. I found the output quality is clearly worse with lmdeploy (though it is faster).
For example:
(source image: https://content.api.news/v3/images/bin/7c169b05712f7657366268afaa47ae88 )
Hugging Face (transformers) response:
The image shows a group of police officers at a McDonald's restaurant, with a white car parked in front of them.
lmdeploy response:
The image shows a police officer and a uniformed officer in a parking lot in front of a McDonald's restaurant. The officer in the uniform is in a handcuff, and the officer in the police uniform is in a handcuff. The McDonald's restaurant is in the background with a McDonald's and a Starbucks Coffee.
One can see that the lmdeploy responses keep repeating similar sentences.
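Incidentally, this kind of looping is easy to quantify. A small standalone helper like the following (hypothetical, not part of lmdeploy) measures the fraction of repeated word n-grams, which separates the fluent caption from the degenerate one:

```python
import re
from collections import Counter

def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams in `text` that occur more than once.

    Near 0.0 for fluent text; values well above ~0.2 suggest the
    model is looping on the same phrases.
    """
    words = re.findall(r"[\w']+", text.lower())
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

fluent = ("The image shows a group of police officers at a McDonald's "
          "restaurant, with a white car parked in front of them.")
looping = ("The officer in the uniform is in a handcuff, and the officer "
           "in the police uniform is in a handcuff.")
print(repeated_ngram_ratio(fluent))   # no repeated 4-grams
print(repeated_ngram_ratio(looping))  # noticeably higher
```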
I tried many cases and different parameters (including repetition_penalty), but nothing helped, and sometimes lmdeploy also responds partly in Chinese, e.g.:
The CCTV is in the parking lot of a convenience store, with a Coke-Cola vending machine and a suspiciously parked car. The store's facade is a stone and brick wall, and the parking lot is a curb with a red and white警示标志。("warning sign")
(even though I replaced the Chinese system prompt in pipe.chat_template.meta_instruction and instructed it to answer only in English)
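For reference, the setup being described can be sketched like this (assuming lmdeploy's VLM pipeline API; the prompt text and penalty value are illustrative, and running it requires a GPU):

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-1B')
# Replace the default (Chinese) system prompt and ask for English only.
pipe.chat_template.meta_instruction = (
    'You are a helpful assistant. Always answer in English.')

image = load_image('https://content.api.news/v3/images/bin/7c169b05712f7657366268afaa47ae88')
# repetition_penalty > 1.0 is meant to discourage the looping output.
gen_config = GenerationConfig(repetition_penalty=1.1)
response = pipe(('Describe this image.', image), gen_config=gen_config)
print(response.text)
```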
I also tried QwenVL2-2B, and its output quality was consistent, but InternVL2-1B's memory usage is a better fit for my scenario.
Do you have any thoughts or suggestions on this issue? I'm using lmdeploy 0.6.2.
BTW, I tried a few VLM models (QwenVL and InternVL series) and found that, compared with the HF models, lmdeploy's memory usage is higher (5 GB → 6 GB) but inference is faster (10 s → 3 s). Is this expected?
Thank you very much!
Environment
I'm using lmdeploy 0.6.2 and torch 2.2.0 (installed via pip).
Error traceback
No response